6.3.0 release notes (#199)

* generate 6.3.0 RELEASE.md

* add 6.3.0 os/hw support

* regenerate changelog

* update table

* add amd smi and fix fmt

* add rocjpeg note

* add missed changelog entries

* update ga date

* add SHARK toolkit introduced note

update SHARK note

* Edited some components (#202)

* Edited some components

* fixed formatting on rocal

* markdown fail on the last commit; fixed

* capitalization fix

* Copy edit component change logs (#203)

* fix some formatting

* fix table and add OpenCL note

fix fmt

fix more formatting

* add radeon note

* add rocmsmi

* Updated hipCUB, rocPrim, and rocThrust (#206)

* fix some stuff

* add transferbench

* Edits to RCCL 6.3 change log (#207)

* Update tools/autotag/templates/upcoming_changes/6.3.0.md

* fix formatting

* fix sphinx underline warning

* add @lpaoletti's highlights

* fix os support

* add missing kernel version

* fix heading

* add bitsandbytes ki

* Copy edits to release notes (#208)

* Copy edits to release notes

* Additional updates to release notes

* updated shark AI toolkit description

* fix formatting

* update opencl

* update opencl

fixes and updates

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* fix omnitools rename text

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* update omniperf and tensile notes

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* made some copy edits (#209)

* Apply suggestions from code review

* Update RELEASE.md

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* indent

* add more highlights

* update shark urls

* add omni notes

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* update some changelogs

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* update some cls

* and missed changelogs

* add missed component updates

* fix links

* add amdgpu-dkms highlight

* Update RELEASE.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* change links

* add fixed issues

* @neon60's changes

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

* rm extra hip docs

* add hip links

* add fixed issue

fix

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* fix ri

* fix zebra

* Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* rm extra amd smi info

* Apply suggestions from code review

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* add more about omni rename

fix rename stuff

* Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* fix formatting

* wording

* fix link

* update aotriton

* remove libraries performance improved

* fix rhel version

* fix urls

shorten title

* Apply suggestions from code review

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

* Release notes updates (#212)

* Made language more precise (#211)

MIVisionX and rocAL were changed. An awkward sentence in rocAL was also fixed.

* add rocprofiler

* add rdc

add rdc entry

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Update RELEASE.md

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

* remove bitsandbytes known issue

* fix missed hip doc

* update rocprof-compute version to 3.0.0

* remove words

* change hiprand ver to 2.11.0

* update new components descriptions

* add #

* fix tensile versions

* fix versions and add missed cls

* Update RELEASE.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* remove resolved issue for #3493

* add rdc note

* add hiprand known issue

add hiprand known issue

add asterisk for hiprand ki

asterisk formatting

asterisk

link asterisk

* rdc known issue

* @lpaoletti updates

* @wenchenvincent add CK to Transformer Engine note

* fix links

fix links

* add roct thunk interface note

* rm 'previously'

* Apply suggestions from code review

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* add known issues

* add mi300x cpfw known issue

* add mi300x cpfw known issue

add note

* spacing

* update te error KI

* rm incorrect user impact in TE known issue

* correct description of transformer engine fatal python error known issue

* update autotag/templates

* fix order

* fix typo

* update .wordlist.txt w/ lib names

* add missing css classes

* remove ROCT-Thunk-Interface from ROCm licenses

* add rocJPEG LICENSE

* fix table zebra b/c added rows

* fix capitalization in toc

* update URLs post-review

* update AMD SMI changelog

* update ROCm SMI changelog

* add opencl icd stale file kI

words

* remove Azure Linux

* update omnitrace note

* add mi200 DLM known issue

* update omnitrace note

update omnitrace note

wording

update omnitrace note

* update 6.3 ga to 11/26

* update KIs wording

* Update tools/autotag/templates/highlights/6.3.0.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Update tools/autotag/templates/highlights/6.3.0.md

Co-authored-by: Istvan Kiss <neon60@gmail.com>

* update TransferBench note

* remove transferbench

remove transferbench

* remove gfx12, 1151

* remove sr-iov

* rm tb

* css classes

* rm gfx12

* add back transferbench

* add transferbench to table

* rm transferbench, add as KI

* update transferbench KI workaround

* add rocprof-comp KI

fix

* fix tensile

* add backward weights conv KI

update

* remove RHEL 8.9 from OS EOS

* remove mi200 perf drop for DLMs

* add RHEL 8.9 to end of support OSes

* add omniperf/omnitrace KIs

* remove bf16 statement in mi300x KI

* update rvs versions in compat

* add amd smi KI

update

update

* words

* update GA date for 6.3.0

* add rvs KI

* add KI links

same

* rvs in compat

* update tf versions

* add rvs changelog

* update rn templates

* add possessives to wordlist

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>
Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Peter Park
2024-12-03 15:16:38 -05:00
committed by GitHub
parent f53faa19ea
commit 059c2cd9a4
15 changed files with 1863 additions and 252 deletions

@@ -13,6 +13,7 @@ AMDMIGraphX
AMI
AOCC
AOMP
AOTriton
APBDIS
APIC
APIs
@@ -158,6 +159,7 @@ HWS
Haswell
Higgs
Hyperparameters
ICD
ICV
IDE
IDEs
@@ -208,6 +210,7 @@ MiB
MIGraphX
MIOpen
MIOpenGEMM
MIOpen's
MIVisionX
MLM
MMA
@@ -295,7 +298,9 @@ PipelineParallel
PnP
PowerEdge
PowerShell
Profiler's
PyPi
Pytest
PyTorch
Qcycles
Qwen
@@ -303,6 +308,7 @@ RAII
RAS
RCCL
RDC
RDC's
RDMA
RDNA
README
@@ -342,6 +348,7 @@ SENDMSG
SGPR
SGPRs
SHA
SHARK's
SIGQUIT
SIMD
SIMDs
@@ -521,6 +528,7 @@ devsel
dimensionality
disambiguates
distro
dkms
el
embeddings
enablement
@@ -686,6 +694,7 @@ rocALUTION
rocBLAS
rocDecode
rocFFT
rocJPEG
rocLIB
rocMLIR
rocPRIM
@@ -778,6 +787,7 @@ vectorize
vectorized
vectorizer
vectorizes
virtualized
vjxb
voxel
walkthrough

RELEASE.md (1745 changes): diff suppressed because it is too large.

@@ -59,6 +59,7 @@ additional licenses. Please review individual repositories for more information.
| [rocDecode](https://github.com/ROCm/rocDecode) | [MIT](https://github.com/ROCm/rocDecode/blob/develop/LICENSE) |
| [rocFFT](https://github.com/ROCm/rocFFT/) | [MIT](https://github.com/ROCm/rocFFT/blob/develop/LICENSE.md) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [rocJPEG](https://github.com/ROCm/rocJPEG/) | [MIT](https://github.com/ROCm/rocJPEG/blob/develop/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/amd-staging/License.txt) |
| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
@@ -84,7 +85,6 @@ additional licenses. Please review individual repositories for more information.
| [rocSPARSE](https://github.com/ROCm/rocSPARSE/) | [MIT](https://github.com/ROCm/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCm/rocThrust/) | [Apache 2.0](https://github.com/ROCm/rocThrust/blob/develop/LICENSE) |
| [ROCTracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [rocWMMA](https://github.com/ROCm/rocWMMA/) | [MIT](https://github.com/ROCm/rocWMMA/blob/develop/LICENSE.md) |
| [Tensile](https://github.com/ROCm/Tensile/) | [MIT](https://github.com/ROCm/Tensile/blob/develop/LICENSE.md) |
| [TransferBench](https://github.com/ROCm/TransferBench) | [MIT](https://github.com/ROCm/TransferBench/blob/develop/LICENSE.md) |

@@ -22,7 +22,7 @@ ROCm Version,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,
:doc:`PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,
@@ -86,7 +86,7 @@ ROCm Version,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.3.0,rocm-6.2.4,rocm-6.2.2,rocm-6.2.1,rocm-6.2.0,rocm-6.1.2,rocm-6.1.1,rocm-6.1.0,rocm-6.0.2,rocm-6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0

@@ -49,7 +49,7 @@ compatibility and system requirements.
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3
,,,
@@ -113,7 +113,7 @@ compatibility and system requirements.
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.4.0,7.3.0,7.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.3.0,rocm-6.2.4,rocm-6.1.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.0.60204,1.0.60100
,,,
PERFORMANCE TOOLS,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0

@@ -30,15 +30,15 @@ if os.environ.get("READTHEDOCS", "") == "True":
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.2.4"
release = "6.2.4"
version = "6.3.0"
release = "6.3.0"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux", "windows"], "date": "2024-11-06"},
{"file": "about/release-notes", "os": ["linux", "windows"], "date": "2024-12-03"},
{"file": "how-to/deep-learning-rocm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/install", "os": ["linux"]},

@@ -479,7 +479,7 @@ Change affinity of ROCm helper threads
This change prevents internal ROCm threads from having their CPU core affinity mask
set to all CPU cores available. With this setting, the threads inherit their parent's
CPU core affinity mask. If you have any questions regarding this setting,
contact your MI300A platform vendor. To enable this setting, enter the following command:
contact your MI300X platform vendor. To enable this setting, enter the following command:
.. code-block:: shell

@@ -272,7 +272,7 @@ ability to collect timeline traces of the accelerator software stack as well as
.. _mi300x-rocprof-compute:
ROCm Compute Profiler
^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>` is a system performance profiler for high-performance computing (HPC) and
machine learning (ML) workloads using Instinct accelerators. Under the hood, ROCm Compute Profiler uses
@@ -301,7 +301,7 @@ a web-based GUI or command-line analyzer, depending on your preference.
.. _mi300x-rocprof-systems:
ROCm Systems Profiler
^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>` is a comprehensive profiling and tracing tool for parallel applications,
including HPC and ML packages, written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU and

@@ -8,6 +8,7 @@
| Version | Release date |
| ------- | ------------ |
| [6.3.0](https://rocm.docs.amd.com/en/docs-6.3.0/) | December 3, 2024 |
| [6.2.4](https://rocm.docs.amd.com/en/docs-6.2.4/) | November 6, 2024 |
| [6.2.2](https://rocm.docs.amd.com/en/docs-6.2.2/) | September 27, 2024 |
| [6.2.1](https://rocm.docs.amd.com/en/docs-6.2.1/) | September 20, 2024 |

@@ -69,7 +69,7 @@ subtrees:
- file: how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.rst
title: Optimize Triton kernels
- file: how-to/llm-fine-tuning-optimization/profiling-and-debugging.rst
title: Profile and Debug
title: Profile and debug
- file: how-to/system-optimization/index.rst
title: System optimization
subtrees:

@@ -0,0 +1,164 @@
# ROCm 6.3.0 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
- [Release highlights](#release-highlights)
- [Operating system and hardware support changes](#operating-system-and-hardware-support-changes)
- [ROCm components versioning](#rocm-components)
- [Detailed component changes](#detailed-component-changes)
- [ROCm known issues](#rocm-known-issues)
- [ROCm resolved issues](#rocm-resolved-issues)
- [ROCm upcoming changes](#rocm-upcoming-changes)
```{note}
If you're using Radeon™ PRO or Radeon GPUs in a workstation setting with a
display connected, continue to use ROCm 6.2.3. See the [Use ROCm on Radeon
GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/index.html)
documentation to verify compatibility and system requirements.
```
## Release highlights
The following are notable new features and improvements in ROCm 6.3.0. For changes to individual components, see
[Detailed component changes](#detailed-component-changes).
### rocJPEG added
ROCm 6.3.0 introduces the rocJPEG library to the ROCm software stack. rocJPEG is a high-performance
JPEG decode SDK for AMD GPUs. For more information, see the [rocJPEG
documentation](https://rocm.docs.amd.com/projects/rocJPEG/en/docs-6.3.0/index.html).
### ROCm Compute Profiler and ROCm Systems Profiler
These ROCm components have been renamed to reflect their new direction as part of the ROCm software
stack.
- **ROCm Compute Profiler**, formerly Omniperf. For more information, see the [ROCm Compute Profiler
documentation](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.3.0/index.html) and
[https://github.com/ROCm/rocprofiler-compute](https://github.com/ROCm/rocprofiler-compute) on GitHub.
- **ROCm Systems Profiler**, formerly Omnitrace. For more information, see the [ROCm Systems Profiler
documentation](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.3.0/index.html) and
[https://github.com/ROCm/rocprofiler-systems](https://github.com/ROCm/rocprofiler-systems) on GitHub.
For future compatibility, the Omnitrace project is available at [https://github.com/ROCm/omnitrace](https://github.com/ROCm/omnitrace).
See the [Omnitrace documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html).
```{note}
Update any references to the old binary names `omniperf` and `omnitrace` to
ensure compatibility with the new `rocprof-compute` and `rocprof-sys-*` binaries.
This might include updating environment variables, commands, and paths as
needed to avoid disruptions to your profiling or tracing workflows.
See [ROCm Compute Profiler](#rocm-compute-profiler-3-0-0) and [ROCm Systems
Profiler](#rocm-systems-profiler-0-1-0).
```
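
As a rough illustration of that update, the following sketch rewrites the old binary names inside an existing shell script. `rewrite_profiler_names` is a hypothetical helper, not part of ROCm; it assumes GNU `sed` and that each `omnitrace-<tool>` binary maps to `rocprof-sys-<tool>`. It rewrites every standalone `omniperf` word, so check package names afterward (the package is `rocprofiler-compute`, not `rocprof-compute`), and it does not touch environment variables.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite old Omniperf/Omnitrace binary names in a script.
# Assumes GNU sed (\b word boundaries and in-place editing with a backup suffix).
rewrite_profiler_names() {
  local script="$1"
  # omniperf -> rocprof-compute; omnitrace-<tool> -> rocprof-sys-<tool>.
  # The original file is kept as "${script}.bak".
  sed -i.bak \
    -e 's/\bomniperf\b/rocprof-compute/g' \
    -e 's/\bomnitrace-/rocprof-sys-/g' \
    "$script"
}

# Usage (hypothetical script name):
#   rewrite_profiler_names ./profile_app.sh
#   diff ./profile_app.sh.bak ./profile_app.sh
```

Keeping a `.bak` copy makes it easy to review the changes with `diff` before deleting the original.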
### SHARK AI toolkit for high-speed inferencing and serving introduced
SHARK is an open-source toolkit for high-performance serving of popular generative AI and large
language models. In its initial release, SHARK contains the [Shortfin high-performance serving
engine](https://github.com/nod-ai/shark-ai/tree/main/shortfin), which is the SHARK inferencing
library that includes example server applications for popular models.
This initial release includes support for serving the Stable Diffusion XL model on AMD Instinct™
MI300 devices using ROCm. See SHARK's [release
page](https://github.com/nod-ai/shark-ai/releases/tag/v3.0.0) on GitHub to get started.
### PyTorch 2.4 support added
ROCm 6.3.0 adds support for PyTorch 2.4. See the [Compatibility
matrix](https://rocm.docs.amd.com/en/docs-6.3.0/compatibility/compatibility-matrix.html#framework-support-compatibility-matrix)
for the complete list of PyTorch versions tested for compatibility with ROCm.
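
To check an existing environment against that list, a minimal sketch like the following compares the installed PyTorch `major.minor` with the versions the compatibility matrix lists for ROCm 6.3.0 (2.4, 2.3, 2.2, 2.1, 2.0, 1.13). `is_listed_pytorch_version` is a hypothetical helper, and the usage line assumes `python3` can import `torch`.

```bash
#!/usr/bin/env bash
# PyTorch major.minor versions listed for ROCm 6.3.0 in the compatibility matrix.
ROCM_630_PYTORCH_VERSIONS="2.4 2.3 2.2 2.1 2.0 1.13"

# Hypothetical helper: succeed if a PyTorch version string (for example
# "2.4.0+rocm6.3") has a major.minor listed for ROCm 6.3.0.
is_listed_pytorch_version() {
  local major_minor v
  # Drop any local version suffix such as "+rocm6.3", then keep "major.minor".
  major_minor=$(printf '%s' "${1%%+*}" | cut -d. -f1,2)
  for v in $ROCM_630_PYTORCH_VERSIONS; do
    [ "$v" = "$major_minor" ] && return 0
  done
  return 1
}

# Usage, assuming python3 can import torch in the current environment:
#   is_listed_pytorch_version "$(python3 -c 'import torch; print(torch.__version__)')" \
#     && echo "listed for ROCm 6.3.0"
```

Comparing whole `major.minor` strings (rather than prefixes) keeps a hypothetical `2.10` from matching `2.1`.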
### Flash Attention kernels in Triton and Composable Kernel (CK) added to Transformer Engine
Composable Kernel-based and Triton-based Flash Attention kernels have been integrated into
Transformer Engine via the ROCm Composable Kernel and AOTriton libraries. The
Transformer Engine can now optionally select a flexible and optimized Attention
solution for AMD GPUs. For more information, see [Fused Attention Backends on
ROCm](https://github.com/ROCm/TransformerEngine/tree/dev?tab=readme-ov-file#fused-attention-backends-on-rocm)
on GitHub.
### HIP compatibility
HIP now includes the `hipStreamLegacy` API. It's equivalent to NVIDIA `cudaStreamLegacy`. For more
information, see [Global enum and
defines](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hip_runtime_api/global_defines_enums_structs_files/global_enum_and_defines.html#c.hipStreamLegacy)
in the HIP runtime API documentation.
### Unload active amdgpu-dkms module without a system reboot
On Instinct MI200 and MI300 systems, you can now unload the active `amdgpu-dkms` modules and reinstall
and reload newer modules without a system reboot. If the new `dkms` package includes newer firmware
components, the driver will first reset the device and then load newer firmware components.
### ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.3 introduces a new feature to uninstall the previous version of
ROCm on the non-connected target system before installing a new version. This feature is only supported
on the Ubuntu distribution. See the [ROCm Offline Installer
Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/rocm-offline-installer.html)
documentation for more information.
### OpenCL ICD loader separated from ROCm
The OpenCL ICD loader is no longer delivered as part of ROCm, and must be installed separately
as part of the [ROCm installation
process](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0). For Ubuntu and RHEL
installations, the required package is installed as part of the setup described in
[Prerequisites](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/prerequisites.html).
In other supported Linux distributions like SUSE, the required package must be installed in separate steps, which are included in the installation instructions.
Because the OpenCL path is now separate from the ROCm installation for versioned and multi-version
installations, you must manually define the `LD_LIBRARY_PATH` to point to the ROCm
installation library as described in the [Post-installation
instructions](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/post-install.html).
If the `LD_LIBRARY_PATH` is not set as needed for versioned or multi-version installations, OpenCL
applications like `clinfo` will fail to run and return an error.
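
A minimal sketch of that step, assuming ROCm 6.3.0 is installed as a versioned installation under `/opt/rocm-6.3.0`; the Post-installation instructions remain the authoritative procedure. `prepend_ld_library_path` and `ROCM_VERSIONED_PATH` are hypothetical names used only for illustration. The helper adds the directory only if it isn't already present, so sourcing it from a shell profile more than once doesn't keep growing the variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_library_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) ;;  # already present: leave LD_LIBRARY_PATH unchanged
    *) export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
}

# Assumption: a versioned ROCm 6.3.0 install under /opt/rocm-6.3.0.
# Set ROCM_VERSIONED_PATH to use a different location.
prepend_ld_library_path "${ROCM_VERSIONED_PATH:-/opt/rocm-6.3.0}/lib"

# Then, for example, confirm that OpenCL applications start:
#   clinfo
```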
### ROCT Thunk Interface integrated into ROCr runtime
The ROCT Thunk Interface package is now integrated into the ROCr runtime. As a result, the ROCT package
is no longer included as a separate package in the ROCm software stack.
### ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a
wider variety of user needs and use cases.
- Documentation for Tensile is now available. Tensile is a library that creates
benchmark-driven backend implementations for GEMMs, serving primarily as a
backend component of rocBLAS. See the [Tensile
documentation](https://rocm.docs.amd.com/projects/Tensile/en/docs-6.3.0/src/index.html).
- New documentation has been added to explain the advantages of enabling the IOMMU in passthrough
mode for Instinct accelerators and Radeon GPUs. See [Input-Output Memory Management
Unit](https://rocm.docs.amd.com/en/docs-6.3.0/conceptual/iommu.html).
- The HIP documentation has been updated and includes the following new topics:
  - [What is HIP?](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/what_is_hip.html)
  - [HIP environment variables](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/env_variables.html)
  - [Initialization](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/initialization.html)
    and [error handling](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/error_handling.html)
  - [Hardware features](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hardware_features.html)
  - [Call stack](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/call_stack.html)
  - [External resource interoperability](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/external_interop.html)
- The following HIP documentation topics have been updated:
  - [HIP FAQ](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/faq.html)
  - [Deprecated APIs](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/deprecated_api_list.html)
  - [Performance guidelines](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/performance_guidelines.html)
- The following HIP documentation topics have been reorganized to improve usability:
  - [HIP documentation landing page](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/index.html)
  - [HIP runtime API reference topics](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hip_runtime_api_reference.html)
  - [Programming guide](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api.html)

@@ -0,0 +1,122 @@
## ROCm known issues
ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](#detailed-component-changes).
### Instinct MI300X reports incorrect raw GPU timestamps
On MI300X accelerators, the command processor firmware reports incorrect raw GPU timestamps. This
issue is under investigation and will be addressed in a future release.
### Instinct MI300 series: backward weights convolution performance issue
A performance issue affects certain tensor shapes during backward weights convolution when using
FP16 or FP32 data types on Instinct MI300 series accelerators. This issue will be addressed in a future ROCm release.
To mitigate the issue during model training, set the following environment variables:
```bash
export MIOPEN_FIND_MODE=3
export MIOPEN_FIND_ENFORCE=3
```
These settings enable auto-tuning on the first occurrence of a new tensor shape. The tuning results
are stored in the user database, eliminating the need for repeated tuning when the same shape is
encountered in subsequent runs. See the
[MIOpen](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#miopen)
section in the workload optimization guide to learn more about MIOpen's auto-tuning capabilities.
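
If you'd rather not export these variables for every process started from your shell, one option is to scope them to a single training command. In this sketch, `with_miopen_tuning` is a hypothetical helper (not part of MIOpen) and `train.py` is a placeholder for your own training script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command with the MIOpen tuning settings from this
# known issue, without changing the environment of the calling shell.
with_miopen_tuning() {
  MIOPEN_FIND_MODE=3 MIOPEN_FIND_ENFORCE=3 "$@"
}

# Usage (train.py is a placeholder for your own training script):
#   with_miopen_tuning python3 train.py
```

Because the variables are set only for the wrapped command, other jobs in the same shell keep MIOpen's default find behavior.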
### TransferBench package not functional
TransferBench packages included in the ROCm 6.3.0 release are not compiled properly and are not
functional for most GPU targets, with the exception of gfx906. Full functionality will be available
in a future ROCm release.
TransferBench is a utility for benchmarking simultaneous transfers between user-specified devices
(CPUs or GPUs). See the [TransferBench
documentation](https://rocm.docs.amd.com/projects/TransferBench/en/docs-6.3.0/index.html). If you
want to use TransferBench now, download the properly compiled packages from
[https://github.com/ROCm/TransferBench/releases](https://github.com/ROCm/TransferBench/releases).
### ROCm Compute Profiler post-upgrade
In ROCm 6.3.0, the `omniperf` package is now named `rocprofiler-compute`. As a result, running `apt install omniperf` will fail to locate the package.
Instead, use `apt install rocprofiler-compute`. See [ROCm Compute Profiler 3.0.0](#rocm-compute-profiler-3-0-0).
When upgrading from ROCm 6.2 to 6.3, any existing `/opt/rocm-6.2/../omniperf` folders are not
automatically removed. To clean up these folders, manually uninstall Omniperf using `apt remove omniperf`.
### ROCm Systems Profiler post-upgrade
In ROCm 6.3.0, the `omnitrace` package is now named `rocprofiler-systems`. As a result, running `apt install omnitrace` will fail to locate the package.
Instead, use `apt install rocprofiler-systems`. See [ROCm Systems Profiler 0.1.0](#rocm-systems-profiler-0-1-0).
When upgrading from ROCm 6.2 to 6.3, any existing `/opt/rocm-6.2/../omnitrace` folders are not
automatically removed. To clean up these folders, manually uninstall Omnitrace using `apt remove omnitrace`.
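
The package changes for ROCm Compute Profiler and ROCm Systems Profiler can be combined into one upgrade step on Ubuntu. This is a sketch, assuming `apt` and `dpkg-query` are available and that you have `sudo` privileges; `renamed_profiler_package` and `migrate_profiler_packages` are hypothetical helpers that only act on old packages that are actually installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ROCm 6.3.0 package name that replaces an old one.
renamed_profiler_package() {
  case "$1" in
    omniperf)  echo rocprofiler-compute ;;
    omnitrace) echo rocprofiler-systems ;;
    *) return 1 ;;  # not one of the renamed profiler packages
  esac
}

# Hypothetical helper: for each old package that is installed, install its
# replacement, then remove the old package so its leftover folders are cleaned up.
migrate_profiler_packages() {
  local old new
  for old in omniperf omnitrace; do
    # Only packages in the "install ok installed" state are migrated.
    if dpkg-query -W -f='${Status}' "$old" 2>/dev/null | grep -q 'install ok installed'; then
      new=$(renamed_profiler_package "$old")
      sudo apt install -y "$new" && sudo apt remove -y "$old"
    fi
  done
}

# Usage:
#   migrate_profiler_packages
```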
### Stale file due to OpenCL ICD loader deprecation
When upgrading from ROCm 6.2.x to ROCm 6.3.0, the [removal of the `rocm-icd-loader`
package](#opencl-icd-loader-separated-from-rocm) leaves a stale file in the old `rocm-6.2.x`
directory. This has no functional impact. As a workaround, manually uninstall the
`rocm-icd-loader` package to remove the stale file. This issue will be addressed in a future ROCm
release.
### ROCm Compute Profiler CTest failure in CI
When running ROCm Compute Profiler's (`rocprof-compute`) CTest in the Azure CI environment, the
`rocprof-compute` execution test fails. The failure has two causes: an outdated test file that was not
renamed from `omniperf` to `rocprof-compute`, and the `ROCM_PATH` environment variable not being set
in the Azure CI environment, which prevents the tool from extracting chip information as expected.
This issue will be addressed in a future ROCm release.
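
If you run these tests in your own CI, a sketch like the following makes sure `ROCM_PATH` is set before CTest starts. It assumes ROCm is installed at `/opt/rocm` when `ROCM_PATH` is unset, and `ensure_rocm_path` is a hypothetical helper. It can only address the environment half of the issue; the un-renamed test file still needs the fix in a future release.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default ROCM_PATH to /opt/rocm when it is unset or empty,
# and export it so child processes such as ctest inherit it.
ensure_rocm_path() {
  export ROCM_PATH="${ROCM_PATH:-/opt/rocm}"
}

ensure_rocm_path
# Then run the tests from your build directory, for example:
#   ctest --output-on-failure
```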
### MIVisionX memory access fault in Canny edge detection
Canny edge detection kernels might access out-of-bounds memory locations while
computing gradient intensities on edge pixels. This issue is isolated to
Canny-specific use cases on Instinct MI300 series accelerators. This issue is
resolved in the [MIVisionX `develop` branch](https://github.com/ROCm/mivisionx)
and will be part of a future ROCm release.
### Transformer Engine test_distributed_fused_attn aborts with fatal Python error
The `test_distributed_fused_attn` Pytest case for JAX in [Transformer Engine
for ROCm](https://github.com/ROCm/TransformerEngine) fails with a fatal Python
error under certain conditions. The root cause is unrelated to Transformer Engine;
it stems from an issue within XLA. This XLA issue is under investigation and
will be addressed in a future release.
### AMD SMI manual build issue
Manual builds of AMD SMI fail due to a broken link in its build configuration.
This affects past AMD SMI releases as well. The fix is underway and will be
applied to all branches at [https://github.com/ROCm/amdsmi](https://github.com/ROCm/amdsmi).
### ROCm Data Center Tool incorrect RHEL9 package version
In previous versions of ROCm Data Center Tool (RDC) included with ROCm 6.2 for RHEL9, RDC's version
number was incorrectly set to `1.0.0`. ROCm 6.3 includes RDC with the correct version number.
```{important}
If you're using RHEL9, you must first uninstall the existing ROCm 6.2 RDC 1.0.0 package with
`sudo yum remove rdc` before upgrading to the ROCm 6.3 RDC package with `sudo yum install rdc`.
```
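
Before you remove RDC, you can confirm that the installed package carries the incorrect `1.0.0` version. The following minimal sketch queries the installed version with `rpm -q --queryformat`. The helper names are hypothetical, and the sketch assumes an RPM-based system such as RHEL9.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with ROCm) for RPM-based systems such as RHEL9.

# Print the installed RDC version, or nothing if RDC is not installed.
installed_rdc_version() {
  if rpm -q rdc >/dev/null 2>&1; then
    rpm -q --queryformat '%{VERSION}\n' rdc
  fi
}

# Succeed only for the mislabeled 1.0.0 version shipped with ROCm 6.2.
is_mislabeled_rdc_version() {
  [ "$1" = "1.0.0" ]
}
```

For example, run `is_mislabeled_rdc_version "$(installed_rdc_version)" && sudo yum remove rdc`, then run `sudo yum install rdc`.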
### ROCm Validation Suite needs specified configuration file
ROCm Validation Suite might fail on certain platforms if executed without the `-c` option to
specify a configuration file. See [RVS command line
options](https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/docs-6.3.0/ug1main.html#command-line-options)
for more information. This issue will be addressed in a future release.
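
A small wrapper can make the configuration file mandatory. The following minimal sketch assumes that the RVS binary is installed at `${ROCM_PATH:-/opt/rocm}/bin/rvs`. The `run_rvs` name is hypothetical and is not shipped with ROCm.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with ROCm): refuse to run RVS unless an
# existing configuration file is passed, then forward it with -c.
run_rvs() {
  local conf="$1"
  if [ -z "$conf" ] || [ ! -f "$conf" ]; then
    echo "run_rvs: pass the path of an existing RVS configuration file" >&2
    return 2
  fi
  shift
  # Assumes the default ROCm layout unless ROCM_PATH points elsewhere.
  "${ROCM_PATH:-/opt/rocm}/bin/rvs" -c "$conf" "$@"
}
```

The location of the bundled configuration files can vary by release and platform. A path such as `/opt/rocm/share/rocm-validation-suite/conf/gpup_single.conf` is an assumption, so check your installation before using it.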
## ROCm resolved issues
The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the [Detailed component changes](#detailed-component-changes).
### Bandwidth limitation in gang and non-gang modes on Instinct MI300A
Fixed an issue where expected target peak non-gang performance (~60 GB/s) and target peak gang
performance (~90 GB/s) were not achieved. Previously, both gang and non-gang performance were
observed to be limited at 45 GB/s. See [issue #3496](https://github.com/ROCm/ROCm/issues/3496) on
GitHub.
## Operating system and hardware support changes
ROCm 6.3.0 adds support for the following operating system and kernel versions:
- Ubuntu 24.04.2 (kernel: 6.8 [GA], 6.11 [HWE])
- Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE])
- RHEL 9.5 (kernel: 5.14.0)
- Oracle Linux 8.10 (kernel: 5.15.0)

See installation instructions at [ROCm installation for
Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/).
ROCm 6.3.0 marks the end of support (EoS) for:
- Ubuntu 24.04.1
- Ubuntu 22.04.4
- RHEL 9.3
- RHEL 8.9
- Oracle Linux 8.9

Hardware support remains unchanged in this release.
See the [Compatibility
matrix](https://rocm.docs.amd.com/en/docs-6.3.0/compatibility/compatibility-matrix.html)
for more information about operating system and hardware compatibility.
## ROCm upcoming changes
The following changes to the ROCm software stack are anticipated for future releases.
### AMDGPU wavefront size compiler macro deprecation
The `__AMDGCN_WAVEFRONT_SIZE__` macro will be deprecated in an upcoming
release. It is recommended to remove any use of this macro. For more information, see [AMDGPU
support](https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/clang/html/AMDGPUSupport.html).
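
Before the deprecation takes effect, you can locate every use of the macro in your sources. The following minimal sketch uses `grep` for this search. The function name is hypothetical. The sketch only finds the uses; choosing a replacement for each one is left to the code owner.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ROCm): print file:line:text for each
# use of the deprecated macro under a directory (default: current directory).
find_wavefront_macro_uses() {
  grep -rnF -- '__AMDGCN_WAVEFRONT_SIZE__' "${1:-.}"
}
```

For example, `find_wavefront_macro_uses src` lists each match under `src` in `grep`'s usual `file:line:text` form.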
### HIPCC Perl scripts deprecation
The HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) will be removed in an upcoming release.