update release note files (#2617)

--------- Co-authored-by: Sam Wu <sam.wu2@amd.com> Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2026-02-04 03:15:28 -05:00 · 2023-11-10 15:14:59 -07:00
parent 3f855e386c
commit 37c48060f7
27 changed files with 1699 additions and 995 deletions
--- a/docs/about/compatibility/docker-image-support-matrix.rst
+++ b/docs/about/compatibility/docker-image-support-matrix.rst
@@ -3,7 +3,7 @@ Docker image support matrix
 ******************************************************************

 AMD validates and publishes `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_ and
-`TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_ containers on dockerhub. The following
+`TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_ containers on Docker Hub. The following
 tags, and associated inventories, are validated with ROCm 5.7.

 .. tab-set::
--- a/docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst
+++ b/docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst
@@ -1,36 +1,58 @@
-===========================
+*****************************************************************************
 How ROCm uses PCIe atomics
-===========================
-
+*****************************************************************************

 ROCm PCIe feature and overview of BAR memory
-======================================================================
+================================================================

+ROCm is an extension of the HSA platform architecture, so it shares the HSA queuing model, memory
+model, and signaling and synchronization protocols. Platform atomics are integral to queuing and
+signaling memory operations where there may be multiple writers across CPU and GPU agents.

-ROCm is an extension of HSA platform architecture, so it shares the queueing model, memory model, signaling and synchronization protocols. Platform atomics are integral to perform queuing and signaling memory operations where there may be multiple-writers across CPU and GPU agents.
+The full list of HSA system architecture platform requirements is available in
+`HSA Sys Arch Features <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf>`_.

-The full list of HSA system architecture platform requirements are here: `HSA Sys Arch Features <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf>`_.
+The ROCm platform uses the PCI Express 3.0 (Peripheral Component Interconnect Express [PCIe] 3.0)
+feature for atomic read-modify-write transactions, which extends inter-processor synchronization
+mechanisms to I/O to support the defined set of HSA capabilities needed for queuing and signaling
+memory operations.

-The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
-
-The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
-I/O device which support 32-bit, 64-bit and 128-bit operand which target address have to be naturally aligned to operation sizes.
+The PCIe atomic operations are ``CAS`` (Compare and Swap), ``FetchADD``, and ``SWAP``. They are
+initiated by an I/O device and serviced by a completer. All three support 32-bit and 64-bit
+operands (``CAS`` also supports 128-bit), and the target address must be naturally aligned to the operand size.

 For ROCm the Platform atomics are used in ROCm in the following ways:

-   * Update HSA queue’s read_dispatch_id: 64 bit atomic add used by the command processor on the GPU agent to update the packet ID it 	  processed.
-   * Update HSA queue’s write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
-   * Update HSA Signals – 64bit atomic ops are used for CPU & GPU synchronization.
+  * Update the HSA queue's ``read_dispatch_id``: a 64-bit atomic add used by the command processor
+    on the GPU agent to update the packet ID it processed.
+  * Update the HSA queue's ``write_dispatch_id``: a 64-bit atomic add used by the CPU and GPU agents
+    to support multi-writer queue insertions.
+  * Update HSA signals: 64-bit atomic operations are used for CPU and GPU synchronization.

-The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O devices has the capability to Atomics Operations.
+The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through,
+and completed by PCIe components. Routing and completion do not require software support. Component
+support for each is detectable through the Device Capabilities 2 (DevCap2) register. Upstream bridges
+must have atomic operations routing enabled, or atomic operations fail even when the PCIe endpoint
+and PCIe I/O devices are capable of atomic operations.

-To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp routing supported bit in the Device Capabilities 2 register.
+To route atomic operations between two or more Root Ports, each associated Root Port must indicate
+that capability through the AtomicOp Routing Supported bit in the Device Capabilities 2
+register.

-If your system has a PCIe Express Switch it needs to support AtomicsOp routing. AtomicOp requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
+If your system has a PCIe switch, it must support atomic operations routing. Atomic operation
+requests are permitted only if a component's ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set.
+These requests can only be serviced if the upstream components support atomic operation completion
+or routing to a component that does. When the AtomicOp Routing Supported bit is 1, routing is
+supported; when it is 0, routing is not supported.

-An atomic operation is a non-posted transaction supporting 32-bit and 64-bit address formats, there must be a response for Completion containing the result of the operation. Errors associated with the operation (uncorrectable error accessing the target location or carrying out the Atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor, they are set to to Completer Abort (CA) or Unsupported Request (UR).
+An atomic operation is a non-posted transaction that supports 32-bit and 64-bit address formats.
+Because it is non-posted, there must be a completion response containing the result of the operation.
+Errors associated with the operation (an uncorrectable error accessing the target location or carrying
+out the atomic operation) are signaled to the requester by setting the Completion Status field in the
+completion descriptor to Completer Abort (CA) or Unsupported Request (UR).

-To understand more about how PCIe atomic operations work, see `PCIe atomics <https://pcisig.com/specifications/pciexpress/specifications/ECN_Atomic_Ops_080417.pdf>`_
+To understand more about how PCIe atomic operations work, see
+`PCIe atomics <https://pcisig.com/specifications/pciexpress/specifications/ECN_Atomic_Ops_080417.pdf>`_.

 `Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/project/linux-pci/patch/1443110390-4080-1-git-send-email-jay@jcornwall.me/>`_

@@ -39,56 +61,60 @@ There are also a number of papers which talk about these new capabilities:
  * `Atomic Read Modify Write Primitives by Intel <https://www.intel.es/content/dam/doc/white-paper/atomic-read-modify-write-primitives-i-o-devices-paper.pdf>`_
  * `PCI express 3 Accelerator White paper by Intel <https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf>`_
  * `Intel PCIe Generation 3 Hotchips Paper <https://www.hotchips.org/wp-content/uploads/hc_archives/hc21/1_sun/HC21.23.1.SystemInterconnectTutorial-Epub/HC21.23.131.Ajanovic-Intel-PCIeGen3.pdf>`_
-  * `PCIe Generation 4 Base Specification includes Atomics Operation <https://astralvx.com/storage/2020/11/PCI_Express_Base_4.0_Rev0.3_February19-2014.pdf>`_
+  * `PCIe Generation 4 Base Specification includes atomic operations <https://astralvx.com/storage/2020/11/PCI_Express_Base_4.0_Rev0.3_February19-2014.pdf>`_

 Other I/O devices with PCIe atomics support

-   * `Mellanox ConnectX-5 InfiniBand Card <http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_VPI_Card.pdf>`_
-   * `Cray Aries Interconnect <http://www.hoti.org/hoti20/slides/Bob_Alverson.pdf>`_
-   * `Xilinx PCIe Ultrascale White paper <https://docs.xilinx.com/v/u/8OZSA2V1b1LLU2rRCDVGQw>`_
-   * `Xilinx 7 Series Devices <https://docs.xilinx.com/v/u/1nfXeFNnGpA0ywyykvWHWQ>`_
+  * `Mellanox ConnectX-5 InfiniBand Card <http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_VPI_Card.pdf>`_
+  * `Cray Aries Interconnect <http://www.hoti.org/hoti20/slides/Bob_Alverson.pdf>`_
+  * `Xilinx PCIe Ultrascale White paper <https://docs.xilinx.com/v/u/8OZSA2V1b1LLU2rRCDVGQw>`_
+  * `Xilinx 7 Series Devices <https://docs.xilinx.com/v/u/1nfXeFNnGpA0ywyykvWHWQ>`_

 Future bus technology with richer I/O atomics operation Support

  * GenZ

-New PCIe Endpoints with support beyond AMD Ryzen and EPYC CPU; Intel Haswell or newer CPU’s with PCIe Generation 3.0 support.
+New PCIe endpoints with support beyond AMD Ryzen and EPYC CPUs, and Intel Haswell or newer CPUs
+with PCIe Generation 3.0 support:

  * `Mellanox Bluefield SOC <https://docs.nvidia.com/networking/display/BlueFieldSWv25111213/BlueField+Software+Overview>`_
  * `Cavium Thunder X2 <https://en.wikichip.org/wiki/cavium/thunderx2>`_

-In ROCm, we also take advantage of PCIe ID based ordering technology for P2P when the GPU originates two writes to two different targets:
+In ROCm, we also take advantage of PCIe ID-based ordering technology for peer-to-peer (P2P) when the
+GPU originates two writes to two different targets:

-  | 1. write to another GPU memory,
+#. Write to another GPU's memory.
+#. Write to system memory to indicate that the transfer is complete.

-  | 2. then write to system memory to indicate transfer complete.
-
-They are routed off to different ends of the computer but we want to make sure the write to system memory to indicate transfer complete occurs AFTER P2P write to GPU has complete.
+These writes are routed to different ends of the computer, but we need to make sure the write to system
+memory that indicates transfer completion occurs after the P2P write to the GPU has completed.

 BAR memory overview
-***************************************************************************************************
-On a Xeon E5 based system in the BIOS we can turn on above 4GB PCIe addressing, if so he need to set MMIO Base address ( MMIOH Base) and Range ( MMIO High Size) in the BIOS.
+----------------------------------------------------------------------------------------------------
+On a Xeon E5-based system, you can enable above-4GB PCIe addressing in the BIOS. If you do, set the
+memory-mapped input/output (MMIO) base address (MMIOH base) and range (MMIO high size) in the BIOS.

-In SuperMicro system in the system bios you need to see the following
+On a Supermicro system, set the following options in the system BIOS:

-   * Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
+  * Advanced -> PCIe/PCI/PnP Configuration -> Above 4G Decoding = Enabled
+  * Advanced -> PCIe/PCI/PnP Configuration -> MMIOH Base = 512G
+  * Advanced -> PCIe/PCI/PnP Configuration -> MMIO High Size = 256G

-   * Advanced->PCIe/PCI/PnP Configuration->MMIOH Base = 512G
-
-   * Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
-
-When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
+When we support the large BAR capability, there is a large BAR VBIOS, which also disables the I/O BAR.

 For GFX9 and Vega10 which have Physical Address up 44 bit and 48 bit Virtual address.

-   * BAR0-1 registers: 64bit, prefetchable, GPU memory. 8GB or 16GB depending on Vega10 SKU. Must be placed < 2^44 to support P2P  	access from other Vega10.
-   * BAR2-3 registers: 64bit, prefetchable, Doorbell. Must be placed < 2^44 to support P2P access from other Vega10.
-   * BAR4 register: Optional, not a boot device.
-   * BAR5 register: 32bit, non-prefetchable, MMIO. Must be placed < 4GB.
+  * BAR0-1 registers: 64-bit, prefetchable, GPU memory. 8GB or 16GB depending on the Vega10 SKU. Must
+    be placed < 2^44 to support P2P access from other Vega10 GPUs.
+  * BAR2-3 registers: 64-bit, prefetchable, doorbell. Must be placed < 2^44 to support P2P access from
+    other Vega10 GPUs.
+  * BAR4 register: Optional, not a boot device.
+  * BAR5 register: 32-bit, non-prefetchable, MMIO. Must be placed < 4GB.

-Here is how our base address register (BAR) works on GFX 8 GPU’s with 40 bit Physical Address Limit ::
+Here is how the base address registers (BARs) work on GFX8 GPUs with a 40-bit physical address limit::

-  11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev c1)
+  11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev c1)

  Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b35

@@ -106,40 +132,23 @@ Here is how our base address register (BAR) works on GFX 8 GPU’s with 40 bit P

 Legend:

-1 : GPU Frame Buffer BAR – In this example it happens to be 256M, but typically this will be size of the GPU memory (typically 4GB+). This BAR has to be placed < 2^40 to allow peer-to-peer access from other GFX8 AMD GPUs. For GFX9 (Vega GPU) the BAR has to be placed < 2^44 to allow peer-to-peer access from other GFX9 AMD GPUs.
+1 : GPU Frame Buffer BAR -- In this example it happens to be 256M, but typically this is the size of
+the GPU memory (typically 4GB+). This BAR has to be placed < 2^40 to allow peer-to-peer access from
+other GFX8 AMD GPUs. For GFX9 (Vega GPU), the BAR has to be placed < 2^44 to allow peer-to-peer
+access from other GFX9 AMD GPUs.

-2 : Doorbell BAR – The size of the BAR is typically will be < 10MB (currently fixed at 2MB) for this generation GPUs. This BAR has to be placed < 2^40 to allow peer-to-peer access from other current generation AMD GPUs.
+2 : Doorbell BAR -- The size of the BAR is typically < 10MB (currently fixed at 2MB) for this
+generation of GPUs. This BAR has to be placed < 2^40 to allow peer-to-peer access from other current
+generation AMD GPUs.

-3 : IO BAR - This is for legacy VGA and boot device support, but since this the GPUs in this project are not VGA devices (headless), this is not a concern even if the SBIOS does not setup.
+3 : IO BAR -- This is for legacy VGA and boot device support, but since the GPUs in this project are
+not VGA devices (headless), this is not a concern even if the SBIOS does not set it up.

-4 : MMIO BAR – This is required for the AMD Driver SW to access the configuration registers. Since the reminder of the BAR available is only 1 DWORD (32bit), this is placed < 4GB. This is fixed at 256KB.
+4 : MMIO BAR -- This is required for the AMD driver software to access the configuration registers.
+Since the remainder of the BAR available is only 1 DWORD (32-bit), this is placed < 4GB. This is fixed at 256KB.

-5 : Expansion ROM – This is required for the AMD Driver SW to access the GPU’s video-bios. This is currently fixed at 128KB.
+5 : Expansion ROM -- This is required for the AMD driver software to access the GPU video BIOS. This is
+currently fixed at 128KB.

-Excerpts from 'Overview of Changes to PCI Express 3.0'
-================================================================
-By Mike Jackson, Senior Staff Architect, MindShare, Inc.
-***************************************************************************************************
-Atomic operations – goal:
-***************************************************************************************************
-Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
-
-  * Fetch and Add – uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back 	  to the original location.
-
-  * Unconditional Swap – uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
-
-  * Compare and Swap – uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it     	against the compare value and, if equal, writes the swap value to the target location.
-
-  * AtomicOpCompletion – new completion to give the result so far atomic request and indicate that the atomicity of the transaction 	has been maintained.
-
-Since atomic operations are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
-
-Atomic operations can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route atomic operations is also indicated in the registers for a given port.
-
-ID-based ordering – goal:
-***************************************************************************************************
-Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
-
-This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
-
-To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
+For more information, you can review
+`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.
--- a/docs/conceptual/cmake-packages.rst
+++ b/docs/conceptual/cmake-packages.rst
@@ -4,31 +4,32 @@ Using CMake

 Most components in ROCm support CMake. Projects depending on header-only or
 library components typically require CMake 3.5 or higher whereas those wanting
-to make use of CMake's HIP language support will require CMake 3.21 or higher.
+to make use of the CMake HIP language support will require CMake 3.21 or higher.

 Finding dependencies
 ====================

 .. note::
-   For a complete
-   reference on how to deal with dependencies in CMake, refer to the CMake docs
-   on `find_package
-   <https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
-   `Using Dependencies Guide
-   <https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
-   to get an overview of CMake's related facilities.
+
+  For a complete
+  reference on how to handle dependencies in CMake, refer to the CMake documentation
+  on `find_package
+  <https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
+  `Using Dependencies Guide
+  <https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
+  for an overview of the related CMake facilities.

 In short, CMake supports finding dependencies in two ways:

 *  In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to
-   find the component in typical install locations and layouts. CMake ships a
-   few dozen such scripts, but users and projects may ship them as well.
+  find the component in typical install locations and layouts. CMake ships a
+  few dozen such scripts, but users and projects may ship them as well.
 *  In Config mode, it locates a file named ``<packagename>-config.cmake`` or
-   ``<PackageName>Config.cmake`` which describes the installed component in all
-   regards needed to consume it.
+  ``<PackageName>Config.cmake`` which describes the installed component in all
+  regards needed to consume it.

 ROCm predominantly relies on Config mode, one notable exception being the Module
-driving the compilation of HIP programs on Nvidia runtimes. As such, when
+driving the compilation of HIP programs on NVIDIA runtimes. As such, when
 dependencies are not found in standard system locations, one either has to
 instruct CMake to search for package config files in additional folders using
 the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of file system
@@ -57,7 +58,7 @@ Using HIP in CMake
 ==================

 ROCm components providing a C/C++ interface support consumption via any
-C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
+C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
 language features, allowing users to program using the HIP single-source
 programming model. When a program (or translation-unit) uses the HIP API without
 compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
@@ -70,22 +71,22 @@ Source code written in the HIP dialect of C++ typically uses the `.hip`
 extension. When the HIP CMake language is enabled, it will automatically
 associate such source files with the HIP toolchain being used.

-::
+.. code-block:: cmake

-    cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21
-    cmake_policy(VERSION 3.21.3...3.27)
-    project(MyProj LANGUAGES HIP)
-    add_executable(MyApp Main.hip)
+  cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21
+  cmake_policy(VERSION 3.21.3...3.27)
+  project(MyProj LANGUAGES HIP)
+  add_executable(MyApp Main.hip)

 Should you have existing CUDA code that is from the source compatible subset of
 HIP, you can tell CMake that despite their `.cu` extension, they're HIP sources.
 Do note that this mostly facilitates compiling kernel code-only source files,
 as host-side CUDA API won't compile in this fashion.

-::
+.. code-block:: cmake

-    add_library(MyLib MyLib.cu)
-    set_source_files_properties(MyLib.cu PROPERTIES LANGUAGE HIP)
+  add_library(MyLib MyLib.cu)
+  set_source_files_properties(MyLib.cu PROPERTIES LANGUAGE HIP)

 CMake itself only hosts part of the HIP language support, such as defining
 HIP-specific properties, etc. while the other half ships with the HIP
@@ -110,19 +111,20 @@ Illustrated in the example below is a C++ application using MIOpen from CMake.
 It calls ``find_package(miopen)``, which provides the ``MIOpen`` imported
 target. This can be linked with ``target_link_libraries``

-::
+.. code-block:: cmake

-    cmake_minimum_required(VERSION 3.5) # find_package(miopen) requires 3.5
-    cmake_policy(VERSION 3.5...3.27)
-    project(MyProj LANGUAGES CXX)
-    find_package(miopen)
-    add_library(MyLib ...)
-    target_link_libraries(MyLib PUBLIC MIOpen)
+  cmake_minimum_required(VERSION 3.5) # find_package(miopen) requires 3.5
+  cmake_policy(VERSION 3.5...3.27)
+  project(MyProj LANGUAGES CXX)
+  find_package(miopen)
+  add_library(MyLib ...)
+  target_link_libraries(MyLib PUBLIC MIOpen)

 .. note::
-    Most libraries are designed as host-only API, so using a GPU device
-    compiler is not necessary for downstream projects unless they use GPU device
-    code.
+
+  Most libraries are designed as host-only APIs, so using a GPU device
+  compiler is not necessary for downstream projects unless they use GPU device
+  code.

 Consuming the HIP API in C++ code
 ---------------------------------
@@ -131,24 +133,25 @@ Use the HIP API without compiling the GPU device code. As there is no GPU code,
 any C or C++ compiler can be used. The ``find_package(hip)`` provides the
 ``hip::host`` imported target to use HIP in this context.

-::
+.. code-block:: cmake

-    cmake_minimum_required(VERSION 3.5) # find_package(hip) requires 3.5
-    cmake_policy(VERSION 3.5...3.27)
-    project(MyProj LANGUAGES CXX)
-    find_package(hip REQUIRED)
-    add_executable(MyApp ...)
-    target_link_libraries(MyApp PRIVATE hip::host)
+  cmake_minimum_required(VERSION 3.5) # find_package(hip) requires 3.5
+  cmake_policy(VERSION 3.5...3.27)
+  project(MyProj LANGUAGES CXX)
+  find_package(hip REQUIRED)
+  add_executable(MyApp ...)
+  target_link_libraries(MyApp PRIVATE hip::host)

 Compiling device code in C++ language mode
 ------------------------------------------

 .. attention::
-    The workflow detailed here is considered legacy and is shown for
-    understanding's sake. It pre-dates the existence of HIP language support in
-    CMake. If source code has HIP device code in it, it is a HIP source file
-    and should be compiled as such. Only resort to the method below if your
-    HIP-enabled CMake codepath can't mandate CMake version 3.21.
+
+  The workflow detailed here is considered legacy and is shown for
+  reference only. It predates HIP language support in
+  CMake. If source code contains HIP device code, it is a HIP source file
+  and should be compiled as such. Only use the method below if your
+  HIP-enabled CMake code path can't require CMake 3.21.

 If code uses the HIP API and compiles GPU device code, it requires using a
 device compiler. The compiler for CMake can be set using either the
@@ -160,18 +163,19 @@ compiler that supports AMD GPU targets, which is usually Clang.
 The ``find_package(hip)`` provides the ``hip::device`` imported target to add
 all the flags necessary for device compilation.

-::
+.. code-block:: cmake

-    cmake_minimum_required(VERSION 3.8) # cxx_std_11 requires 3.8
-    cmake_policy(VERSION 3.8...3.27)
-    project(MyProj LANGUAGES CXX)
-    find_package(hip REQUIRED)
-    add_library(MyLib ...)
-    target_link_libraries(MyLib PRIVATE hip::device)
-    target_compile_features(MyLib PRIVATE cxx_std_11)
+  cmake_minimum_required(VERSION 3.8) # cxx_std_11 requires 3.8
+  cmake_policy(VERSION 3.8...3.27)
+  project(MyProj LANGUAGES CXX)
+  find_package(hip REQUIRED)
+  add_library(MyLib ...)
+  target_link_libraries(MyLib PRIVATE hip::device)
+  target_compile_features(MyLib PRIVATE cxx_std_11)

 .. note::
-    Compiling for the GPU device requires at least C++11.
+
+  Compiling for the GPU device requires at least C++11.

 This project can then be configured with for eg.

@@ -252,13 +256,12 @@ options.

 IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
 up with their own way to register command-line fragments of different purpose in
-a setup'n'forget fashion for quick assembly using graphical front-ends. This is
+a set-and-forget fashion for quick assembly using graphical front-ends. This is
 all nice, but configurations aren't portable, nor can they be reused in
-Continuous Intergration (CI) pipelines. CMake has condensed existing practice
+Continuous Integration (CI) pipelines. CMake has condensed existing practice
 into a portable JSON format that works in all IDEs and can be invoked from any
 command line. This is
-`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_
-.
+`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_.

 There are two types of preset files: one supplied by the project, called
 ``CMakePresets.json`` which is meant to be committed to version control,
@@ -275,109 +278,110 @@ Following is an example ``CMakeUserPresets.json`` file which actually compiles
 the `amd/rocm-examples <https://github.com/amd/rocm-examples>`_ suite of sample
 applications on a typical ROCm installation:

-::
+..  code-block:: json

-    {
-      "version": 3,
-      "cmakeMinimumRequired": {
-        "major": 3,
-        "minor": 21,
-        "patch": 0
+  {
+    "version": 3,
+    "cmakeMinimumRequired": {
+      "major": 3,
+      "minor": 21,
+      "patch": 0
+    },
+    "configurePresets": [
+      {
+        "name": "layout",
+        "hidden": true,
+        "binaryDir": "${sourceDir}/build/${presetName}",
+        "installDir": "${sourceDir}/install/${presetName}"
      },
-      "configurePresets": [
-        {
-          "name": "layout",
-          "hidden": true,
-          "binaryDir": "${sourceDir}/build/${presetName}",
-          "installDir": "${sourceDir}/install/${presetName}"
-        },
-        {
-          "name": "generator-ninja-multi-config",
-          "hidden": true,
-          "generator": "Ninja Multi-Config"
-        },
-        {
-          "name": "toolchain-makefiles-c/c++-amdclang",
-          "hidden": true,
-          "cacheVariables": {
-            "CMAKE_C_COMPILER": "/opt/rocm/bin/amdclang",
-            "CMAKE_CXX_COMPILER": "/opt/rocm/bin/amdclang++",
-            "CMAKE_HIP_COMPILER": "/opt/rocm/bin/amdclang++"
-          }
-        },
-        {
-          "name": "clang-strict-iso-high-warn",
-          "hidden": true,
-          "cacheVariables": {
-            "CMAKE_C_FLAGS": "-Wall -Wextra -pedantic",
-            "CMAKE_CXX_FLAGS": "-Wall -Wextra -pedantic",
-            "CMAKE_HIP_FLAGS": "-Wall -Wextra -pedantic"
-          }
-        },
-        {
-          "name": "ninja-mc-rocm",
-          "displayName": "Ninja Multi-Config ROCm",
-          "inherits": [
-            "layout",
-            "generator-ninja-multi-config",
-            "toolchain-makefiles-c/c++-amdclang",
-            "clang-strict-iso-high-warn"
-          ]
+      {
+        "name": "generator-ninja-multi-config",
+        "hidden": true,
+        "generator": "Ninja Multi-Config"
+      },
+      {
+        "name": "toolchain-makefiles-c/c++-amdclang",
+        "hidden": true,
+        "cacheVariables": {
+          "CMAKE_C_COMPILER": "/opt/rocm/bin/amdclang",
+          "CMAKE_CXX_COMPILER": "/opt/rocm/bin/amdclang++",
+          "CMAKE_HIP_COMPILER": "/opt/rocm/bin/amdclang++"
        }
-      ],
-      "buildPresets": [
-        {
-          "name": "ninja-mc-rocm-debug",
-          "displayName": "Debug",
-          "configuration": "Debug",
-          "configurePreset": "ninja-mc-rocm"
-        },
-        {
-          "name": "ninja-mc-rocm-release",
-          "displayName": "Release",
-          "configuration": "Release",
-          "configurePreset": "ninja-mc-rocm"
-        },
-        {
-          "name": "ninja-mc-rocm-debug-verbose",
-          "displayName": "Debug (verbose)",
-          "configuration": "Debug",
-          "configurePreset": "ninja-mc-rocm",
-          "verbose": true
-        },
-        {
-          "name": "ninja-mc-rocm-release-verbose",
-          "displayName": "Release (verbose)",
-          "configuration": "Release",
-          "configurePreset": "ninja-mc-rocm",
-          "verbose": true
+      },
+      {
+        "name": "clang-strict-iso-high-warn",
+        "hidden": true,
+        "cacheVariables": {
+          "CMAKE_C_FLAGS": "-Wall -Wextra -pedantic",
+          "CMAKE_CXX_FLAGS": "-Wall -Wextra -pedantic",
+          "CMAKE_HIP_FLAGS": "-Wall -Wextra -pedantic"
        }
-      ],
-      "testPresets": [
-        {
-          "name": "ninja-mc-rocm-debug",
-          "displayName": "Debug",
-          "configuration": "Debug",
-          "configurePreset": "ninja-mc-rocm",
-          "execution": {
-            "jobs": 0
-          }
-        },
-        {
-          "name": "ninja-mc-rocm-release",
-          "displayName": "Release",
-          "configuration": "Release",
-          "configurePreset": "ninja-mc-rocm",
-          "execution": {
-            "jobs": 0
-          }
+      },
+      {
+        "name": "ninja-mc-rocm",
+        "displayName": "Ninja Multi-Config ROCm",
+        "inherits": [
+          "layout",
+          "generator-ninja-multi-config",
+          "toolchain-makefiles-c/c++-amdclang",
+          "clang-strict-iso-high-warn"
+        ]
+      }
+    ],
+    "buildPresets": [
+      {
+        "name": "ninja-mc-rocm-debug",
+        "displayName": "Debug",
+        "configuration": "Debug",
+        "configurePreset": "ninja-mc-rocm"
+      },
+      {
+        "name": "ninja-mc-rocm-release",
+        "displayName": "Release",
+        "configuration": "Release",
+        "configurePreset": "ninja-mc-rocm"
+      },
+      {
+        "name": "ninja-mc-rocm-debug-verbose",
+        "displayName": "Debug (verbose)",
+        "configuration": "Debug",
+        "configurePreset": "ninja-mc-rocm",
+        "verbose": true
+      },
+      {
+        "name": "ninja-mc-rocm-release-verbose",
+        "displayName": "Release (verbose)",
+        "configuration": "Release",
+        "configurePreset": "ninja-mc-rocm",
+        "verbose": true
+      }
+    ],
+    "testPresets": [
+      {
+        "name": "ninja-mc-rocm-debug",
+        "displayName": "Debug",
+        "configuration": "Debug",
+        "configurePreset": "ninja-mc-rocm",
+        "execution": {
+          "jobs": 0
        }
-      ]
-    }
+      },
+      {
+        "name": "ninja-mc-rocm-release",
+        "displayName": "Release",
+        "configuration": "Release",
+        "configurePreset": "ninja-mc-rocm",
+        "execution": {
+          "jobs": 0
+        }
+      }
+    ]
+  }

 .. note::
-    Getting presets to work reliably on Windows requires some CMake improvements
-    and/or support from compiler vendors. (Refer to
-    `Add support to the Visual Studio generators <https://gitlab.kitware.com/cmake/cmake/-/issues/24245>`_
-    and `Sourcing environment scripts <https://gitlab.kitware.com/cmake/cmake/-/issues/21619>`_
-    .)
+
+  Getting presets to work reliably on Windows requires some CMake improvements
+  and/or support from compiler vendors. (Refer to
+  `Add support to the Visual Studio generators <https://gitlab.kitware.com/cmake/cmake/-/issues/24245>`_
+  and `Sourcing environment scripts <https://gitlab.kitware.com/cmake/cmake/-/issues/21619>`_.)
--- a/docs/install/pytorch-install.md
+++ b/docs/install/pytorch-install.md
@@ -326,7 +326,7 @@ maintainers and installs all the required dependencies, including:
 ## Testing the PyTorch installation

 You can use PyTorch unit tests to validate your PyTorch installation. If you used a
-**prebuilt PyTorch Docker image from AMD ROCm DockerHub** or installed an
+**prebuilt PyTorch Docker image from AMD ROCm Docker Hub** or installed an
 **official wheels package**, validation tests are not necessary.

 If you want to manually run unit tests to validate your PyTorch installation fully, follow these steps: