site restructure phase 1 - file reorganization (#2428)

2026-01-08 22:28:06 -05:00 · 2023-09-08 10:02:17 -06:00
parent 3535c43d4e
commit 890c735f53
179 changed files with 1377 additions and 813 deletions
--- a/.markdownlint-cli2.yaml
+++ b/.markdownlint-cli2.yaml
@@ -8,7 +8,9 @@ config:
  MD033: false
  MD034: false
  MD041: false
+  MD051: false
 ignores:
  - CHANGELOG.md
  - "{,docs/}{RELEASE,release}.md"
+  - docs/about/release/release_notes.md
  - tools/autotag/templates/**/*.md
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -1,3 +1,7 @@
+ROCProfiler
+ROCTracer
+ROCdbgapi
+hipify
 # building
 matchers
 # file_reorg
--- a/README.md
+++ b/README.md
@@ -1,23 +1,23 @@
 # AMD ROCm™ Platform

-ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for
-graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
-tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
+ROCm is an open-source stack, composed primarily of open-source software, designed for graphics
+processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and
+APIs that enable GPU programming from low-level kernel to end-user applications.

 With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
-collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software
+collaborate, test, and deploy your applications in a free, open source, integrated, and secure software
 ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
 artificial intelligence (AI), scientific computing, and computer aided design (CAD).

 ROCm is powered by AMD’s
 [Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
-an OSS C++ GPU programming environment and its corresponding runtime. HIP allows ROCm
-developers to create portable applications on different platforms by deploying code on a range of
-platforms, from dedicated gaming GPUs to exascale HPC clusters.
+an open-source software C++ GPU programming environment and its corresponding runtime. HIP
+allows ROCm developers to create portable applications on different platforms by deploying code on a
+range of platforms, from dedicated gaming GPUs to exascale HPC clusters.

-ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary OSS
-compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks,
-such as PyTorch and TensorFlow.
+ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open
+source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
+(ML) frameworks, such as PyTorch and TensorFlow.

 ## ROCm Documentation

@@ -47,4 +47,4 @@ python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
 ## Older ROCm Releases

 For release information for older ROCm releases, refer to
-[`CHANGELOG.md`](./CHANGELOG.md).
+[`CHANGELOG`](./CHANGELOG).
--- a/docs/about/compatibility/3rd_party_support_matrix.md
+++ b/docs/about/compatibility/3rd_party_support_matrix.md
@@ -1,11 +1,9 @@
-# 3rd Party Support Matrix
+# Third party support matrix

 ROCm™ supports various 3rd party libraries and frameworks. Supported versions
 are tested and known to work. Non-supported versions of 3rd parties may also
 work, but aren't tested.

-(ml_framework_compat_matrix)=
-
 ## Deep Learning

 ROCm releases support the most recent and two prior releases of PyTorch and
@@ -21,6 +19,8 @@ TensorFlow
 | 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11           | 2.5.4 |
 | 5.6   | 1.11, 1.12.1, 1.13.1       | 2.12                 | 2.5.4 |

+(communication_libraries)=
+
 ## Communication libraries

 ROCm supports [OpenUCX](https://openucx.org/), "an open-source,
@@ -59,4 +59,4 @@ contemporary CUDA / NVIDIA HPC SDK alternatives.
 | 5.6   | 1.17.2       | 22.9       |

 For the latest documentation of these libraries, refer to the
-[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md).
+[associated documentation](../../reference/libraries/gpu_libraries/c++_primitives).
--- a/docs/about/compatibility/docker_image_support_matrix.md
+++ b/docs/about/compatibility/docker_image_support_matrix.md
--- a/docs/about/compatibility/index.md
+++ b/docs/about/compatibility/index.md
@@ -7,14 +7,14 @@
 Forward and backward compatibility of ROCm user space components and the
 kernel space Kernel Fusion Driver (KFD).

- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix.md)
+- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix)

 :::

 :::{grid-item-card} Docker Image Support
 ROCm releases several Docker container images.

- [Docker Image Support Matrix](./docker_image_support_matrix.md)
+- [Docker Image Support Matrix](./docker_image_support_matrix)

 :::

@@ -22,7 +22,7 @@ ROCm releases several Docker container images.
 Several 3rd party libraries ship with ROCm enablement as well as several ROCm
 components provide interfaces compatible with 3rd party solutions.

- [3rd Party Support Matrix](./3rd_party_support_matrix.md)
+- [Third party support matrix](./3rd_party_support_matrix)

 :::

--- a/docs/about/compatibility/user_kernel_space_compat_matrix.md
+++ b/docs/about/compatibility/user_kernel_space_compat_matrix.md
--- a/docs/about/license.md
+++ b/docs/about/license.md
@@ -0,0 +1,9 @@
+# License
+
+> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that primarily contains documentation. For other licensing information, refer to the [Licensing Terms page](./licensing).
+
+```{include} ../../LICENSE
+```
+
+```{include} ./licensing.md
+```
--- a/docs/release/licensing.md
+++ b/docs/release/licensing.md
@@ -1,4 +1,4 @@
-# Licensing Terms
+# ROCm licensing terms

 ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately.
 The following table is a list of ROCm components with links to their respective license
--- a/docs/about/release/linux_support.md
+++ b/docs/about/release/linux_support.md
@@ -1,6 +1,6 @@
-# GPU Support and OS Compatibility (Linux)
+# GPU and OS support (Linux)

-(supported_distributions)=
+(linux_support)=

 ## Supported Linux Distributions

--- a/docs/about/release/release_history.md
+++ b/docs/about/release/release_history.md
--- a/docs/about/release/release_notes.md
+++ b/docs/about/release/release_notes.md
@@ -0,0 +1,583 @@
+# Release Notes
+<!-- Do not edit this file! This file is autogenerated with -->
+<!--   tools/autotag/tag_script.py                          -->
+
+<!-- Disable lints since this is an auto-generated file.    -->
+<!-- markdownlint-disable blanks-around-headers             -->
+<!-- markdownlint-disable no-duplicate-header               -->
+<!-- markdownlint-disable no-blanks-blockquote              -->
+<!-- markdownlint-disable ul-indent                         -->
+<!-- markdownlint-disable no-trailing-spaces                -->
+<!-- markdownlint-disable commands-show-output              -->
+
+<!-- spellcheck-disable -->
+
+The release notes for the ROCm platform.
+
+-------------------
+
+## ROCm 5.6.0
+<!-- markdownlint-disable first-line-h1 -->
+<!-- markdownlint-disable no-duplicate-header -->
+<!-- markdownlint-disable header-increment -->
+#### Release Highlights
+
+ROCm 5.6 includes several AI software ecosystem improvements for our fast-growing user base. A few examples include:
+
+- New documentation portal at https://rocm.docs.amd.com
+- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
+- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
+- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
+- New pseudorandom generators are available in rocRAND.  Added support for half-precision transforms in hipFFT/rocFFT.  Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.  
+
+#### OS and GPU Support Changes
+
+- SLES15 SP5 support was added this release. SLES15 SP3 support was dropped.
+- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting in Q3 2023, aligned with the ROCm 5.7 GA release date.
+  - No new features or performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7.
+  - Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (End of Maintenance [EOM]), aligned with the closest ROCm release.
+  - Bug fixes during the maintenance period will be made in the next ROCm point release.
+  - Bug fixes will not be backported to older ROCm releases for this SKU.
+  - Distribution and operating system updates will continue as per the ROCm release cadence for gfx906 GPUs until EOM.
+
+#### AMDSMI CLI 23.0.0.4
+
+##### Added
+
+- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
+
+- Package: amd-smi-lib
+ 
+##### Known Issues
+
+- not all Error Correction Code (ECC) fields are currently supported
+
+- RHEL 8 & SLES 15 have extra install steps
+
+#### Kernel Modules (DKMS)
+
+##### Fixes
+
+- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test, as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
+
+#### HIP 5.6 (For ROCm 5.6)
+
+##### Optimizations
+
+- Consolidation of hipamd, rocclr and OpenCL projects in clr
+- Optimized lock for graph global capture mode
+
+##### Added
+
+- Added hipRTC support for amd_hip_fp16
+- Added hipStreamGetDevice implementation to get the device associated with the stream
+- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
+- hipArrayGetInfo for getting information about the specified array
+- hipArrayGetDescriptor for getting 1D or 2D array descriptor
+- hipArray3DGetDescriptor to get 3D array descriptor
+
+##### Changed
+
+- hipMallocAsync to return success for zero size allocation to match hipMalloc
+- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
+- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
+- Removed hipBusBandwidth and hipCommander samples from hip-tests
+
+##### Fixed
+
+- Fixed regression in hipMemCpyParam3D when offset is applied
+
+##### Known Issues
+
+- Limited testing on xnack+ configuration
+  - Multiple HIP test failures (gpuvm fault or hangs)
+- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
+- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
+
+##### Upcoming changes in future release
+
+- Removal of gcnarch from hipDeviceProp_t structure
+- Addition of new fields in hipDeviceProp_t structure
+  - maxTexture1D
+  - maxTexture2D
+  - maxTexture1DLayered
+  - maxTexture2DLayered
+  - sharedMemPerMultiprocessor
+  - deviceOverlap
+  - asyncEngineCount
+  - surfaceAlignment
+  - unifiedAddressing
+  - computePreemptionSupported
+  - uuid
+- Removal of deprecated code
+  - hip-hcc codes from hip code tree
+- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
+- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
+- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
+
+#### ROCgdb-13 (For ROCm 5.6.0)
+
+##### Optimized
+
+- Improved performance when handling the end of a process with a large number of threads.
+
+##### Known Issues
+
+- On certain configurations, ROCgdb can show the following warning message:
+
+  `warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
+
+  This does not affect ROCgdb's functionality.
+
+#### ROCprofiler (For ROCm 5.6.0)
+
+In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
+ROCm 5.5 are split into separate files. The `rocmtools` files that were
+deprecated in ROCm 5.5 have been removed.
+
+  | ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
+  |-----------------|-------------------------------------|----------------------------------------|
+  | **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
+  | **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
+  | **API library** | `lib/librocprofiler64.so.1`         | `lib/librocprofiler64.so.2`            |
+
+The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
+following command:
+
+```sh
+$ rocprof …
+```
+
+To write a custom tool based on the `rocprofilerV1` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
+int main() {
+  // Use the rocprofilerV1 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
+
+The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
+following command:
+
+```sh
+$ rocprofv2 …
+```
+
+To write a custom tool based on the `rocprofilerV2` API do the following:
+
+```C
+// main.c
+#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
+int main() {
+  // Use the rocprofilerV2 API
+  return 0;
+}
+```
+
+This can be built in the following manner:
+
+```sh
+$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
+```
+
+The resulting `a.out` will depend on
+`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
+
+##### Optimized
+
+- Improved Test Suite
+
+##### Added
+
+- 'end_time' needs to be disabled in roctx_trace.txt
+
+##### Fixed
+
+- rocprof in ROCm/5.4.0 GPU selector broken.
+- rocprof in ROCm/5.4.1 fails to generate kernel info.
+- rocprof clobbers LD_PRELOAD.
+
+### Library Changes in ROCm 5.6.0
+
+| Library | Version |
+|---------|---------|
+| hipBLAS |  ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
+| hipCUB |  ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
+| hipFFT |  ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
+| hipSOLVER |  ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
+| hipSPARSE |  ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
+| MIOpen |  ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
+| rccl |  ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
+| rocALUTION |  ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
+| rocBLAS |  ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
+| rocFFT |  ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
+| rocm-cmake |  ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
+| rocPRIM |  ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
+| rocRAND |  ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
+| rocSOLVER |  ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
+| rocSPARSE |  ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
+| rocThrust |  ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
+| rocWMMA |  ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
+| Tensile |  ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |
+
+#### hipBLAS 1.0.0
+
+hipBLAS 1.0.0 for ROCm 5.6.0
+
+##### Changed
+
+- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing
+
+##### Removed
+
+- removed support for deprecated hipblasInt8Datatype_t enum
+- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions
+
+##### Deprecated
+
+- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
+  out-of-place functionality
+
+#### hipCUB 2.13.1
+
+hipCUB 2.13.1 for ROCm 5.6.0
+
+##### Added
+
+- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
+
+##### Changed
+
+- CUB backend references CUB and Thrust version 1.17.2.
+- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
+- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
+
+##### Known Issues
+
+- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
+- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
+
+#### hipFFT 1.0.12
+
+hipFFT 1.0.12 for ROCm 5.6.0
+
+##### Added
+
+- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
+
+##### Changed
+
+- Added --precision argument to benchmark/test clients.  --double is still accepted but is deprecated as a method to request a double-precision transform.
+
+#### hipSOLVER 1.8.0
+
+hipSOLVER 1.8.0 for ROCm 5.6.0
+
+##### Added
+
+- Added compatibility API with hipsolverRf prefix
+
+#### hipSPARSE 2.3.6
+
+hipSPARSE 2.3.6 for ROCm 5.6.0
+
+##### Added
+
+- Added SpGEMM algorithms
+
+##### Changed
+
+- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE
+
+#### MIOpen 2.19.0
+
+MIOpen 2.19.0 for ROCm 5.6.0
+
+##### Added
+
+- ROCm 5.5 support for gfx1101 (Navi32)
+
+##### Changed
+
+- Tuning results for MLIR on ROCm 5.5
+- Bumping MLIR commit to 5.5.0 release tag
+
+##### Fixed
+
+- Fix 3d convolution Host API bug
+- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
+
+#### rccl 2.15.5
+
+RCCL 2.15.5 for ROCm 5.6.0
+
+##### Changed
+
+- Compatibility with NCCL 2.15.5
+- Unit test executable renamed to rccl-UnitTests
+
+##### Added
+
+- HW-topology aware binary tree implementation
+- Experimental support for MSCCL
+- New unit tests for hipGraph support
+- NPKit integration
+
+##### Fixed
+
+- rocm-smi ID conversion
+- Support for HIP_VISIBLE_DEVICES for unit tests
+- Support for p2p transfers to non (HIP) visible devices
+
+##### Removed
+
+- Removed TransferBench from tools.  Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
+
+#### rocALUTION 2.1.9
+
+rocALUTION 2.1.9 for ROCm 5.6.0
+
+##### Improved
+
+- Fixed synchronization issues in level 1 routines
+
+#### rocBLAS 3.0.0
+
+rocBLAS 3.0.0 for ROCm 5.6.0
+
+##### Optimizations
+
+- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
+- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.
+
+##### Added
+
+- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
+
+##### Deprecated
+
+- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
+- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
+- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
+- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
+- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
+
+##### Removed
+
+- is_complex helper was deprecated and is now removed.  Use rocblas_is_complex instead.
+- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
+- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
+- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.
+
+##### Dependencies
+
+- build only dependency on python joblib added as used by Tensile build
+- fix for cmake install on some OS when performed by install.sh -d --cmake_install
+
+##### Fixed
+
+- make trsm offset calculations 64 bit safe
+
+##### Changed
+
+- refactor rotg test code
+
+#### rocFFT 1.0.23
+
+rocFFT 1.0.23 for ROCm 5.6.0
+
+##### Added
+
+- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
+- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
+- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.
+
+##### Changed
+
+- Replaced std::complex with hipComplex data types for data generator.
+- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
+- Added --precision argument to benchmark/test clients.  --double is still accepted but is deprecated as a method to request a double-precision transform.
+
+##### Fixed
+
+- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.
+
+#### rocm-cmake 0.9.0
+
+rocm-cmake 0.9.0 for ROCm 5.6.0
+
+##### Added
+
+- Added the option ROCM_HEADER_WRAPPER_WERROR
+    - Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
+    - Configure-time CMake option sets the default for the C macro.
+
+#### rocPRIM 2.13.0
+
+rocPRIM 2.13.0 for ROCm 5.6.0
+
+##### Added
+
+- New block level `radix_rank` primitive.
+- New block level `radix_rank_match` primitive.
+- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
+
+##### Changed
+
+- Improved the performance of `block_radix_sort` and `device_radix_sort`.
+- Improved the performance of `device_merge_sort`.
+- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).
+
+##### Known Issues
+
+- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
+- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.
+
+#### rocRAND 2.10.17
+
+rocRAND 2.10.17 for ROCm 5.6.0
+
+##### Added
+
+- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
+- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
+- experimental HIP-CPU feature
+- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
+
+##### Changed
+
+- Python 2.7 is no longer officially supported.
+
+#### rocSOLVER 3.22.0
+
+rocSOLVER 3.22.0 for ROCm 5.6.0
+
+##### Added
+
+- LU refactorization for sparse matrices
+    - CSRRF_ANALYSIS
+    - CSRRF_SUMLU
+    - CSRRF_SPLITLU
+    - CSRRF_REFACTLU
+- Linear system solver for sparse matrices
+    - CSRRF_SOLVE
+- Added type `rocsolver_rfinfo` for use with sparse matrix routines
+
+##### Optimized
+
+- Improved the performance of BDSQR and GESVD when singular vectors are requested
+
+##### Fixed
+
+- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
+
+#### rocSPARSE 2.5.2
+
+rocSPARSE 2.5.2 for ROCm 5.6.0
+
+##### Improved
+
+- Fixed a memory leak in csritsv
+- Fixed a bug in csrsm and bsrsm
+
+#### rocThrust 2.18.0
+
+rocThrust 2.18.0 for ROCm 5.6.0
+
+##### Fixed 
+
+- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
+
+##### Changed
+
+- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
+
+#### rocWMMA 1.1.0
+
+rocWMMA 1.1.0 for ROCm 5.6.0
+
+##### Added
+
+- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
+- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
+- Added performance gemm samples for half, single and double precision
+- Added rocWMMA cmake versioning
+- Added vectorized support in coordinate transforms
+- Included ROCm smi for runtime clock rate detection
+- Added fragment transforms for transpose and change data layout
+
+##### Changed
+
+- Default to GPU rocBLAS validation against rocWMMA
+- Re-enabled int8 gemm tests on gfx9
+- Upgraded to C++17
+- Restructured unit test folder for consistency
+- Consolidated rocWMMA samples common code
+
+#### Tensile 4.37.0
+
+Tensile 4.37.0 for ROCm 5.6.0
+
+##### Added
+
+- Added user driven tuning API
+- Added decision tree fallback feature
+- Added SingleBuffer + AtomicAdd option for GlobalSplitU
+- DirectToVgpr support for fp16 and Int8 with TN orientation
+- Added new test cases for various functions
+- Added SingleBuffer algorithm for ZGEMM/CGEMM
+- Added joblib for parallel map calls
+- Added support for MFMA + LocalSplitU + DirectToVgprA+B
+- Added asmcap check for MIArchVgpr
+- Added support for MFMA + LocalSplitU
+- Added frequency, power, and temperature data to the output
+
+##### Optimizations
+
+- Improved the performance of GlobalSplitU with SingleBuffer algorithm
+- Reduced the running time of the extended and pre_checkin tests
+- Optimized the Tailloop section of the assembly kernel
+- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
+- Improved the performance of the second kernel of MultipleBuffer algorithm
+
+##### Changed
+
+- Updated custom kernels with 64-bit offsets
+- Adapted 64-bit offset arguments for assembly kernels
+- Improved temporary register re-use to reduce max sgpr usage
+- Removed some restrictions on VectorWidth and DirectToVgpr
+- Updated the dependency requirements for Tensile
+- Changed the range of AssertSummationElementMultiple
+- Modified the error messages for more clarity
+- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
+- Removed dummy vgpr for vectorStaticRemainder
+- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
+- Removed qReg parameter from vectorStaticRemainder
+
+##### Fixed
+
+- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
+- 64-bit offset parameters for post kernels
+- Fixed gfx908 CI test failures
+- Fixed offset calculation to prevent overflow for large offsets
+- Fixed issues when BufferLoad and BufferStore are equal to zero
+- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
+- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
+- Fixed the memory access error related to StaggerU + large stride
+- Fixed ZGEMM 4x4 MatrixInst mismatch
+- Fixed DGEMM 4x4 MatrixInst mismatch
+- Fixed ASEM + GSU + NoTailLoop opt mismatch
+- Fixed AssertSummationElementMultiple + GlobalSplitU issues
+- Fixed ASEM + GSU + TailLoop inner unroll
--- a/docs/about/release/windows_support.md
+++ b/docs/about/release/windows_support.md
@@ -59,15 +59,12 @@ on this table, the GPU is not officially supported by AMD.

 ### Component Support

-ROCm components are described in the [reference](../reference/all) page. Support
+ROCm components are described in the [Reference material](../../reference/index). Support
 on Windows is provided with two levels of enablement.

- **Runtime**: Runtime enables the use of the HIP/OpenCL runtimes only.
- **HIP SDK**: Runtime plus additional components refer to libraries found under
-  [Math Libraries](../reference/gpu_libraries/math.md) and
-  [C++ Primitive Libraries](../reference/gpu_libraries/c%2B%2B_primitives.md).
-  Some [Math Libraries](../reference/gpu_libraries/math.md) are Linux exclusive,
-  please check the library details.
+- **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only.
+- **HIP SDK**: Runtime plus additional components, namely the libraries listed under [Libraries](../../reference/libraries/index).
+  Some [math libraries](../../reference/libraries/gpu_libraries/math) are Linux exclusive; check the library details.

 ### Support Status

--- a/docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst
+++ b/docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst
--- a/docs/conceptual/cmake_packages.rst
+++ b/docs/conceptual/cmake_packages.rst
@@ -50,7 +50,7 @@ the *config-file* packages are shipped with the upstream projects, such as
 rocPRIM and other ROCm libraries.

 For a complete guide on where and how ROCm may be installed on a system, refer
-to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_).
+to the installation guides in these docs (`Linux <../tutorials/install/index.html>`_).

 Using HIP in CMake
 ==================
--- a/docs/conceptual/compiler_disambiguation.md
+++ b/docs/conceptual/compiler_disambiguation.md
--- a/docs/conceptual/file_reorg.md
+++ b/docs/conceptual/file_reorg.md
--- a/docs/conceptual/gpu_arch.md
+++ b/docs/conceptual/gpu_arch.md
--- a/docs/conceptual/gpu_arch/mi100.md
+++ b/docs/conceptual/gpu_arch/mi100.md
@@ -6,7 +6,7 @@ these GPUs.

 ## System Architecture

-{numref}`mi100-arch` shows the node-level architecture of a system that
+The following image shows the node-level architecture of a system that
 comprises two AMD EPYC™ processors and (up to) eight AMD Instinct™ accelerators.
 The two EPYC processors are connected to each other with the AMD Infinity™
 fabric which provides a high-bandwidth (up to 18 GT/sec) and coherent links such
@@ -17,12 +17,13 @@ available to connect the processors plus one PCIe Gen 4 x16 link per processor
 can attach additional I/O devices such as the host adapters for the network
 fabric.

-:::{figure-md} mi100-arch
-
-<img src="../../data/reference/gpu_arch/image.004.png" alt="Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.">
+```{figure} ../../data/conceptual/gpu_arch/image004.png
+:name: mi100-arch
+:alt: Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.
+:align: center

 Structure of a single GCD in the AMD Instinct MI100 accelerator.
-:::
+```

 In a typical node configuration, each processor can host up to four AMD
 Instinct™ accelerators that are attached using PCIe Gen 4 links at 16 GT/sec,
@@ -42,18 +43,19 @@ computing (HPC) and AI & machine learning (ML) that run on everything from
 individual servers to the world's largest exascale supercomputers. The overall
 system architecture is designed for extreme scalability and compute performance.

-:::{figure-md} mi100-block
-
-<img src="../../data/reference/gpu_arch/image.005.png" alt="Structure of the AMD Instinct accelerator (MI100 generation).">
+```{figure} ../../data/conceptual/gpu_arch/image005.png
+:name: mi100-block
+:alt: Structure of the AMD Instinct accelerator (MI100 generation).
+:align: center

 Structure of the AMD Instinct accelerator (MI100 generation).
-:::
+```

-{numref}`mi100-block` shows the AMD Instinct accelerator with its PCIe Gen 4 x16
+The above image shows the AMD Instinct accelerator with its PCIe Gen 4 x16
 link (16 GT/sec, at the bottom) that connects the GPU to (one of) the host
 processor(s). It also shows the three AMD Infinity Fabric ports that provide
 high-speed links (23 GT/sec, also at the bottom) to the other GPUs of the local
-hive as shown in {numref}`mi100-arch`.
+hive.

 On the left and right of the floor plan, the High Bandwidth Memory (HBM)
 attaches via the GPU's memory controller.  The MI100 generation of the AMD
@@ -61,7 +63,7 @@ Instinct accelerator offers four stacks of HBM generation 2 (HBM2) for a total
 of 32GB with a 4,096bit-wide memory interface. The peak memory bandwidth of the
 attached HBM2 is 1.228 TB/sec at a memory clock frequency of 1.2 GHz.

-The execution units of the GPU are depicted in {numref}`mi100-block` as Compute
+The execution units of the GPU are depicted in the above image as Compute
 Units (CU). There are a total 120 compute units that are physically organized
 into eight Shader Engines (SE) with fifteen compute units per shader engine.
 Each compute unit is further sub-divided into four SIMD units that process SIMD
@@ -70,15 +72,16 @@ instructions of 16 data elements per instruction. This enables the CU to process
 Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS
 (`4 [SIMD units] x 16 [elements per instruction] x 120 [CU] x 1.5 [GHz]`).

-:::{figure-md} mi100-gcd
-
-<img src="../../data/reference/gpu_arch/image.006.png" alt="Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture">
+```{figure} ../../data/conceptual/gpu_arch/image006.png
+:name: mi100-gcd
+:alt: Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture.
+:align: center

 Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA
-architecture
-:::
+architecture.
+```

-{numref}`mi100-gcd` shows the block diagram of a single CU of an AMD Instinct™
+The preceding image shows the block diagram of a single CU of an AMD Instinct™
 MI100 accelerator and summarizes how instructions flow through the execution
 engines. The CU fetches the instructions via a 32KB instruction cache and moves
 them forward to execution via a dispatcher. The CU can handle up to ten
--- a/docs/conceptual/gpu_arch/mi200_performance_counters.md
+++ b/docs/conceptual/gpu_arch/mi200_performance_counters.md
--- a/docs/conceptual/gpu_arch/mi250.md
+++ b/docs/conceptual/gpu_arch/mi250.md
@@ -12,8 +12,7 @@ everything from individual servers to the world’s largest exascale
 supercomputers. The overall system architecture is designed for extreme
 scalability and compute performance.

-{numref}`mi250-gcd` shows the components of a single Graphics Compute Die (GCD
-) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™
+The following image shows the components of a single Graphics Compute Die (GCD) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™
 interfaces and their physical links that are used to connect the GPU die to the
 other system-level components of the node (see also Section 2.2). Both
 interfaces can drive four AMD Infinity Fabric links. One of the AMD Infinity
@@ -28,7 +27,7 @@ To the left and the right are memory controllers that attach the High Bandwidth
 Memory (HBM) modules to the GCD. AMD Instinct MI250 GPUs use HBM2e, which offers
 a peak memory bandwidth of 1.6 TB/sec per GCD.

-The execution units of the GPU are depicted in {numref}`mi250-gcd` as Compute
+The execution units of the GPU are depicted in the following image as Compute
 Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
 subdivided into four SIMD units that process SIMD instructions of 16 data
 elements per instruction (for the FP64 data type). This enables the CU to
@@ -39,16 +38,17 @@ execution units (also called matrix cores), which are geared toward executing
 matrix operations like matrix-matrix multiplications. For FP64, the peak
 performance of these units amounts to 90.5 TFLOPS.

-:::{figure-md} mi250-gcd
+```{figure} ../../data/conceptual/gpu_arch/image001.png
+:name: mi250-gcd
+:alt: Structure of a single GCD in the AMD Instinct MI250 accelerator.
+:align: center

-<img src="../../data/reference/gpu_arch/image.001.png" alt="Structure of a single GCD in the AMD Instinct MI250 accelerator.">
-
-Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator.
-:::
+Structure of a single GCD in the AMD Instinct MI250 accelerator.
+```

 ```{list-table} Peak-performance capabilities of the MI250 OAM for different data types.
 :header-rows: 1
-:name: mi250-perf
+:name: mi250-perf-table

 *
  - Computation and Data Type
@@ -88,7 +88,7 @@ Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator.
  - 362.1
 ```

-{numref}`mi250-perf` summarizes the aggregated peak performance of the AMD
+The above table summarizes the aggregated peak performance of the AMD
 Instinct MI250 OCP Open Accelerator Modules (OAM, OCP is short for Open Compute
 Platform) and its two GCDs for different data types and execution units. The
 middle column lists the peak performance (number of data elements processed in a
@@ -97,14 +97,15 @@ is being retired in each clock cycle. The third column lists the theoretical
 peak performance of the OAM module. The theoretical aggregated peak memory
 bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD).

-:::{figure-md} mi250-arch
-
-<img src="../../data/reference/gpu_arch/image.002.png" alt="Dual-GCD architecture of the AMD Instinct MI250 accelerators.">
+```{figure} ../../data/conceptual/gpu_arch/image002.png
+:name: mi250-arch
+:alt: Dual-GCD architecture of the AMD Instinct MI250 accelerators.
+:align: center

 Dual-GCD architecture of the AMD Instinct MI250 accelerators.
-:::
+```

-{numref}`mi250-arch` shows the block diagram of an OAM package that consists
+The following image shows the block diagram of an OAM package that consists
 of two GCDs, each of which constitutes one GPU device in the system. The two
 GCDs in the package are connected via four AMD Infinity Fabric links running at
 a theoretical peak rate of 25 GT/sec, giving 200 GB/sec peak transfer bandwidth
@@ -113,7 +114,7 @@ between the two GCDs of an OAM, or a bidirectional peak transfer bandwidth of

 ## Node-level Architecture

-{numref}`mi250-block` shows the node-level architecture of a system that is
+The following image shows the node-level architecture of a system that is
 based on the AMD Instinct MI250 accelerator. The MI250 OAMs attach to the host
 system via PCIe Gen 4 x16 links (yellow lines). Each GCD maintains its own PCIe
 x16 link to the host part of the system. Depending on the server platform, the
@@ -121,15 +122,16 @@ GCD can attach to the AMD EPYC processor directly or via an optional PCIe switch
 . Note that some platforms may offer an x8 interface to the GCDs, which reduces
 the available host-to-GPU bandwidth.

-:::{figure-md} mi250-block
-
-<img src="../../data/reference/gpu_arch/image.003.png" alt="Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor.">
+```{figure} ../../data/conceptual/gpu_arch/image003.png
+:name: mi250-block
+:alt: Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor.
+:align: center

 Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation
 AMD EPYC processor.
-:::
+```

-{numref}`mi250-block` shows the node-level architecture of a system with AMD
+The preceding image shows the node-level architecture of a system with AMD
 EPYC processors in a dual-socket configuration and four AMD Instinct MI250
 accelerators. The MI250 OAMs attach to the host processors system via PCIe Gen 4
 x16 links (yellow lines). Depending on the system design, a PCIe switch may
@@ -146,4 +148,4 @@ two GPU dies in the MI250 OAM and operates at 25 GT/sec, which corresponds to a
 theoretical peak transfer rate of 50 GB/sec per link (or 100 GB/sec
 bidirectional peak transfer bandwidth). The GCD pairs 2 and 6 as well as GCDs 0
 and 4 connect via two XGMI links, which is indicated by the thicker red line in
-{numref}`mi250-block`.
+the preceding image.
--- a/docs/conceptual/gpu_isolation.md
+++ b/docs/conceptual/gpu_isolation.md
--- a/docs/conceptual/index.md
+++ b/docs/conceptual/index.md
@@ -1,10 +1,10 @@
-# All Explanation Material
+# Conceptual documentation

 :::::{grid} 1 1 2 2
 :gutter: 1

 :::{grid-item-card} Compiler Nomencalture
-:link: compiler_disambiguation
+:link: ./compiler_disambiguation
 :link-type: doc
 ROCm ships multiple compilers of varying origins and purposes. This article
 disambiguates compiler naming used throughout the documentation.
@@ -12,7 +12,7 @@ disambiguates compiler naming used throughout the documentation.
 :::

 :::{grid-item-card} Using CMake
-:link: cmake_packages
+:link: ./cmake_packages
 :link-type: doc
 ROCm components ship with 1st party CMake support. This article details how that
 support works and how to use it.
@@ -20,7 +20,7 @@ support works and how to use it.
 :::

 :::{grid-item-card} Linux Folder Structure Reorganization
-:link: file_reorg
+:link: ./file_reorg
 :link-type: doc
 ROCm™ packages have adopted the Linux foundation file system hierarchy standard
 to ensure ROCm components follow open source conventions for Linux-based
@@ -29,7 +29,7 @@ distributions.
 :::

 :::{grid-item-card} GPU Isolation Techniques
-:link: gpu_isolation
+:link: ./gpu_isolation
 :link-type: doc
 Restricting the access of applications to a subset of GPUs, aka isolating GPUs
 allows users to hide GPU resources from programs.
@@ -37,7 +37,7 @@ allows users to hide GPU resources from programs.
 :::

 :::{grid-item-card} GPU Architectures
-:link: gpu_arch
+:link: ./gpu_arch
 :link-type: doc
 AMD documentation around architectural details from both the CDNA and RDNA
 product lines.
--- a/docs/conceptual/using_gpu_sanitizer.md
+++ b/docs/conceptual/using_gpu_sanitizer.md
@@ -1,4 +1,4 @@
-### Using the LLVM Address Sanitizer (ASAN) on the GPU
+# Using the LLVM Address Sanitizer (ASAN) on the GPU

 The LLVM Address Sanitizer provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.

@@ -7,7 +7,7 @@ Until now, the LLVM Address Sanitizer process was only available for traditional
 This document provides documentation on using ROCm Address Sanitizer.
 For information about LLVM Address Sanitizer, see [the LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).

-### Compile for Address Sanitizer
+## Compile for Address Sanitizer

 The address sanitizer process begins by compiling the application of interest with the address sanitizer instrumentation.

@@ -23,7 +23,7 @@ Other architectures are allowed, but their device code will not be instrumented

 It is not an error to compile some files without address sanitizer instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the Address Sanitizer runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.

-#### About Compilation Time
+### About Compilation Time

 When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.

@@ -33,7 +33,7 @@ There are a few options if the compile time becomes unacceptable:
 + Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
 + Disable instrumentation on a per-function basis by adding `__attribute__`((no_sanitize("address"))) to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.

-### Use AMD Supplied Address Sanitizer Instrumented Libraries
+## Use AMD Supplied Address Sanitizer Instrumented Libraries

 ROCm releases provide optional packages containing address sanitizer instrumented builds of a subset of those ROCm libraries usually found in `/opt/rocm-<version>/lib`. These optional packages are typically named <library>-asan. However, the instrumented libraries themselves have identical names as the regular uninstrumented libraries and are located in `/opt/rocm-<version>/lib/asan`. It is expected that the subset of address sanitizer instrumented ROCm libraries will be expanded in future releases. They are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used, but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.

@@ -41,9 +41,9 @@ These additional libraries avoid additional developer effort to locate repositor

 When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.

-### Running Address Sanitizer Instrumented Applications
+## Running Address Sanitizer Instrumented Applications

-#### Preparing to Run an Instrumented Application
+### Preparing to Run an Instrumented Application

 Here are a few recommendations to consider before running an address sanitizer instrumented heterogeneous application.

@@ -76,13 +76,13 @@ This tells the ASAN runtime to halt the application immediately after detecting
 + `detect_leaks=0/1 default 1`.
 This option directs the address sanitizer runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.

-### Runtime Overhead
+## Runtime Overhead

 Running an address sanitizer instrumented application incurs
 overheads which may result in unacceptably long runtimes
 or failure to run at all.

-#### Higher Execution Time
+### Higher Execution Time

 Address sanitizer detection works by checking each address at runtime
 before the address is actually accessed by a load, store, or atomic
@@ -98,7 +98,7 @@ For heterogeneous applications, the shadow memory must be accessible by all devi
 and this can mean that shadow accesses from some devices may be more costly
 than non-shadow accesses.

-#### Higher Memory Use
+### Higher Memory Use

 The address checking described above relies on the compiler to surround
 each program variable with a red zone and on address sanitizer
@@ -111,7 +111,7 @@ Applications which consume most one or more available memory pools when
 run normally are likely to encounter allocation failures when run with
 instrumentation.

-### Runtime Reporting
+## Runtime Reporting

 It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the address sanitizer runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.

@@ -160,7 +160,7 @@ or

 currently may include one or two surprising CPU side tracebacks mentioning :`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code and these call stacks can be ignored.

-### Running with `rocgdb`
+## Running with `rocgdb`

 `rocgdb` can be used to further investigate address sanitizer detected errors, with some preparation.

@@ -212,9 +212,9 @@ $ rocgdb <path to application>
 (gdb) c
 ```

-### Using Address Sanitizer with a Short HIP Application (LINK NEEDED HERE)
+## Using Address Sanitizer with a Short HIP Application (LINK NEEDED HERE)

-### Known Issues with Using GPU Sanitizer
+## Known Issues with Using GPU Sanitizer

 + Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.

--- a/docs/conceptual/windows-app-deployment-guidelines.md
+++ b/docs/conceptual/windows-app-deployment-guidelines.md
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -35,57 +35,42 @@ article_pages = [
        "date":"2023-07-27"
    },

-    {"file":"deploy/linux/index", "os":["linux"]},
-    {"file":"deploy/linux/install_overview", "os":["linux"]},
-    {"file":"deploy/linux/prerequisites", "os":["linux"]},
-    {"file":"deploy/linux/quick_start", "os":["linux"]},
-    {"file":"deploy/linux/install", "os":["linux"]},
-    {"file":"deploy/linux/upgrade", "os":["linux"]},
-    {"file":"deploy/linux/uninstall", "os":["linux"]},
-    {"file":"deploy/linux/package_manager_integration", "os":["linux"]},
-    {"file":"deploy/docker", "os":["linux"]},
-    
-    {"file":"deploy/windows/cli/index", "os":["windows"]},
-    {"file":"deploy/windows/cli/install", "os":["windows"]},
-    {"file":"deploy/windows/cli/uninstall", "os":["windows"]},
-    {"file":"deploy/windows/cli/upgrade", "os":["windows"]},
-    {"file":"deploy/windows/gui/index", "os":["windows"]},
-    {"file":"deploy/windows/gui/install", "os":["windows"]},
-    {"file":"deploy/windows/gui/uninstall", "os":["windows"]},
-    {"file":"deploy/windows/gui/upgrade", "os":["windows"]},
-    {"file":"deploy/windows/index", "os":["windows"]},
-    {"file":"deploy/windows/prerequisites", "os":["windows"]},
-    {"file":"deploy/windows/quick_start", "os":["windows"]},
+    {"file":"tutorials/quick_start/windows", "os":["windows"]},
+    {"file":"tutorials/quick_start/linux", "os":["linux"]},

-    {"file":"release/gpu_os_support", "os":["linux"]},
-    {"file":"release/windows_support", "os":["windows"]},
-    {"file":"release/docker_support_matrix", "os":["linux"]},
-    
-    {"file":"reference/gpu_libraries/communication", "os":["linux"]},
-    {"file":"reference/ai_tools", "os":["linux"]},
-    {"file":"reference/management_tools", "os":["linux"]},
-    {"file":"reference/validation_tools", "os":["linux"]},
-    {"file":"reference/framework_compatibility/framework_compatibility", "os":["linux"]},
+    {"file":"tutorials/install/linux/index", "os":["linux"]},
+    {"file":"tutorials/install/linux/install_overview", "os":["linux"]},
+    {"file":"tutorials/install/linux/prerequisites", "os":["linux"]},
+
+    {"file":"tutorials/install/docker", "os":["linux"]},
+    {"file":"tutorials/install/magma_install", "os":["linux"]},
+    {"file":"tutorials/install/pytorch_install", "os":["linux"]},
+    {"file":"tutorials/install/tensorflow_install", "os":["linux"]},
+
+    {"file":"tutorials/install/windows/index", "os":["windows"]},
+    {"file":"tutorials/install/windows/prerequisites", "os":["windows"]},
+    {"file":"tutorials/install/windows/cli/index", "os":["windows"]},
+    {"file":"tutorials/install/windows/gui/index", "os":["windows"]},
+
+    {"file":"about/release/linux_support", "os":["linux"]},
+    {"file":"about/release/windows_support", "os":["windows"]},
+
+    {"file":"about/compatibility/docker_image_support_matrix", "os":["linux"]},
+
+    {"file":"reference/libraries/gpu_libraries/communication", "os":["linux"]},
+    {"file":"reference/compilers_tools/index", "os":["linux"]},
    {"file":"reference/computer_vision", "os":["linux"]},
-    
+
    {"file":"how_to/deep_learning_rocm", "os":["linux"]},
    {"file":"how_to/gpu_aware_mpi", "os":["linux"]},
-    {"file":"how_to/magma_install/magma_install", "os":["linux"]},
-    {"file":"how_to/pytorch_install/pytorch_install", "os":["linux"]},
    {"file":"how_to/system_debugging", "os":["linux"]},
-    {"file":"how_to/tensorflow_install/tensorflow_install", "os":["linux"]},

-    {"file":"examples/machine_learning", "os":["linux"]},
-    {"file":"examples/inception_casestudy/inception_casestudy", "os":["linux"]},
-    
-    {"file":"understand/file_reorg", "os":["linux"]},
-
-    {"file":"understand/isv_deployment_win", "os":["windows"]},
+    {"file":"rocm_ai/rocm_ai", "os":["linux"]},
 ]

 external_toc_path = "./sphinx/_toc.yml"

-docs_core = ROCmDocs("ROCm 5.6.1 Documentation Home")
+docs_core = ROCmDocs("ROCm Documentation")
 docs_core.setup()

 external_projects_current_project = "rocm"
--- a/docs/contribute/feedback.md
+++ b/docs/contribute/feedback.md
@@ -24,4 +24,4 @@ Issues on existing or absent docs can be filed as

 ## Email

-Send other feedback or questions to [rocm-feedback@amd.com](rocm-feedback@amd.com)
+Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback)
--- a/docs/contribute/index.md
+++ b/docs/contribute/index.md
@@ -0,0 +1,73 @@
+# Contributing to ROCm Docs
+
+AMD values and encourages the ROCm community to contribute to our code and
+documentation. This repository is focused on ROCm documentation and this
+contribution guide describes the recommended method for creating and modifying our
+documentation.
+
+While interacting with ROCm Documentation, we encourage you to be polite and
+respectful in your contributions, content or otherwise. Authors and maintainers of
+these docs act with good intentions and to the best of their knowledge.
+Keep that in mind while you engage. Should you have issues with contributing
+itself, refer to
+[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) on the
+GitHub repository.
+
+For additional information on documentation functionalities,
+see the user and developer guides for rocm-docs-core
+at {doc}`rocm-docs-core documentation <rocm-docs-core:index>`.
+
+## Supported Formats
+
+Our documentation includes both Markdown and RST files. Markdown is encouraged
+over RST due to the lower barrier to participation. GitHub-flavored Markdown is preferred
+for all submissions as it renders accurately on our GitHub repositories. For existing documentation,
+[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html) Markdown
+is used to implement certain features unsupported in GitHub Markdown. This is
+not encouraged for new documentation. AMD will transition
+to stricter use of GitHub-flavored Markdown with a few caveats. ROCm documentation
+also uses [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html)
+in our Markdown and RST files. We also use Breathe syntax for Doxygen documentation
+in our Markdown files. See
+[GitHub](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github)'s
+guide on writing and formatting on GitHub as a starting point.
+
+ROCm documentation adds additional requirements to Markdown and RST based files
+as follows:
+
+- Level one headers are only used for page titles. There must be only one level
+  1 header per file for both Markdown and reStructuredText.
+- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
+  our automated GitHub action on a Pull Request (PR).
+  See the {doc}`rocm-docs-core linting user guide <rocm-docs-core:user_guide/linting>` for more details.
+
+## Filenames and folder structure
+
+Please use snake case (all lower case letters and underscores instead of spaces)
+for file names. For example, `example_file_name.md`.
+Our documentation follows Pitchfork for folder structure.
+All documentation is in `/docs` except for special files like
+the contributing guide in the `/` folder. All images used in the documentation are
+placed in the `/docs/data` folder.
+
+## Language and Style
+
+Adopt Microsoft C++ docs guidelines for
+[Voice and tone](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
+
+ROCm documentation templates will be made public shortly. ROCm templates dictate
+the recommended structure and flow of the documentation. Guidelines on how to
+integrate figures, equations, and tables are all based off
+[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html).
+
+Font size and selection, page layout, white space control, and other formatting
+details are controlled via [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
+Raise issues in `rocm-docs-core` for any formatting concerns and changes requested.
+
+## More
+
+For more topics, such as submitting feedback and ways to build documentation,
+see the [Contributing Section](https://rocm.docs.amd.com/en/latest/contributing.html)
+at [rocm.docs.amd.com](https://rocm.docs.amd.com).
+
+To learn more about how our documentation is built, refer to the [ROCm toolchain](toolchain.md).
--- a/docs/contribute/toolchain.md
+++ b/docs/contribute/toolchain.md
@@ -1,9 +1,6 @@
-# About ROCm Documentation
+# ROCm documentation toolchain

-ROCm documentation is made available under open source [licenses](licensing.md).
-Documentation is built using open source toolchains. Contributions to our
-documentation is encouraged and welcome. As a contributor, please familiarize
-yourself with our documentation toolchain.
+Our documentation relies on several open source toolchains and sites.

 ## `rocm-docs-core`

@@ -17,7 +14,7 @@ See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core doc
 ## Sphinx

 [Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
-originally used for Python. It is now widely used in the Open Source community.
+originally used for Python. It is now widely used in the open source community.
 Originally, Sphinx supported reStructuredText (RST) based documentation, but
 Markdown support is now available.
 ROCm documentation plans to default to Markdown for new projects.
--- a/docs/data/understand/deep_learning/amd_logo.png
+++ b/docs/data/understand/deep_learning/amd_logo.png
--- a/docs/data/conceptual/gpu_arch/image001.png
+++ b/docs/data/conceptual/gpu_arch/image001.png
--- a/docs/data/conceptual/gpu_arch/image002.png
+++ b/docs/data/conceptual/gpu_arch/image002.png
--- a/docs/data/conceptual/gpu_arch/image003.png
+++ b/docs/data/conceptual/gpu_arch/image003.png
--- a/docs/data/conceptual/gpu_arch/image004.png
+++ b/docs/data/conceptual/gpu_arch/image004.png
--- a/docs/data/conceptual/gpu_arch/image005.png
+++ b/docs/data/conceptual/gpu_arch/image005.png
--- a/docs/data/conceptual/gpu_arch/image006.png
+++ b/docs/data/conceptual/gpu_arch/image006.png
--- a/docs/data/how_to/tuning_guides/image.010.png
+++ b/docs/data/how_to/tuning_guides/image.010.png
--- a/docs/data/how_to/tuning_guides/image.011.png
+++ b/docs/data/how_to/tuning_guides/image.011.png
--- a/docs/data/how_to/tuning_guides/image.012.png
+++ b/docs/data/how_to/tuning_guides/image.012.png
--- a/docs/data/how_to/tuning_guides/image.013.png
+++ b/docs/data/how_to/tuning_guides/image.013.png
--- a/docs/data/how_to/tuning_guides/image.014.png
+++ b/docs/data/how_to/tuning_guides/image.014.png
--- a/docs/data/how_to/tuning_guides/image.015.png
+++ b/docs/data/how_to/tuning_guides/image.015.png
--- a/docs/data/how_to/tuning_guides/image.016.png
+++ b/docs/data/how_to/tuning_guides/image.016.png
--- a/docs/data/how_to/tuning_guides/tuning001.png
+++ b/docs/data/how_to/tuning_guides/tuning001.png
--- a/docs/data/how_to/tuning_guides/tuning002.png
+++ b/docs/data/how_to/tuning_guides/tuning002.png
--- a/docs/data/how_to/tuning_guides/tuning003.png
+++ b/docs/data/how_to/tuning_guides/tuning003.png
--- a/docs/data/how_to/tuning_guides/tuning004.png
+++ b/docs/data/how_to/tuning_guides/tuning004.png
--- a/docs/data/how_to/tuning_guides/tuning005.png
+++ b/docs/data/how_to/tuning_guides/tuning005.png
--- a/docs/data/how_to/tuning_guides/tuning006.png
+++ b/docs/data/how_to/tuning_guides/tuning006.png
--- a/docs/data/how_to/tuning_guides/tuning008.png
+++ b/docs/data/how_to/tuning_guides/tuning008.png
--- a/docs/data/how_to/tuning_guides/tuning009.png
+++ b/docs/data/how_to/tuning_guides/tuning009.png
--- a/docs/data/understand/deep_learning/TextClassification_3.png
+++ b/docs/data/understand/deep_learning/TextClassification_3.png
--- a/docs/data/understand/deep_learning/TextClassification_4.png
+++ b/docs/data/understand/deep_learning/TextClassification_4.png
--- a/docs/data/understand/deep_learning/TextClassification_5.png
+++ b/docs/data/understand/deep_learning/TextClassification_5.png
--- a/docs/data/understand/deep_learning/TextClassification_6.png
+++ b/docs/data/understand/deep_learning/TextClassification_6.png
--- a/docs/data/understand/deep_learning/TextClassification_7.png
+++ b/docs/data/understand/deep_learning/TextClassification_7.png
--- a/docs/data/understand/deep_learning/image.018.png
+++ b/docs/data/understand/deep_learning/image.018.png
--- a/docs/data/understand/deep_learning/inception_v3.png
+++ b/docs/data/understand/deep_learning/inception_v3.png
--- a/docs/data/understand/deep_learning/mnist_1.png
+++ b/docs/data/understand/deep_learning/mnist_1.png
--- a/docs/data/understand/deep_learning/mnist_2.png
+++ b/docs/data/understand/deep_learning/mnist_2.png
--- a/docs/data/understand/deep_learning/mnist_3.png
+++ b/docs/data/understand/deep_learning/mnist_3.png
--- a/docs/data/understand/deep_learning/mnist_4.png
+++ b/docs/data/understand/deep_learning/mnist_4.png
--- a/docs/data/understand/deep_learning/mnist_5.png
+++ b/docs/data/understand/deep_learning/mnist_5.png
--- a/docs/data/tutorials/install/linux/linux001.png
+++ b/docs/data/tutorials/install/linux/linux001.png
--- a/docs/data/tutorials/install/linux/linux002.png
+++ b/docs/data/tutorials/install/linux/linux002.png
--- a/docs/data/tutorials/install/linux/linux003.png
+++ b/docs/data/tutorials/install/linux/linux003.png
--- a/docs/data/tutorials/install/linux/linux004.png
+++ b/docs/data/tutorials/install/linux/linux004.png
--- a/docs/data/tutorials/install/magma_install/magma005.png
+++ b/docs/data/tutorials/install/magma_install/magma005.png
--- a/docs/data/tutorials/install/magma_install/magma006.png
+++ b/docs/data/tutorials/install/magma_install/magma006.png
--- a/docs/data/tutorials/install/windows/000-settings-dark.png
+++ b/docs/data/tutorials/install/windows/000-settings-dark.png
--- a/docs/data/tutorials/install/windows/000-settings-light.png
+++ b/docs/data/tutorials/install/windows/000-settings-light.png
--- a/docs/data/tutorials/install/windows/000-setup-icon.png
+++ b/docs/data/tutorials/install/windows/000-setup-icon.png
--- a/docs/data/tutorials/install/windows/001-about-dark.png
+++ b/docs/data/tutorials/install/windows/001-about-dark.png
--- a/docs/data/tutorials/install/windows/001-about-light.png
+++ b/docs/data/tutorials/install/windows/001-about-light.png
--- a/docs/data/tutorials/install/windows/001-uac-dark.png
+++ b/docs/data/tutorials/install/windows/001-uac-dark.png
--- a/docs/data/tutorials/install/windows/001-uac-light.png
+++ b/docs/data/tutorials/install/windows/001-uac-light.png
--- a/docs/data/tutorials/install/windows/002-initializing.png
+++ b/docs/data/tutorials/install/windows/002-initializing.png
--- a/docs/data/tutorials/install/windows/003-detecting-system-config.png
+++ b/docs/data/tutorials/install/windows/003-detecting-system-config.png
--- a/docs/data/tutorials/install/windows/004-installer-window.png
+++ b/docs/data/tutorials/install/windows/004-installer-window.png
--- a/docs/data/tutorials/install/windows/012-install-progress.png
+++ b/docs/data/tutorials/install/windows/012-install-progress.png
--- a/docs/data/tutorials/install/windows/013-install-complete.png
+++ b/docs/data/tutorials/install/windows/013-install-complete.png
--- a/docs/data/tutorials/install/windows/014-uninstall-dark.png
+++ b/docs/data/tutorials/install/windows/014-uninstall-dark.png
--- a/docs/data/tutorials/install/windows/014-uninstall-light.png
+++ b/docs/data/tutorials/install/windows/014-uninstall-light.png
--- a/docs/data/unused_images/_005-deselect-all-windows.png
+++ b/docs/data/unused_images/_005-deselect-all-windows.png
--- a/docs/data/unused_images/_006-component-options-sdk-core-windows.png
+++ b/docs/data/unused_images/_006-component-options-sdk-core-windows.png
--- a/docs/data/unused_images/_007-component-options-libraries-windows.png
+++ b/docs/data/unused_images/_007-component-options-libraries-windows.png
--- a/docs/data/unused_images/_008-component-options-rtc-windows.png
+++ b/docs/data/unused_images/_008-component-options-rtc-windows.png
--- a/docs/data/unused_images/_009-component-options-rt-windows.png
+++ b/docs/data/unused_images/_009-component-options-rt-windows.png
--- a/docs/data/unused_images/_010-component-options-vs-plugin-windows.png
+++ b/docs/data/unused_images/_010-component-options-vs-plugin-windows.png
--- a/docs/data/unused_images/_011-component-options-radeon-software-windows.png
+++ b/docs/data/unused_images/_011-component-options-radeon-software-windows.png
--- a/docs/data/understand/deep_learning/Deep
+++ b/docs/data/understand/deep_learning/Deep
--- a/docs/data/understand/deep_learning/Install
+++ b/docs/data/understand/deep_learning/Install
--- a/docs/data/understand/deep_learning/Machine
+++ b/docs/data/understand/deep_learning/Machine
--- a/docs/data/understand/deep_learning/Matrix-1.png
+++ b/docs/data/understand/deep_learning/Matrix-1.png
--- a/docs/data/understand/deep_learning/Matrix-2.png
+++ b/docs/data/understand/deep_learning/Matrix-2.png
--- a/docs/data/understand/deep_learning/Matrix-3.png
+++ b/docs/data/understand/deep_learning/Matrix-3.png
--- a/docs/data/understand/deep_learning/Model
+++ b/docs/data/understand/deep_learning/Model
--- a/docs/data/understand/deep_learning/Pytorch
+++ b/docs/data/understand/deep_learning/Pytorch
--- a/docs/data/understand/deep_learning/Text
+++ b/docs/data/understand/deep_learning/Text
--- a/docs/data/understand/deep_learning/Text
+++ b/docs/data/understand/deep_learning/Text