diff --git a/.markdownlint-cli2.yaml b/.markdownlint-cli2.yaml index ec34ab156..fa6edfd27 100644 --- a/.markdownlint-cli2.yaml +++ b/.markdownlint-cli2.yaml @@ -8,7 +8,9 @@ config: MD033: false MD034: false MD041: false + MD051: false ignores: - CHANGELOG.md - "{,docs/}{RELEASE,release}.md" + - docs/about/release/release_notes.md - tools/autotag/templates/**/*.md diff --git a/.wordlist.txt b/.wordlist.txt index 47dfc6d59..59904f9ae 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -1,3 +1,7 @@ +ROCProfiler +ROCTracer +ROCdbgapi +hipify # building matchers # file_reorg diff --git a/README.md b/README.md index 26db70c07..449808bf6 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,23 @@ # AMD ROCm™ Platform -ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for -graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development -tools, and APIs that enable GPU programming from low-level kernel to end-user applications. +ROCm is an open-source stack, composed primarily of open-source software, designed for graphics +processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and +APIs that enable GPU programming from low-level kernel to end-user applications. With ROCm, you can customize your GPU software to meet your specific needs. You can develop, -collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software +collaborate, test, and deploy your applications in a free, open source, integrated, and secure software ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC), artificial intelligence (AI), scientific computing, and computer aided design (CAD). ROCm is powered by AMD’s [Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP), -an OSS C++ GPU programming environment and its corresponding runtime. 
HIP allows ROCm -developers to create portable applications on different platforms by deploying code on a range of -platforms, from dedicated gaming GPUs to exascale HPC clusters. +an open-source software C++ GPU programming environment and its corresponding runtime. HIP +allows ROCm developers to create portable applications on different platforms by deploying code on a +range of platforms, from dedicated gaming GPUs to exascale HPC clusters. -ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary OSS -compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks, -such as PyTorch and TensorFlow. +ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open +source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning +(ML) frameworks, such as PyTorch and TensorFlow. ## ROCm Documentation @@ -47,4 +47,4 @@ python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html ## Older ROCm Releases For release information for older ROCm releases, refer to -[`CHANGELOG.md`](./CHANGELOG.md). +[`CHANGELOG`](./CHANGELOG). diff --git a/docs/release/3rd_party_support_matrix.md b/docs/about/compatibility/3rd_party_support_matrix.md similarity index 94% rename from docs/release/3rd_party_support_matrix.md rename to docs/about/compatibility/3rd_party_support_matrix.md index 66d6f92c3..f17de3c6c 100644 --- a/docs/release/3rd_party_support_matrix.md +++ b/docs/about/compatibility/3rd_party_support_matrix.md @@ -1,11 +1,9 @@ -# 3rd Party Support Matrix +# Third party support matrix ROCm™ supports various 3rd party libraries and frameworks. Supported versions are tested and known to work. Non-supported versions of 3rd parties may also work, but aren't tested. 
-(ml_framework_compat_matrix)= - ## Deep Learning ROCm releases support the most recent and two prior releases of PyTorch and @@ -21,6 +19,8 @@ TensorFlow | 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11 | 2.5.4 | | 5.6 | 1.11, 1.12.1, 1.13.1 | 2.12 | 2.5.4 | +(communication_libraries)= + ## Communication libraries ROCm supports [OpenUCX](https://openucx.org/) an "an open-source, @@ -59,4 +59,4 @@ contemporary CUDA / NVIDIA HPC SDK alternatives. | 5.6 | 1.17.2 | 22.9 | For the latest documentation of these libraries, refer to the -[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md). +[associated documentation](../../reference/libraries/gpu_libraries/c++_primitives). diff --git a/docs/release/docker_image_support_matrix.md b/docs/about/compatibility/docker_image_support_matrix.md similarity index 100% rename from docs/release/docker_image_support_matrix.md rename to docs/about/compatibility/docker_image_support_matrix.md diff --git a/docs/release/compatibility.md b/docs/about/compatibility/index.md similarity index 80% rename from docs/release/compatibility.md rename to docs/about/compatibility/index.md index 00ce4fdeb..9d0c47a60 100644 --- a/docs/release/compatibility.md +++ b/docs/about/compatibility/index.md @@ -7,14 +7,14 @@ Forward and backward compatibility of ROCm user space components and the kernel space Kernel Fusion Driver (KFD). -- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix.md) +- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix) ::: :::{grid-item-card} Docker Image Support ROCm releases several Docker container images. -- [Docker Image Support Matrix](./docker_image_support_matrix.md) +- [Docker Image Support Matrix](./docker_image_support_matrix) ::: @@ -22,7 +22,7 @@ ROCm releases several Docker container images. Several 3rd party libraries ship with ROCm enablement as well as several ROCm components provide interfaces compatible with 3rd party solutions. 
-- [3rd Party Support Matrix](./3rd_party_support_matrix.md) +- [Third party support matrix](./3rd_party_support_matrix) ::: diff --git a/docs/release/user_kernel_space_compat_matrix.md b/docs/about/compatibility/user_kernel_space_compat_matrix.md similarity index 100% rename from docs/release/user_kernel_space_compat_matrix.md rename to docs/about/compatibility/user_kernel_space_compat_matrix.md diff --git a/docs/about/license.md b/docs/about/license.md new file mode 100644 index 000000000..847e6d251 --- /dev/null +++ b/docs/about/license.md @@ -0,0 +1,9 @@ +# License + +> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that primarily contains documentation. For other licensing information, refer to the [Licensing Terms page](./licensing). + +```{include} ../../LICENSE +``` + +```{include} ./licensing.md +``` diff --git a/docs/release/licensing.md b/docs/about/licensing.md similarity index 99% rename from docs/release/licensing.md rename to docs/about/licensing.md index a8a2d51df..56ceefa8d 100644 --- a/docs/release/licensing.md +++ b/docs/about/licensing.md @@ -1,4 +1,4 @@ -# Licensing Terms +# ROCm licensing terms ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately. 
The following table is a list of ROCm components with links to their respective license diff --git a/docs/release/gpu_os_support.md b/docs/about/release/linux_support.md similarity index 98% rename from docs/release/gpu_os_support.md rename to docs/about/release/linux_support.md index 24f2d49a3..3aaf94adb 100644 --- a/docs/release/gpu_os_support.md +++ b/docs/about/release/linux_support.md @@ -1,6 +1,6 @@ -# GPU Support and OS Compatibility (Linux) +# GPU and OS support (Linux) -(supported_distributions)= +(linux_support)= ## Supported Linux Distributions diff --git a/docs/release/versions.md b/docs/about/release/release_history.md similarity index 100% rename from docs/release/versions.md rename to docs/about/release/release_history.md diff --git a/docs/about/release/release_notes.md b/docs/about/release/release_notes.md new file mode 100644 index 000000000..0830b04d1 --- /dev/null +++ b/docs/about/release/release_notes.md @@ -0,0 +1,583 @@ +# Release Notes + + + + + + + + + + + + + +The release notes for the ROCm platform. + +------------------- + +## ROCm 5.6.0 + + + +#### Release Highlights + +ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include: + +- New documentation portal at https://rocm.docs.amd.com +- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite +- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements +- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers +- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER. + +#### OS and GPU Support Changes + +- SLES15 SP5 support was added this release. SLES15 SP3 support was dropped. 
+- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting Q3 2023. This will be aligned with the ROCm 5.7 GA release date. + - No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7 + - Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (End of Maintenance [EOM]) (will be aligned with the closest ROCm release) + - Bug fixes during the maintenance will be made to the next ROCm point release + - Bug fixes will not be back ported to older ROCm releases for this SKU + - Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM. + +#### AMDSMI CLI 23.0.0.4 + +##### Added + +- AMDSMI CLI tool enabled for Linux Bare Metal & Guest + +- Package: amd-smi-lib + +##### Known Issues + +- Not all Error Correction Code (ECC) fields are currently supported + +- RHEL 8 & SLES 15 have extra install steps + +#### Kernel Modules (DKMS) + +##### Fixes + +- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198). + +#### HIP 5.6 (For ROCm 5.6) + +##### Optimizations + +- Consolidation of hipamd, rocclr and OpenCL projects in clr +- Optimized lock for graph global capture mode + +##### Added + +- Added hipRTC support for amd_hip_fp16 +- Added hipStreamGetDevice implementation to get the device associated with the stream +- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats +- hipArrayGetInfo for getting information about the specified array +- hipArrayGetDescriptor for getting 1D or 2D array descriptor +- hipArray3DGetDescriptor to get 3D array descriptor + +##### Changed + +- hipMallocAsync to return success for zero size allocation to match hipMalloc +- Separation of hipcc perl binaries from HIP project to hipcc project.
hip-devel package depends on newly added hipcc package +- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide +- Removed hipBusBandwidth and hipCommander samples from hip-tests + +##### Fixed + +- Fixed regression in hipMemCpyParam3D when offset is applied + +##### Known Issues + +- Limited testing on xnack+ configuration + - Multiple HIP test failures (gpuvm fault or hangs) +- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU +- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release + +##### Upcoming changes in future release + +- Removal of gcnarch from hipDeviceProp_t structure +- Addition of new fields in hipDeviceProp_t structure + - maxTexture1D + - maxTexture2D + - maxTexture1DLayered + - maxTexture2DLayered + - sharedMemPerMultiprocessor + - deviceOverlap + - asyncEngineCount + - surfaceAlignment + - unifiedAddressing + - computePreemptionSupported + - uuid +- Removal of deprecated code + - hip-hcc codes from hip code tree +- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA +- HIPMEMCPY_3D fields correction (unsigned int -> size_t) +- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type' + +#### ROCgdb-13 (For ROCm 5.6.0) + +##### Optimized + +- Improved performance when handling the end of a process with a large number of threads. + +##### Known Issues + +- On certain configurations, ROCgdb can show the following warning message: + + `warning: Probes-based dynamic linker interface failed. Reverting to original interface.` + + This does not affect ROCgdb's functionality. + +#### ROCprofiler (For ROCm 5.6.0) + +In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of +ROCm 5.5 are split into separate files.
The `rocmtools` files that were +deprecated in ROCm 5.5 have been removed. + + | ROCm 5.6 | rocprofilerv1 | rocprofilerv2 | + |-----------------|-------------------------------------|----------------------------------------| + | **Tool script** | `bin/rocprof` | `bin/rocprofv2` | + | **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` | + | **API library** | `lib/librocprofiler.so.1` | `lib/librocprofiler.so.2` | + +The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the +following command: + +```sh +$ rocprof … +``` + +To write a custom tool based on the `rocprofilerV1` API do the following: + +```C +// main.c +#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API +int main() { + // Use the rocprofilerV1 API + return 0; +} +``` + +This can be built in the following manner: + +```sh +$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64 +``` + +The resulting `a.out` will depend on +`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`. + +The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the +following command: + +```sh +$ rocprofv2 … +``` + +To write a custom tool based on the `rocprofilerV2` API do the following: + +```C +// main.c +#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API +int main() { + // Use the rocprofilerV2 API + return 0; +} +``` + +This can be built in the following manner: + +```sh +$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2 +``` + +The resulting `a.out` will depend on +`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`. + +##### Optimized + +- Improved Test Suite + +##### Added + +- 'end_time' need to be disabled in roctx_trace.txt + +##### Fixed + +- rocprof in ROCm/5.4.0 gpu selector broken. +- rocprof in ROCm/5.4.1 fails to generate kernel info. +- rocprof clobbers LD_PRELOAD.
+ +### Library Changes in ROCM 5.6.0 + +| Library | Version | +|---------|---------| +| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) | +| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) | +| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) | +| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) | +| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) | +| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) | +| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) | +| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) | +| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) | +| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) | +| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) | +| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) | +| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) | +| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) | +| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) | +| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) | +| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) | +| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) | + +#### hipBLAS 1.0.0 + +hipBLAS 1.0.0 for ROCm 5.6.0 + +##### Changed + +- added const qualifier to hipBLAS functions 
(swap, sbmv, spmv, symv, trsm) where missing + +##### Removed + +- removed support for deprecated hipblasInt8Datatype_t enum +- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions + +##### Deprecated + +- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and + out-of-place functionality + +#### hipCUB 2.13.1 + +hipCUB 2.13.1 for ROCm 5.6.0 + +##### Added + +- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`. + +##### Changed + +- CUB backend references CUB and Thrust version 1.17.2. +- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`. +- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). + +##### Known Issues + +- `BlockRadixRankMatch` is currently broken under the rocPRIM backend. +- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend. + +#### hipFFT 1.0.12 + +hipFFT 1.0.12 for ROCm 5.6.0 + +##### Added + +- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms. + +##### Changed + +- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform. 
+ +#### hipSOLVER 1.8.0 + +hipSOLVER 1.8.0 for ROCm 5.6.0 + +##### Added + +- Added compatibility API with hipsolverRf prefix + +#### hipSPARSE 2.3.6 + +hipSPARSE 2.3.6 for ROCm 5.6.0 + +##### Added + +- Added SpGEMM algorithms + +##### Changed + +- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE + +#### MIOpen 2.19.0 + +MIOpen 2.19.0 for ROCm 5.6.0 + +##### Added + +- ROCm 5.5 support for gfx1101 (Navi32) + +##### Changed + +- Tuning results for MLIR on ROCm 5.5 +- Bumping MLIR commit to 5.5.0 release tag + +##### Fixed + +- Fix 3d convolution Host API bug +- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required. + +#### rccl 2.15.5 + +RCCL 2.15.5 for ROCm 5.6.0 + +##### Changed + +- Compatibility with NCCL 2.15.5 +- Unit test executable renamed to rccl-UnitTests + +##### Added + +- HW-topology aware binary tree implementation +- Experimental support for MSCCL +- New unit tests for hipGraph support +- NPKit integration + +##### Fixed + +- rocm-smi ID conversion +- Support for HIP_VISIBLE_DEVICES for unit tests +- Support for p2p transfers to non (HIP) visible devices + +##### Removed + +- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench + +#### rocALUTION 2.1.9 + +rocALUTION 2.1.9 for ROCm 5.6.0 + +##### Improved + +- Fixed synchronization issues in level 1 routines + +#### rocBLAS 3.0.0 + +rocBLAS 3.0.0 for ROCm 5.6.0 + +##### Optimizations + +- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256. +- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a. 
+ +##### Added + +- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex. + +##### Deprecated + +- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality +- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release +- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release +- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size() +- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release + +##### Removed + +- is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead. +- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively. +- rocblas_set_int8_type_for_hipblas was deprecated and is now removed. +- rocblas_get_int8_type_for_hipblas was deprecated and is now removed. + +##### Dependencies + +- build only dependency on python joblib added as used by Tensile build +- fix for cmake install on some OS when performed by install.sh -d --cmake_install + +##### Fixed + +- make trsm offset calculations 64 bit safe + +##### Changed + +- refactor rotg test code + +#### rocFFT 1.0.23 + +rocFFT 1.0.23 for ROCm 5.6.0 + +##### Added + +- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create. +- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used. +- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems. + +##### Changed + +- Replaced std::complex with hipComplex data types for data generator.
+- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example). +- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform. + +##### Fixed + +- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure. + +#### rocm-cmake 0.9.0 + +rocm-cmake 0.9.0 for ROCm 5.6.0 + +##### Added + +- Added the option ROCM_HEADER_WRAPPER_WERROR + - Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings. + - Configure-time CMake option sets the default for the C macro. + +#### rocPRIM 2.13.0 + +rocPRIM 2.13.0 for ROCm 5.6.0 + +##### Added + +- New block level `radix_rank` primitive. +- New block level `radix_rank_match` primitive. +- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm. + +##### Changed + +- Improved the performance of `block_radix_sort` and `device_radix_sort`. +- Improved the performance of `device_merge_sort`. +- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ). + +##### Known Issues + +- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows. +- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`. + +#### rocRAND 2.10.17 + +rocRAND 2.10.17 for ROCm 5.6.0 + +##### Added + +- MT19937 pseudo random number generator based on M. Matsumoto and T.
Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. +- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`. +- experimental HIP-CPU feature +- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3". + +##### Changed + +- Python 2.7 is no longer officially supported. + +#### rocSOLVER 3.22.0 + +rocSOLVER 3.22.0 for ROCm 5.6.0 + +##### Added + +- LU refactorization for sparse matrices + - CSRRF_ANALYSIS + - CSRRF_SUMLU + - CSRRF_SPLITLU + - CSRRF_REFACTLU +- Linear system solver for sparse matrices + - CSRRF_SOLVE +- Added type `rocsolver_rfinfo` for use with sparse matrix routines + +##### Optimized + +- Improved the performance of BDSQR and GESVD when singular vectors are requested + +##### Fixed + +- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf` + +#### rocSPARSE 2.5.2 + +rocSPARSE 2.5.2 for ROCm 5.6.0 + +##### Improved + +- Fixed a memory leak in csritsv +- Fixed a bug in csrsm and bsrsm + +#### rocThrust 2.18.0 + +rocThrust 2.18.0 for ROCm 5.6.0 + +##### Fixed + +- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types. + +##### Changed + +- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). 
+ +#### rocWMMA 1.1.0 + +rocWMMA 1.1.0 for ROCm 5.6.0 + +##### Added + +- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp) +- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation) +- Added performance gemm samples for half, single and double precision +- Added rocWMMA cmake versioning +- Added vectorized support in coordinate transforms +- Included ROCm smi for runtime clock rate detection +- Added fragment transforms for transpose and change data layout + +##### Changed + +- Default to GPU rocBLAS validation against rocWMMA +- Re-enabled int8 gemm tests on gfx9 +- Upgraded to C++17 +- Restructured unit test folder for consistency +- Consolidated rocWMMA samples common code + +#### Tensile 4.37.0 + +Tensile 4.37.0 for ROCm 5.6.0 + +##### Added + +- Added user driven tuning API +- Added decision tree fallback feature +- Added SingleBuffer + AtomicAdd option for GlobalSplitU +- DirectToVgpr support for fp16 and Int8 with TN orientation +- Added new test cases for various functions +- Added SingleBuffer algorithm for ZGEMM/CGEMM +- Added joblib for parallel map calls +- Added support for MFMA + LocalSplitU + DirectToVgprA+B +- Added asmcap check for MIArchVgpr +- Added support for MFMA + LocalSplitU +- Added frequency, power, and temperature data to the output + +##### Optimizations + +- Improved the performance of GlobalSplitU with SingleBuffer algorithm +- Reduced the running time of the extended and pre_checkin tests +- Optimized the Tailloop section of the assembly kernel +- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch) +- Improved the performance of the second kernel of MultipleBuffer algorithm + +##### Changed + +- Updated custom kernels with 64-bit offsets +- Adapted 64-bit offset arguments for assembly kernels +- Improved temporary register re-use to reduce max sgpr usage +- Removed some restrictions on VectorWidth and DirectToVgpr +- Updated 
the dependency requirements for Tensile +- Changed the range of AssertSummationElementMultiple +- Modified the error messages for more clarity +- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used +- Removed dummy vgpr for vectorStaticRemainder +- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder +- Removed qReg parameter from vectorStaticRemainder + +##### Fixed + +- Fixed tmp sgpr allocation to avoid over-writing values (alpha) +- 64-bit offset parameters for post kernels +- Fixed gfx908 CI test failures +- Fixed offset calculation to prevent overflow for large offsets +- Fixed issues when BufferLoad and BufferStore are equal to zero +- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch +- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch +- Fixed the memory access error related to StaggerU + large stride +- Fixed ZGEMM 4x4 MatrixInst mismatch +- Fixed DGEMM 4x4 MatrixInst mismatch +- Fixed ASEM + GSU + NoTailLoop opt mismatch +- Fixed AssertSummationElementMultiple + GlobalSplitU issues +- Fixed ASEM + GSU + TailLoop inner unroll diff --git a/docs/release/windows_support.md b/docs/about/release/windows_support.md similarity index 86% rename from docs/release/windows_support.md rename to docs/about/release/windows_support.md index 25b13e167..8288b6d6b 100644 --- a/docs/release/windows_support.md +++ b/docs/about/release/windows_support.md @@ -59,15 +59,12 @@ on this table, the GPU is not officially supported by AMD. ### Component Support -ROCm components are described in the [reference](../reference/all) page. Support +ROCm components are described in the [Reference material](../../reference/index). Support on Windows is provided with two levels on enablement. -- **Runtime**: Runtime enables the use of the HIP/OpenCL runtimes only. 
-- **HIP SDK**: Runtime plus additional components refer to libraries found under - [Math Libraries](../reference/gpu_libraries/math.md) and - [C++ Primitive Libraries](../reference/gpu_libraries/c%2B%2B_primitives.md). - Some [Math Libraries](../reference/gpu_libraries/math.md) are Linux exclusive, - please check the library details. +- **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only. +- **HIP SDK**: Runtime plus additional components refer to [Libraries](../../reference/libraries/index). + Some [math libraries](../../reference/libraries/gpu_libraries/math) are Linux exclusive, please check the library details. ### Support Status diff --git a/docs/understand/More-about-how-ROCm-uses-PCIe-Atomics.rst b/docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst similarity index 100% rename from docs/understand/More-about-how-ROCm-uses-PCIe-Atomics.rst rename to docs/conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst diff --git a/docs/understand/cmake_packages.rst b/docs/conceptual/cmake_packages.rst similarity index 99% rename from docs/understand/cmake_packages.rst rename to docs/conceptual/cmake_packages.rst index cf5222d8b..1dce30b34 100644 --- a/docs/understand/cmake_packages.rst +++ b/docs/conceptual/cmake_packages.rst @@ -50,7 +50,7 @@ the *config-file* packages are shipped with the upstream projects, such as rocPRIM and other ROCm libraries. For a complete guide on where and how ROCm may be installed on a system, refer -to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_). +to the installation guides in these docs (`Linux <../tutorials/install/index.html>`_). 
Using HIP in CMake ================== diff --git a/docs/understand/compiler_disambiguation.md b/docs/conceptual/compiler_disambiguation.md similarity index 100% rename from docs/understand/compiler_disambiguation.md rename to docs/conceptual/compiler_disambiguation.md diff --git a/docs/understand/file_reorg.md b/docs/conceptual/file_reorg.md similarity index 100% rename from docs/understand/file_reorg.md rename to docs/conceptual/file_reorg.md diff --git a/docs/understand/gpu_arch.md b/docs/conceptual/gpu_arch.md similarity index 100% rename from docs/understand/gpu_arch.md rename to docs/conceptual/gpu_arch.md diff --git a/docs/understand/gpu_arch/mi100.md b/docs/conceptual/gpu_arch/mi100.md similarity index 84% rename from docs/understand/gpu_arch/mi100.md rename to docs/conceptual/gpu_arch/mi100.md index b38f751ee..ddbfabb37 100644 --- a/docs/understand/gpu_arch/mi100.md +++ b/docs/conceptual/gpu_arch/mi100.md @@ -6,7 +6,7 @@ these GPUs. ## System Architecture -{numref}`mi100-arch` shows the node-level architecture of a system that +The following image shows the node-level architecture of a system that comprises two AMD EPYC™ processors and (up to) eight AMD Instinct™ accelerators. The two EPYC processors are connected to each other with the AMD Infinity™ fabric which provides a high-bandwidth (up to 18 GT/sec) and coherent links such @@ -17,12 +17,13 @@ available to connect the processors plus one PCIe Gen 4 x16 link per processor can attach additional I/O devices such as the host adapters for the network fabric. -:::{figure-md} mi100-arch - -Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators. +```{figure} ../../data/conceptual/gpu_arch/image004.png +:name: mi100-arch +:alt: Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators. +:align: center Structure of a single GCD in the AMD Instinct MI100 accelerator. 
-::: +``` In a typical node configuration, each processor can host up to four AMD Instinct™ accelerators that are attached using PCIe Gen 4 links at 16 GT/sec, @@ -42,18 +43,19 @@ computing (HPC) and AI & machine learning (ML) that run on everything from individual servers to the world's largest exascale supercomputers. The overall system architecture is designed for extreme scalability and compute performance. -:::{figure-md} mi100-block - -Structure of the AMD Instinct accelerator (MI100 generation). +```{figure} ../../data/conceptual/gpu_arch/image005.png +:name: mi100-block +:alt: Structure of the AMD Instinct accelerator (MI100 generation). +:align: center Structure of the AMD Instinct accelerator (MI100 generation). -::: +``` -{numref}`mi100-block` shows the AMD Instinct accelerator with its PCIe Gen 4 x16 +The above image shows the AMD Instinct accelerator with its PCIe Gen 4 x16 link (16 GT/sec, at the bottom) that connects the GPU to (one of) the host processor(s). It also shows the three AMD Infinity Fabric ports that provide high-speed links (23 GT/sec, also at the bottom) to the other GPUs of the local -hive as shown in {numref}`mi100-arch`. +hive. On the left and right of the floor plan, the High Bandwidth Memory (HBM) attaches via the GPU's memory controller. The MI100 generation of the AMD @@ -61,7 +63,7 @@ Instinct accelerator offers four stacks of HBM generation 2 (HBM2) for a total of 32GB with a 4,096bit-wide memory interface. The peak memory bandwidth of the attached HBM2 is 1.228 TB/sec at a memory clock frequency of 1.2 GHz. -The execution units of the GPU are depicted in {numref}`mi100-block` as Compute +The execution units of the GPU are depicted in the above image as Compute Units (CU). There are a total 120 compute units that are physically organized into eight Shader Engines (SE) with fifteen compute units per shader engine. 
Each compute unit is further sub-divided into four SIMD units that process SIMD @@ -70,15 +72,16 @@ instructions of 16 data elements per instruction. This enables the CU to process Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS (`4 [SIMD units] x 16 [elements per instruction] x 120 [CU] x 1.5 [GHz]`). -:::{figure-md} mi100-gcd - -Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture +```{figure} ../../data/conceptual/gpu_arch/image006.png +:name: mi100-gcd +:alt: Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture. +:align: center Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA -architecture -::: +architecture. +``` -{numref}`mi100-gcd` shows the block diagram of a single CU of an AMD Instinct™ +The preceding image shows the block diagram of a single CU of an AMD Instinct™ MI100 accelerator and summarizes how instructions flow through the execution engines. The CU fetches the instructions via a 32KB instruction cache and moves them forward to execution via a dispatcher. The CU can handle up to ten diff --git a/docs/understand/gpu_arch/mi200_performance_counters.md b/docs/conceptual/gpu_arch/mi200_performance_counters.md similarity index 100% rename from docs/understand/gpu_arch/mi200_performance_counters.md rename to docs/conceptual/gpu_arch/mi200_performance_counters.md diff --git a/docs/understand/gpu_arch/mi250.md b/docs/conceptual/gpu_arch/mi250.md similarity index 82% rename from docs/understand/gpu_arch/mi250.md rename to docs/conceptual/gpu_arch/mi250.md index e0f0a9617..7f1bbeed4 100644 --- a/docs/understand/gpu_arch/mi250.md +++ b/docs/conceptual/gpu_arch/mi250.md @@ -12,8 +12,7 @@ everything from individual servers to the world’s largest exascale supercomputers. The overall system architecture is designed for extreme scalability and compute performance. 
-{numref}`mi250-gcd` shows the components of a single Graphics Compute Die (GCD -) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™ +The following image shows the components of a single Graphics Compute Die (GCD) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™ interfaces and their physical links that are used to connect the GPU die to the other system-level components of the node (see also Section 2.2). Both interfaces can drive four AMD Infinity Fabric links. One of the AMD Infinity @@ -28,7 +27,7 @@ To the left and the right are memory controllers that attach the High Bandwidth Memory (HBM) modules to the GCD. AMD Instinct MI250 GPUs use HBM2e, which offers a peak memory bandwidth of 1.6 TB/sec per GCD. -The execution units of the GPU are depicted in {numref}`mi250-gcd` as Compute +The execution units of the GPU are depicted in the following image as Compute Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further subdivided into four SIMD units that process SIMD instructions of 16 data elements per instruction (for the FP64 data type). This enables the CU to @@ -39,16 +38,17 @@ execution units (also called matrix cores), which are geared toward executing matrix operations like matrix-matrix multiplications. For FP64, the peak performance of these units amounts to 90.5 TFLOPS. -:::{figure-md} mi250-gcd +```{figure} ../../data/conceptual/gpu_arch/image001.png +:name: mi250-gcd +:alt: Structure of a single GCD in the AMD Instinct MI250 accelerator. +:align: center -Structure of a single GCD in the AMD Instinct MI250 accelerator. - -Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator. -::: +Structure of a single GCD in the AMD Instinct MI250 accelerator. +``` ```{list-table} Peak-performance capabilities of the MI250 OAM for different data types. 
:header-rows: 1 -:name: mi250-perf +:name: mi250-perf-table * - Computation and Data Type @@ -88,7 +88,7 @@ Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator. - 362.1 ``` -{numref}`mi250-perf` summarizes the aggregated peak performance of the AMD +The above table summarizes the aggregated peak performance of the AMD Instinct MI250 OCP Open Accelerator Modules (OAM, OCP is short for Open Compute Platform) and its two GCDs for different data types and execution units. The middle column lists the peak performance (number of data elements processed in a @@ -97,14 +97,15 @@ is being retired in each clock cycle. The third column lists the theoretical peak performance of the OAM module. The theoretical aggregated peak memory bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD). -:::{figure-md} mi250-arch - -Dual-GCD architecture of the AMD Instinct MI250 accelerators. +```{figure} ../../data/conceptual/gpu_arch/image002.png +:name: mi250-perf +:alt: Dual-GCD architecture of the AMD Instinct MI250 accelerators. +:align: center Dual-GCD architecture of the AMD Instinct MI250 accelerators. -::: +``` -{numref}`mi250-arch` shows the block diagram of an OAM package that consists +The preceding image shows the block diagram of an OAM package that consists of two GCDs, each of which constitutes one GPU device in the system. The two GCDs in the package are connected via four AMD Infinity Fabric links running at a theoretical peak rate of 25 GT/sec, giving 200 GB/sec peak transfer bandwidth @@ -113,7 +114,7 @@ between the two GCDs of an OAM, or a bidirectional peak transfer bandwidth of ## Node-level Architecture -{numref}`mi250-block` shows the node-level architecture of a system that is +The following image shows the node-level architecture of a system that is based on the AMD Instinct MI250 accelerator. The MI250 OAMs attach to the host system via PCIe Gen 4 x16 links (yellow lines).
Each GCD maintains its own PCIe x16 link to the host part of the system. Depending on the server platform, the @@ -121,15 +122,16 @@ GCD can attach to the AMD EPYC processor directly or via an optional PCIe switch . Note that some platforms may offer an x8 interface to the GCDs, which reduces the available host-to-GPU bandwidth. -:::{figure-md} mi250-block - -Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor. +```{figure} ../../data/conceptual/gpu_arch/image003.png +:name: mi250-block +:alt: Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor. +:align: center Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor. -::: +``` -{numref}`mi250-block` shows the node-level architecture of a system with AMD +The preceding image shows the node-level architecture of a system with AMD EPYC processors in a dual-socket configuration and four AMD Instinct MI250 accelerators. The MI250 OAMs attach to the host processors system via PCIe Gen 4 x16 links (yellow lines). Depending on the system design, a PCIe switch may @@ -146,4 +148,4 @@ two GPU dies in the MI250 OAM and operates at 25 GT/sec, which corresponds to a theoretical peak transfer rate of 50 GB/sec per link (or 100 GB/sec bidirectional peak transfer bandwidth). The GCD pairs 2 and 6 as well as GCDs 0 and 4 connect via two XGMI links, which is indicated by the thicker red line in -{numref}`mi250-block`. +the preceding image. 
diff --git a/docs/understand/gpu_isolation.md b/docs/conceptual/gpu_isolation.md similarity index 100% rename from docs/understand/gpu_isolation.md rename to docs/conceptual/gpu_isolation.md diff --git a/docs/understand/all.md b/docs/conceptual/index.md similarity index 87% rename from docs/understand/all.md rename to docs/conceptual/index.md index 17042e20c..fdd124464 100644 --- a/docs/understand/all.md +++ b/docs/conceptual/index.md @@ -1,10 +1,10 @@ -# All Explanation Material +# Conceptual documentation :::::{grid} 1 1 2 2 :gutter: 1 :::{grid-item-card} Compiler Nomencalture -:link: compiler_disambiguation +:link: ./compiler_disambiguation :link-type: doc ROCm ships multiple compilers of varying origins and purposes. This article disambiguates compiler naming used throughout the documentation. @@ -12,7 +12,7 @@ disambiguates compiler naming used throughout the documentation. ::: :::{grid-item-card} Using CMake -:link: cmake_packages +:link: ./cmake_packages :link-type: doc ROCm components ship with 1st party CMake support. This article details how that support works and how to use it. @@ -20,7 +20,7 @@ support works and how to use it. ::: :::{grid-item-card} Linux Folder Structure Reorganization -:link: file_reorg +:link: ./file_reorg :link-type: doc ROCm™ packages have adopted the Linux foundation file system hierarchy standard to ensure ROCm components follow open source conventions for Linux-based @@ -29,7 +29,7 @@ distributions. ::: :::{grid-item-card} GPU Isolation Techniques -:link: gpu_isolation +:link: ./gpu_isolation :link-type: doc Restricting the access of applications to a subset of GPUs, aka isolating GPUs allows users to hide GPU resources from programs. @@ -37,7 +37,7 @@ allows users to hide GPU resources from programs. ::: :::{grid-item-card} GPU Architectures -:link: gpu_arch +:link: ./gpu_arch :link-type: doc AMD documentation around architectural details from both the CDNA and RDNA product lines. 
diff --git a/docs/understand/using_gpu_sanitizer.md b/docs/conceptual/using_gpu_sanitizer.md similarity index 96% rename from docs/understand/using_gpu_sanitizer.md rename to docs/conceptual/using_gpu_sanitizer.md index 2d62ea4cc..83a74640f 100644 --- a/docs/understand/using_gpu_sanitizer.md +++ b/docs/conceptual/using_gpu_sanitizer.md @@ -1,4 +1,4 @@ -### Using the LLVM Address Sanitizer (ASAN) on the GPU +# Using the LLVM Address Sanitizer (ASAN) on the GPU The LLVM Address Sanitizer provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement. @@ -7,7 +7,7 @@ Until now, the LLVM Address Sanitizer process was only available for traditional This document provides documentation on using ROCm Address Sanitizer. For information about LLVM Address Sanitizer, see [the LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html). -### Compile for Address Sanitizer +## Compile for Address Sanitizer The address sanitizer process begins by compiling the application of interest with the address sanitizer instrumentation. @@ -23,7 +23,7 @@ Other architectures are allowed, but their device code will not be instrumented It is not an error to compile some files without address sanitizer instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the Address Sanitizer runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section. -#### About Compilation Time +### About Compilation Time When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. 
This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level. @@ -33,7 +33,7 @@ There are a few options if the compile time becomes unacceptable: + Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information. + Disable instrumentation on a per-function basis by adding `__attribute__`((no_sanitize("address"))) to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process. -### Use AMD Supplied Address Sanitizer Instrumented Libraries +## Use AMD Supplied Address Sanitizer Instrumented Libraries ROCm releases provide optional packages containing address sanitizer instrumented builds of a subset of those ROCm libraries usually found in `/opt/rocm-/lib`. These optional packages are typically named -asan. However, the instrumented libraries themselves have identical names as the regular uninstrumented libraries and are located in `/opt/rocm-/lib/asan`. It is expected that the subset of address sanitizer instrumented ROCm libraries will be expanded in future releases. They are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used, but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`. @@ -41,9 +41,9 @@ These additional libraries avoid additional developer effort to locate repositor When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-/lib` compiler options need not be changed. 
However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported. -### Running Address Sanitizer Instrumented Applications +## Running Address Sanitizer Instrumented Applications -#### Preparing to Run an Instrumented Application +### Preparing to Run an Instrumented Application Here are a few recommendations to consider before running an address sanitizer instrumented heterogeneous application. @@ -76,13 +76,13 @@ This tells the ASAN runtime to halt the application immediately after detecting + `detect_leaks=0/1 default 1`. This option directs the address sanitizer runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`. -### Runtime Overhead +## Runtime Overhead Running an address sanitizer instrumented application incurs overheads which may result in unacceptably long runtimes or failure to run at all. 
-#### Higher Execution Time +### Higher Execution Time Address sanitizer detection works by checking each address at runtime before the address is actually accessed by a load, store, or atomic @@ -98,7 +98,7 @@ For heterogeneous applications, the shadow memory must be accessible by all devi and this can mean that shadow accesses from some devices may be more costly than non-shadow accesses. -#### Higher Memory Use +### Higher Memory Use The address checking described above relies on the compiler to surround each program variable with a red zone and on address sanitizer @@ -111,7 +111,7 @@ Applications which consume most one or more available memory pools when run normally are likely to encounter allocation failures when run with instrumentation. -### Runtime Reporting +## Runtime Reporting It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the address sanitizer runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues. @@ -160,7 +160,7 @@ or currently may include one or two surprising CPU side tracebacks mentioning :`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code and these call stacks can be ignored. -### Running with `rocgdb` +## Running with `rocgdb` `rocgdb` can be used to further investigate address sanitizer detected errors, with some preparation. @@ -212,9 +212,9 @@ $ rocgdb (gdb) c ``` -### Using Address Sanitizer with a Short HIP Application (LINK NEEDED HERE) +## Using Address Sanitizer with a Short HIP Application (LINK NEEDED HERE) -### Known Issues with Using GPU Sanitizer +## Known Issues with Using GPU Sanitizer + Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected. 
diff --git a/docs/understand/windows-app-deployment-guidelines.md b/docs/conceptual/windows-app-deployment-guidelines.md similarity index 100% rename from docs/understand/windows-app-deployment-guidelines.md rename to docs/conceptual/windows-app-deployment-guidelines.md diff --git a/docs/conf.py b/docs/conf.py index 2c786bd6c..108aede40 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -35,57 +35,42 @@ article_pages = [ "date":"2023-07-27" }, - {"file":"deploy/linux/index", "os":["linux"]}, - {"file":"deploy/linux/install_overview", "os":["linux"]}, - {"file":"deploy/linux/prerequisites", "os":["linux"]}, - {"file":"deploy/linux/quick_start", "os":["linux"]}, - {"file":"deploy/linux/install", "os":["linux"]}, - {"file":"deploy/linux/upgrade", "os":["linux"]}, - {"file":"deploy/linux/uninstall", "os":["linux"]}, - {"file":"deploy/linux/package_manager_integration", "os":["linux"]}, - {"file":"deploy/docker", "os":["linux"]}, - - {"file":"deploy/windows/cli/index", "os":["windows"]}, - {"file":"deploy/windows/cli/install", "os":["windows"]}, - {"file":"deploy/windows/cli/uninstall", "os":["windows"]}, - {"file":"deploy/windows/cli/upgrade", "os":["windows"]}, - {"file":"deploy/windows/gui/index", "os":["windows"]}, - {"file":"deploy/windows/gui/install", "os":["windows"]}, - {"file":"deploy/windows/gui/uninstall", "os":["windows"]}, - {"file":"deploy/windows/gui/upgrade", "os":["windows"]}, - {"file":"deploy/windows/index", "os":["windows"]}, - {"file":"deploy/windows/prerequisites", "os":["windows"]}, - {"file":"deploy/windows/quick_start", "os":["windows"]}, + {"file":"tutorials/quick_start/windows", "os":["windows"]}, + {"file":"tutorials/quick_start/linux", "os":["linux"]}, - {"file":"release/gpu_os_support", "os":["linux"]}, - {"file":"release/windows_support", "os":["windows"]}, - {"file":"release/docker_support_matrix", "os":["linux"]}, - - {"file":"reference/gpu_libraries/communication", "os":["linux"]}, - {"file":"reference/ai_tools", "os":["linux"]}, - 
{"file":"reference/management_tools", "os":["linux"]}, - {"file":"reference/validation_tools", "os":["linux"]}, - {"file":"reference/framework_compatibility/framework_compatibility", "os":["linux"]}, + {"file":"tutorials/install/linux/index", "os":["linux"]}, + {"file":"tutorials/install/linux/install_overview", "os":["linux"]}, + {"file":"tutorials/install/linux/prerequisites", "os":["linux"]}, + + {"file":"tutorials/install/docker", "os":["linux"]}, + {"file":"tutorials/install/magma_install", "os":["linux"]}, + {"file":"tutorials/install/pytorch_install", "os":["linux"]}, + {"file":"tutorials/install/tensorflow_install", "os":["linux"]}, + + {"file":"tutorials/install/windows/index", "os":["windows"]}, + {"file":"tutorials/install/windows/prerequisites", "os":["windows"]}, + {"file":"tutorials/install/windows/cli/index", "os":["windows"]}, + {"file":"tutorials/install/windows/gui/index", "os":["windows"]}, + + {"file":"about/release/linux_support", "os":["linux"]}, + {"file":"about/release/windows_support", "os":["windows"]}, + + {"file":"about/compatibility/docker_image_support_matrix", "os":["linux"]}, + + {"file":"reference/libraries/gpu_libraries/communication", "os":["linux"]}, + {"file":"reference/compilers_tools/index", "os":["linux"]}, {"file":"reference/computer_vision", "os":["linux"]}, - + {"file":"how_to/deep_learning_rocm", "os":["linux"]}, {"file":"how_to/gpu_aware_mpi", "os":["linux"]}, - {"file":"how_to/magma_install/magma_install", "os":["linux"]}, - {"file":"how_to/pytorch_install/pytorch_install", "os":["linux"]}, {"file":"how_to/system_debugging", "os":["linux"]}, - {"file":"how_to/tensorflow_install/tensorflow_install", "os":["linux"]}, - {"file":"examples/machine_learning", "os":["linux"]}, - {"file":"examples/inception_casestudy/inception_casestudy", "os":["linux"]}, - - {"file":"understand/file_reorg", "os":["linux"]}, - - {"file":"understand/isv_deployment_win", "os":["windows"]}, + {"file":"rocm_ai/rocm_ai", "os":["linux"]}, ] 
external_toc_path = "./sphinx/_toc.yml" -docs_core = ROCmDocs("ROCm 5.6.1 Documentation Home") +docs_core = ROCmDocs("ROCm Documentation") docs_core.setup() external_projects_current_project = "rocm" diff --git a/docs/contribute/feedback.md b/docs/contribute/feedback.md index 573fd6fa6..d81a3c49a 100644 --- a/docs/contribute/feedback.md +++ b/docs/contribute/feedback.md @@ -24,4 +24,4 @@ Issues on existing or absent docs can be filed as ## Email -Send other feedback or questions to [rocm-feedback@amd.com](rocm-feedback@amd.com) +Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback\@amd.com?subject=Documentation%20Feedback) diff --git a/docs/contribute/index.md b/docs/contribute/index.md new file mode 100644 index 000000000..869dfb3bd --- /dev/null +++ b/docs/contribute/index.md @@ -0,0 +1,73 @@ +# Contributing to ROCm Docs + +AMD values and encourages the ROCm community to contribute to our code and +documentation. This repository is focused on ROCm documentation and this +contribution guide describes the recommended method for creating and modifying our +documentation. + +While interacting with ROCm Documentation, we encourage you to be polite and +respectful in your contributions, content or otherwise. Authors and maintainers of +these docs act with good intentions and to the best of their knowledge. +Keep that in mind while you engage. Should you have issues with contributing +itself, refer to +[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) on the +GitHub repository. + +For additional information on documentation functionalities, +see the user and developer guides for rocm-docs-core +at {doc}`rocm-docs-core documentation `. + +## Supported Formats + +Our documentation includes both Markdown and RST files. Markdown is encouraged +over RST due to the lower barrier to participation. GitHub-flavored Markdown is preferred +for all submissions as it renders accurately on our GitHub repositories.
For existing documentation, +[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html) Markdown +is used to implement certain features unsupported in GitHub Markdown. This is +not encouraged for new documentation. AMD will transition +to stricter use of GitHub-flavored Markdown with a few caveats. ROCm documentation +also uses [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) +in our Markdown and RST files. We also use Breathe syntax for Doxygen documentation +in our Markdown files. See +[GitHub](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github)'s +guide on writing and formatting on GitHub as a starting point. + +ROCm documentation adds additional requirements to Markdown and RST based files +as follows: + +- Level one headers are only used for page titles. There must be only one level + 1 header per file for both Markdown and Restructured Text. +- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via + our automated GitHub action on a Pull Request (PR). + See the {doc}`rocm-docs-core linting user guide ` for more details. + +## Filenames and folder structure + +Please use snake case (all lower case letters and underscores instead of spaces) +for file names. For example, `example_file_name.md`. +Our documentation follows Pitchfork for folder structure. +All documentation is in `/docs` except for special files like +the contributing guide in the `/` folder. All images used in the documentation are +placed in the `/docs/data` folder. + +## Language and Style + +Adopt Microsoft C++ docs guidelines for +[Voice and tone](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md). + +ROCm documentation templates to be made public shortly. ROCm templates dictate +the recommended structure and flow of the documentation. 
Guidelines on how to +integrate figures, equations, and tables are all based on +[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html). + +Font size and selection, page layout, white space control, and other formatting +details are controlled via [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). +Raise issues in `rocm-docs-core` for any formatting concerns and changes requested. + +## More + +For more topics, such as submitting feedback and ways to build documentation, +see the [Contributing Section](https://rocm.docs.amd.com/en/latest/contributing.html) +at [rocm.docs.amd.com](https://rocm.docs.amd.com). + +To learn more about how our documentation is built, refer to the [ROCm toolchain](toolchain.md). diff --git a/docs/about.md b/docs/contribute/toolchain.md similarity index 89% rename from docs/about.md rename to docs/contribute/toolchain.md index 12d8fd6df..e3a7c1660 100644 --- a/docs/about.md +++ b/docs/contribute/toolchain.md @@ -1,9 +1,6 @@ -# About ROCm Documentation +# ROCm documentation toolchain -ROCm documentation is made available under open source [licenses](licensing.md). -Documentation is built using open source toolchains. Contributions to our -documentation is encouraged and welcome. As a contributor, please familiarize -yourself with our documentation toolchain. +Our documentation relies on several open source toolchains and sites. ## `rocm-docs-core` @@ -17,7 +14,7 @@ See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core doc ## Sphinx [Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator -originally used for Python. It is now widely used in the Open Source community. +originally used for Python. It is now widely used in the open source community. Originally, Sphinx supported reStructuredText (RST) based documentation, but Markdown support is now available. ROCm documentation plans to default to Markdown for new projects.
diff --git a/docs/data/understand/deep_learning/amd_logo.png b/docs/data/amd_logo.png similarity index 100% rename from docs/data/understand/deep_learning/amd_logo.png rename to docs/data/amd_logo.png diff --git a/docs/data/reference/gpu_arch/image.001.png b/docs/data/conceptual/gpu_arch/image001.png similarity index 100% rename from docs/data/reference/gpu_arch/image.001.png rename to docs/data/conceptual/gpu_arch/image001.png diff --git a/docs/data/reference/gpu_arch/image.002.png b/docs/data/conceptual/gpu_arch/image002.png similarity index 100% rename from docs/data/reference/gpu_arch/image.002.png rename to docs/data/conceptual/gpu_arch/image002.png diff --git a/docs/data/reference/gpu_arch/image.003.png b/docs/data/conceptual/gpu_arch/image003.png similarity index 100% rename from docs/data/reference/gpu_arch/image.003.png rename to docs/data/conceptual/gpu_arch/image003.png diff --git a/docs/data/reference/gpu_arch/image.004.png b/docs/data/conceptual/gpu_arch/image004.png similarity index 100% rename from docs/data/reference/gpu_arch/image.004.png rename to docs/data/conceptual/gpu_arch/image004.png diff --git a/docs/data/reference/gpu_arch/image.005.png b/docs/data/conceptual/gpu_arch/image005.png similarity index 100% rename from docs/data/reference/gpu_arch/image.005.png rename to docs/data/conceptual/gpu_arch/image005.png diff --git a/docs/data/reference/gpu_arch/image.006.png b/docs/data/conceptual/gpu_arch/image006.png similarity index 100% rename from docs/data/reference/gpu_arch/image.006.png rename to docs/data/conceptual/gpu_arch/image006.png diff --git a/docs/data/how_to/tuning_guides/image.010.png b/docs/data/how_to/tuning_guides/image010.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.010.png rename to docs/data/how_to/tuning_guides/image010.png diff --git a/docs/data/how_to/tuning_guides/image.011.png b/docs/data/how_to/tuning_guides/image011.png similarity index 100% rename from 
docs/data/how_to/tuning_guides/image.011.png rename to docs/data/how_to/tuning_guides/image011.png diff --git a/docs/data/how_to/tuning_guides/image.012.png b/docs/data/how_to/tuning_guides/image012.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.012.png rename to docs/data/how_to/tuning_guides/image012.png diff --git a/docs/data/how_to/tuning_guides/image.013.png b/docs/data/how_to/tuning_guides/image013.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.013.png rename to docs/data/how_to/tuning_guides/image013.png diff --git a/docs/data/how_to/tuning_guides/image.014.png b/docs/data/how_to/tuning_guides/image014.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.014.png rename to docs/data/how_to/tuning_guides/image014.png diff --git a/docs/data/how_to/tuning_guides/image.015.png b/docs/data/how_to/tuning_guides/image015.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.015.png rename to docs/data/how_to/tuning_guides/image015.png diff --git a/docs/data/how_to/tuning_guides/image.016.png b/docs/data/how_to/tuning_guides/image016.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.016.png rename to docs/data/how_to/tuning_guides/image016.png diff --git a/docs/data/how_to/tuning_guides/image.001.png b/docs/data/how_to/tuning_guides/tuning001.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.001.png rename to docs/data/how_to/tuning_guides/tuning001.png diff --git a/docs/data/how_to/tuning_guides/image.002.png b/docs/data/how_to/tuning_guides/tuning002.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.002.png rename to docs/data/how_to/tuning_guides/tuning002.png diff --git a/docs/data/how_to/tuning_guides/image.003.png b/docs/data/how_to/tuning_guides/tuning003.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.003.png rename to 
docs/data/how_to/tuning_guides/tuning003.png diff --git a/docs/data/how_to/tuning_guides/image.004.png b/docs/data/how_to/tuning_guides/tuning004.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.004.png rename to docs/data/how_to/tuning_guides/tuning004.png diff --git a/docs/data/how_to/tuning_guides/image.005.png b/docs/data/how_to/tuning_guides/tuning005.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.005.png rename to docs/data/how_to/tuning_guides/tuning005.png diff --git a/docs/data/how_to/tuning_guides/image.006.png b/docs/data/how_to/tuning_guides/tuning006.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.006.png rename to docs/data/how_to/tuning_guides/tuning006.png diff --git a/docs/data/how_to/tuning_guides/image.008.png b/docs/data/how_to/tuning_guides/tuning008.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.008.png rename to docs/data/how_to/tuning_guides/tuning008.png diff --git a/docs/data/how_to/tuning_guides/image.009.png b/docs/data/how_to/tuning_guides/tuning009.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.009.png rename to docs/data/how_to/tuning_guides/tuning009.png diff --git a/docs/data/understand/deep_learning/TextClassification_3.png b/docs/data/rocm_ai/TextClassification_3.png similarity index 100% rename from docs/data/understand/deep_learning/TextClassification_3.png rename to docs/data/rocm_ai/TextClassification_3.png diff --git a/docs/data/understand/deep_learning/TextClassification_4.png b/docs/data/rocm_ai/TextClassification_4.png similarity index 100% rename from docs/data/understand/deep_learning/TextClassification_4.png rename to docs/data/rocm_ai/TextClassification_4.png diff --git a/docs/data/understand/deep_learning/TextClassification_5.png b/docs/data/rocm_ai/TextClassification_5.png similarity index 100% rename from docs/data/understand/deep_learning/TextClassification_5.png rename to 
docs/data/rocm_ai/TextClassification_5.png diff --git a/docs/data/understand/deep_learning/TextClassification_6.png b/docs/data/rocm_ai/TextClassification_6.png similarity index 100% rename from docs/data/understand/deep_learning/TextClassification_6.png rename to docs/data/rocm_ai/TextClassification_6.png diff --git a/docs/data/understand/deep_learning/TextClassification_7.png b/docs/data/rocm_ai/TextClassification_7.png similarity index 100% rename from docs/data/understand/deep_learning/TextClassification_7.png rename to docs/data/rocm_ai/TextClassification_7.png diff --git a/docs/data/understand/deep_learning/image.018.png b/docs/data/rocm_ai/image018.png similarity index 100% rename from docs/data/understand/deep_learning/image.018.png rename to docs/data/rocm_ai/image018.png diff --git a/docs/data/understand/deep_learning/inception_v3.png b/docs/data/rocm_ai/inception_v3.png similarity index 100% rename from docs/data/understand/deep_learning/inception_v3.png rename to docs/data/rocm_ai/inception_v3.png diff --git a/docs/data/understand/deep_learning/mnist_1.png b/docs/data/rocm_ai/mnist_1.png similarity index 100% rename from docs/data/understand/deep_learning/mnist_1.png rename to docs/data/rocm_ai/mnist_1.png diff --git a/docs/data/understand/deep_learning/mnist_2.png b/docs/data/rocm_ai/mnist_2.png similarity index 100% rename from docs/data/understand/deep_learning/mnist_2.png rename to docs/data/rocm_ai/mnist_2.png diff --git a/docs/data/understand/deep_learning/mnist_3.png b/docs/data/rocm_ai/mnist_3.png similarity index 100% rename from docs/data/understand/deep_learning/mnist_3.png rename to docs/data/rocm_ai/mnist_3.png diff --git a/docs/data/understand/deep_learning/mnist_4.png b/docs/data/rocm_ai/mnist_4.png similarity index 100% rename from docs/data/understand/deep_learning/mnist_4.png rename to docs/data/rocm_ai/mnist_4.png diff --git a/docs/data/understand/deep_learning/mnist_5.png b/docs/data/rocm_ai/mnist_5.png similarity index 100% rename 
from docs/data/understand/deep_learning/mnist_5.png rename to docs/data/rocm_ai/mnist_5.png diff --git a/docs/data/deploy/linux/image.001.png b/docs/data/tutorials/install/linux/linux001.png similarity index 100% rename from docs/data/deploy/linux/image.001.png rename to docs/data/tutorials/install/linux/linux001.png diff --git a/docs/data/deploy/linux/image.002.png b/docs/data/tutorials/install/linux/linux002.png similarity index 100% rename from docs/data/deploy/linux/image.002.png rename to docs/data/tutorials/install/linux/linux002.png diff --git a/docs/data/deploy/linux/image.003.png b/docs/data/tutorials/install/linux/linux003.png similarity index 100% rename from docs/data/deploy/linux/image.003.png rename to docs/data/tutorials/install/linux/linux003.png diff --git a/docs/data/deploy/linux/image.004.png b/docs/data/tutorials/install/linux/linux004.png similarity index 100% rename from docs/data/deploy/linux/image.004.png rename to docs/data/tutorials/install/linux/linux004.png diff --git a/docs/data/how_to/magma_install/image.005.png b/docs/data/tutorials/install/magma_install/magma005.png similarity index 100% rename from docs/data/how_to/magma_install/image.005.png rename to docs/data/tutorials/install/magma_install/magma005.png diff --git a/docs/data/how_to/magma_install/image.006.png b/docs/data/tutorials/install/magma_install/magma006.png similarity index 100% rename from docs/data/how_to/magma_install/image.006.png rename to docs/data/tutorials/install/magma_install/magma006.png diff --git a/docs/data/deploy/windows/000-settings-dark.png b/docs/data/tutorials/install/windows/000-settings-dark.png similarity index 100% rename from docs/data/deploy/windows/000-settings-dark.png rename to docs/data/tutorials/install/windows/000-settings-dark.png diff --git a/docs/data/deploy/windows/000-settings-light.png b/docs/data/tutorials/install/windows/000-settings-light.png similarity index 100% rename from docs/data/deploy/windows/000-settings-light.png rename 
to docs/data/tutorials/install/windows/000-settings-light.png diff --git a/docs/data/deploy/windows/000-setup-icon.png b/docs/data/tutorials/install/windows/000-setup-icon.png similarity index 100% rename from docs/data/deploy/windows/000-setup-icon.png rename to docs/data/tutorials/install/windows/000-setup-icon.png diff --git a/docs/data/deploy/windows/001-about-dark.png b/docs/data/tutorials/install/windows/001-about-dark.png similarity index 100% rename from docs/data/deploy/windows/001-about-dark.png rename to docs/data/tutorials/install/windows/001-about-dark.png diff --git a/docs/data/deploy/windows/001-about-light.png b/docs/data/tutorials/install/windows/001-about-light.png similarity index 100% rename from docs/data/deploy/windows/001-about-light.png rename to docs/data/tutorials/install/windows/001-about-light.png diff --git a/docs/data/deploy/windows/001-uac-dark.png b/docs/data/tutorials/install/windows/001-uac-dark.png similarity index 100% rename from docs/data/deploy/windows/001-uac-dark.png rename to docs/data/tutorials/install/windows/001-uac-dark.png diff --git a/docs/data/deploy/windows/001-uac-light.png b/docs/data/tutorials/install/windows/001-uac-light.png similarity index 100% rename from docs/data/deploy/windows/001-uac-light.png rename to docs/data/tutorials/install/windows/001-uac-light.png diff --git a/docs/data/deploy/windows/002-initializing.png b/docs/data/tutorials/install/windows/002-initializing.png similarity index 100% rename from docs/data/deploy/windows/002-initializing.png rename to docs/data/tutorials/install/windows/002-initializing.png diff --git a/docs/data/deploy/windows/003-detecting-system-config.png b/docs/data/tutorials/install/windows/003-detecting-system-config.png similarity index 100% rename from docs/data/deploy/windows/003-detecting-system-config.png rename to docs/data/tutorials/install/windows/003-detecting-system-config.png diff --git a/docs/data/deploy/windows/004-installer-window.png 
b/docs/data/tutorials/install/windows/004-installer-window.png similarity index 100% rename from docs/data/deploy/windows/004-installer-window.png rename to docs/data/tutorials/install/windows/004-installer-window.png diff --git a/docs/data/deploy/windows/012-install-progress.png b/docs/data/tutorials/install/windows/012-install-progress.png similarity index 100% rename from docs/data/deploy/windows/012-install-progress.png rename to docs/data/tutorials/install/windows/012-install-progress.png diff --git a/docs/data/deploy/windows/013-install-complete.png b/docs/data/tutorials/install/windows/013-install-complete.png similarity index 100% rename from docs/data/deploy/windows/013-install-complete.png rename to docs/data/tutorials/install/windows/013-install-complete.png diff --git a/docs/data/deploy/windows/014-uninstall-dark.png b/docs/data/tutorials/install/windows/014-uninstall-dark.png similarity index 100% rename from docs/data/deploy/windows/014-uninstall-dark.png rename to docs/data/tutorials/install/windows/014-uninstall-dark.png diff --git a/docs/data/deploy/windows/014-uninstall-light.png b/docs/data/tutorials/install/windows/014-uninstall-light.png similarity index 100% rename from docs/data/deploy/windows/014-uninstall-light.png rename to docs/data/tutorials/install/windows/014-uninstall-light.png diff --git a/docs/data/deploy/windows/005-deselect-all.png b/docs/data/unused_images/_005-deselect-all-windows.png similarity index 100% rename from docs/data/deploy/windows/005-deselect-all.png rename to docs/data/unused_images/_005-deselect-all-windows.png diff --git a/docs/data/deploy/windows/006-component-options-sdk-core.png b/docs/data/unused_images/_006-component-options-sdk-core-windows.png similarity index 100% rename from docs/data/deploy/windows/006-component-options-sdk-core.png rename to docs/data/unused_images/_006-component-options-sdk-core-windows.png diff --git a/docs/data/deploy/windows/007-component-options-libraries.png 
b/docs/data/unused_images/_007-component-options-libraries-windows.png similarity index 100% rename from docs/data/deploy/windows/007-component-options-libraries.png rename to docs/data/unused_images/_007-component-options-libraries-windows.png diff --git a/docs/data/deploy/windows/008-component-options-rtc.png b/docs/data/unused_images/_008-component-options-rtc-windows.png similarity index 100% rename from docs/data/deploy/windows/008-component-options-rtc.png rename to docs/data/unused_images/_008-component-options-rtc-windows.png diff --git a/docs/data/deploy/windows/009-component-options-rt.png b/docs/data/unused_images/_009-component-options-rt-windows.png similarity index 100% rename from docs/data/deploy/windows/009-component-options-rt.png rename to docs/data/unused_images/_009-component-options-rt-windows.png diff --git a/docs/data/deploy/windows/010-component-options-vs-plugin.png b/docs/data/unused_images/_010-component-options-vs-plugin-windows.png similarity index 100% rename from docs/data/deploy/windows/010-component-options-vs-plugin.png rename to docs/data/unused_images/_010-component-options-vs-plugin-windows.png diff --git a/docs/data/deploy/windows/011-component-options-radeon-software.png b/docs/data/unused_images/_011-component-options-radeon-software-windows.png similarity index 100% rename from docs/data/deploy/windows/011-component-options-radeon-software.png rename to docs/data/unused_images/_011-component-options-radeon-software-windows.png diff --git a/docs/data/understand/deep_learning/Deep Learning Image 1.png b/docs/data/unused_images/_Deep Learning Image 1.png similarity index 100% rename from docs/data/understand/deep_learning/Deep Learning Image 1.png rename to docs/data/unused_images/_Deep Learning Image 1.png diff --git a/docs/data/understand/deep_learning/Install PyTorch using wheels Package.png b/docs/data/unused_images/_Install PyTorch using wheels Package.png similarity index 100% rename from 
docs/data/understand/deep_learning/Install PyTorch using wheels Package.png rename to docs/data/unused_images/_Install PyTorch using wheels Package.png diff --git a/docs/data/understand/deep_learning/Machine Learning.png b/docs/data/unused_images/_Machine Learning.png similarity index 100% rename from docs/data/understand/deep_learning/Machine Learning.png rename to docs/data/unused_images/_Machine Learning.png diff --git a/docs/data/understand/deep_learning/Matrix-1.png b/docs/data/unused_images/_Matrix-1.png similarity index 100% rename from docs/data/understand/deep_learning/Matrix-1.png rename to docs/data/unused_images/_Matrix-1.png diff --git a/docs/data/understand/deep_learning/Matrix-2.png b/docs/data/unused_images/_Matrix-2.png similarity index 100% rename from docs/data/understand/deep_learning/Matrix-2.png rename to docs/data/unused_images/_Matrix-2.png diff --git a/docs/data/understand/deep_learning/Matrix-3.png b/docs/data/unused_images/_Matrix-3.png similarity index 100% rename from docs/data/understand/deep_learning/Matrix-3.png rename to docs/data/unused_images/_Matrix-3.png diff --git a/docs/data/understand/deep_learning/Model In.png b/docs/data/unused_images/_Model In.png similarity index 100% rename from docs/data/understand/deep_learning/Model In.png rename to docs/data/unused_images/_Model In.png diff --git a/docs/data/understand/deep_learning/Pytorch 11.png b/docs/data/unused_images/_Pytorch 11.png similarity index 100% rename from docs/data/understand/deep_learning/Pytorch 11.png rename to docs/data/unused_images/_Pytorch 11.png diff --git a/docs/data/understand/deep_learning/Text Classification 1.png b/docs/data/unused_images/_Text Classification 1.png similarity index 100% rename from docs/data/understand/deep_learning/Text Classification 1.png rename to docs/data/unused_images/_Text Classification 1.png diff --git a/docs/data/understand/deep_learning/Text Classification 2.png b/docs/data/unused_images/_Text Classification 2.png similarity 
index 100% rename from docs/data/understand/deep_learning/Text Classification 2.png rename to docs/data/unused_images/_Text Classification 2.png diff --git a/docs/data/understand/deep_learning/Text Classification 3.png b/docs/data/unused_images/_Text Classification 3.png similarity index 100% rename from docs/data/understand/deep_learning/Text Classification 3.png rename to docs/data/unused_images/_Text Classification 3.png diff --git a/docs/data/understand/deep_learning/Text Classification 4.png b/docs/data/unused_images/_Text Classification 4.png similarity index 100% rename from docs/data/understand/deep_learning/Text Classification 4.png rename to docs/data/unused_images/_Text Classification 4.png diff --git a/docs/data/understand/deep_learning/Text Classification 5.png b/docs/data/unused_images/_Text Classification 5.png similarity index 100% rename from docs/data/understand/deep_learning/Text Classification 5.png rename to docs/data/unused_images/_Text Classification 5.png diff --git a/docs/data/how_to/tuning_guides/image.007.png b/docs/data/unused_images/_image.007-tuning.png similarity index 100% rename from docs/data/how_to/tuning_guides/image.007.png rename to docs/data/unused_images/_image.007-tuning.png diff --git a/docs/data/understand/deep_learning/mnist 4.png b/docs/data/unused_images/_mnist 4.png similarity index 100% rename from docs/data/understand/deep_learning/mnist 4.png rename to docs/data/unused_images/_mnist 4.png diff --git a/docs/data/understand/deep_learning/mnist 5.png b/docs/data/unused_images/_mnist 5.png similarity index 100% rename from docs/data/understand/deep_learning/mnist 5.png rename to docs/data/unused_images/_mnist 5.png diff --git a/docs/data/framework_compatibility/with_pytorch.png b/docs/data/unused_images/_with_pytorch.png similarity index 100% rename from docs/data/framework_compatibility/with_pytorch.png rename to docs/data/unused_images/_with_pytorch.png diff --git 
a/docs/data/framework_compatibility/with_tensorflow.png b/docs/data/unused_images/_with_tensorflow.png similarity index 100% rename from docs/data/framework_compatibility/with_tensorflow.png rename to docs/data/unused_images/_with_tensorflow.png diff --git a/docs/examples/all.md b/docs/examples/all.md deleted file mode 100644 index cfb77a94d..000000000 --- a/docs/examples/all.md +++ /dev/null @@ -1,25 +0,0 @@ -# All Tutorial Material - -:::::{grid} 1 1 2 2 -:gutter: 1 - -:::{grid-item-card} ROCm Examples -:link: https://github.com/amd/rocm-examples -:link-type: url -Samples codes demonstrating and explaining the use of the HIP API as well as -ROCm-accelerated domain libraries. - -::: - -:::{grid-item-card} AI/ML/Inferencing -:link: machine_learning/all -:link-type: doc -Detailed walkthroughs of specific use-cases driven by frameworks using ROCm -acceleration. - -- [Implementing Inception V3 on ROCm with PyTorch](machine_learning/pytorch_inception.md) -- [Optimizing Inference with MIGraphX](machine_learning/migraphx_optimization.md) - -::: - -::::: diff --git a/docs/how_to/deep_learning_rocm.md b/docs/how_to/deep_learning_rocm.md index f65b07851..9767b75f0 100644 --- a/docs/how_to/deep_learning_rocm.md +++ b/docs/how_to/deep_learning_rocm.md @@ -1,21 +1,20 @@ # Deep Learning Guide The following sections cover the different framework installations for ROCm and -Deep Learning applications. {numref}`Rocm-Compat-Frameworks-Flowchart` provides +Deep Learning applications. The following image provides the sequential flow for the use of each framework. Refer to the ROCm Compatible Frameworks Release Notes for each framework's most current release notes at -{ref}`ml_framework_compat_matrix`. +[Third party support](../about/compatibility/3rd_party_support_matrix). 
+ +```{figure} ../data/tutorials/install/magma_install/magma005.png +:name: rocm-compat-frameworks-chart +:align: center -```{figure} ../data/how_to/magma_install/image.005.png -:name: Rocm-Compat-Frameworks-Flowchart ---- -align: center ---- ROCm Compatible Frameworks Flowchart ``` ## Frameworks Installation -- [How to Install PyTorch?](pytorch_install/pytorch_install) -- [How to Install Tensorflow?](tensorflow_install/tensorflow_install) -- [How to Install Magma?](magma_install/magma_install) +- [How to Install PyTorch?](../tutorials/install/pytorch_install) +- [How to Install Tensorflow?](../tutorials/install/tensorflow_install) +- [How to Install Magma?](../tutorials/install/magma_install) diff --git a/docs/how_to/gpu_aware_mpi.md b/docs/how_to/gpu_aware_mpi.md index a13484217..0eecc358c 100644 --- a/docs/how_to/gpu_aware_mpi.md +++ b/docs/how_to/gpu_aware_mpi.md @@ -49,7 +49,7 @@ export BUILD_DIR=/tmp/ompi_for_gpu_build mkdir -p $BUILD_DIR ``` -```note +```{note} The following sequences of build commands assume either the ROCmCC or the AOMP compiler is active in the environment, which will execute the commands. ``` @@ -72,8 +72,7 @@ make -j $(nproc) make -j $(nproc) install ``` -The following -[table](../release/3rd_party_support_matrix.md#communication-libraries) +The [communication libraries tables](#communication_libraries) documents the compatibility of UCX versions with ROCm versions. ## Install Open MPI @@ -149,7 +148,7 @@ larger than 67MB, an effective utilization of about 150GB/sec is achieved, which corresponds to 75% of the peak transfer bandwidth of 200GB/sec for that connection: -:::{figure} /data/how_to/gpu_enabled_mpi_1.png +:::{figure} ../data/how_to/gpu_enabled_mpi_1.png :name: mpi-bandwidth :alt: OSU execution showing transfer bandwidth increasing alongside payload inc. Inter-GPU bandwidth with various payload sizes. @@ -162,7 +161,7 @@ Unified Collective Communication Library (UCC) component in Open MPI. 
For this, the UCC library has to be configured and compiled with ROCm support. -Please note the compatibility [table](../release/3rd_party_support_matrix.md#communication-libraries) +Please note the compatibility [tables](#communication_libraries) for UCC versions with the various ROCm versions. An example for configuring UCC and Open MPI with ROCm support diff --git a/docs/how_to/all.md b/docs/how_to/index.md similarity index 100% rename from docs/how_to/all.md rename to docs/how_to/index.md diff --git a/docs/how_to/tuning_guides/index.md b/docs/how_to/tuning_guides/index.md index b811f674e..af3666418 100644 --- a/docs/how_to/tuning_guides/index.md +++ b/docs/how_to/tuning_guides/index.md @@ -52,7 +52,7 @@ compute nodes to get the best performance out of them. - [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf) - [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf) -- [Guide](./mi200.md) +- [Guide](./mi200) ::: @@ -62,7 +62,7 @@ accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs - [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf) - [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf) -- [Guide](./mi100.md) +- [Guide](./mi100) ::: @@ -90,7 +90,7 @@ PRO W6800 and AMD Radeon PRO V620 - [AMD RDNA2 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf) - [Whitepaper](https://www.amd.com/system/files/documents/rdna2-explained-radeon-pro-W6000.pdf) -- [Guide](./w6000_v620.md) +- [Guide](./w6000_v620) ::: diff --git a/docs/how_to/tuning_guides/mi100.md b/docs/how_to/tuning_guides/mi100.md index 3dee4106d..efab47ed9 100644 --- a/docs/how_to/tuning_guides/mi100.md +++ b/docs/how_to/tuning_guides/mi100.md @@ -11,14 +11,14 @@ AMD EPYC™ 7003 Series Processors" 
depending on the processor generation of the system. In addition to the BIOS settings listed below the following settings -({ref}`bios_settings`) will also have to be enacted via the command line (see -{ref}`os_settings`): +({ref}`mi100_bios_settings`) will also have to be enacted via the command line (see +{ref}`mi100_os_settings`): - Core C states - AMD-PCI-UTIL (on AMD EPYC™ 7002 series processors) - IOMMU (if needed) -(bios_settings)= +(mi100_bios_settings)= ### System BIOS Settings @@ -28,7 +28,7 @@ System BIOS settings has been validated. These settings must be used for the qualification process and should be set as default values for the system BIOS. Analogous settings for other non-AMI System BIOS providers could be set similarly. For systems with Intel processors, some settings may not apply or be -available as listed in {numref}`mi100-bios`. +available as listed in the following table. ```{list-table} Recommended settings for the system BIOS in a GIGABYTE platform. :header-rows: 1 @@ -216,7 +216,7 @@ NBIO components. #### Memory Configuration -For the memory addressing modes (see {numref}`mi100-bios`), especially the +For the memory addressing modes, especially the number of NUMA nodes per socket/processor (NPS), the recommended setting is to follow the guidance of the "High Performance Computing (HPC) Tuning Guide for AMD EPYC™ 7002 Series Processors" and "High Performance Computing (HPC) @@ -230,7 +230,7 @@ processor (NPS4). For memory bandwidth sensitive applications using MPI, NPS4 is recommended. For applications that are not optimized for NUMA locality, NPS1 is the recommended setting. -(os_settings)= +(mi100_os_settings)= ### Operating System Settings @@ -360,11 +360,11 @@ installed. ## System Management For a complete guide on how to install/manage/uninstall ROCm on Linux, refer to -[Deploy ROCm on Linux](../../deploy/linux/index.md). For verifying that the +[Installing ROCm on Linux](../../tutorials/install/linux/index). 
For verifying that the installation was successful, refer to {ref}`verifying-kernel-mode-driver-installation` and -[Validation Tools](../../reference/validation_tools.md). Should verification -fail, consult the [System Debugging Guide](../system_debugging.md). +[Validation Tools](../../reference/compilers_tools/validation_tools). Should verification +fail, consult the [System Debugging Guide](../system_debugging). (mi100-hw-verification)= @@ -375,22 +375,22 @@ the GPU hardware, the `rocm-smi` command is available. It can show available GPUs in the system with their device ID and their respective firmware (or VBIOS) versions: -:::{figure-md} mi100-smi-showhw - -rocm-smi --showhw output on an 8*MI100 system. +```{figure} ../../data/how_to/tuning_guides/tuning001.png +:name: mi100-smi-showhw +:alt: rocm-smi --showhw output on an 8*MI100 system `rocm-smi --showhw` output on an 8*MI100 system. -::: +``` Another important query is to show the system structure, the localization of the GPUs in the system, and the fabric connections between the system components: -:::{figure-md} mi100-smi-showtopo - -rocm-smi --showtopo output on an 8*MI100 system. +```{figure} ../../data/how_to/tuning_guides/tuning002.png +:name: mi100-smi-showtopo +:alt: rocm-smi --showtopo output on an 8*MI100 system. `rocm-smi --showtopo` output on an 8*MI100 system. -::: +``` The previous command shows the system structure in four blocks: @@ -414,19 +414,19 @@ available with the AMD ROCm™ platform. It lists specific details about the GPU devices, including but not limited to the number of compute units, width of the SIMD pipelines, memory information, and instruction set architecture: -:::{figure-md} mi100-rocminfo - -rocminfo output fragment on an 8*MI100 system. +```{figure} ../../data/how_to/tuning_guides/tuning003.png +:name: mi100-rocminfo +:alt: rocminfo output fragment on an 8*MI100 system. `rocminfo` output fragment on an 8*MI100 system. 
-::: +``` For a complete list of architecture (LLVM target) names, refer to -[GPU OS Support](../../release/gpu_os_support.md). +[Linux Support](../../about/release/linux_support.md) and [Windows Support](../../about/release/windows_support.md). ### Testing Inter-device Bandwidth -{numref}`mi100-hw-verification` showed the `rocm-smi --showtopo` command to show +{ref}`mi100-hw-verification` showed the `rocm-smi --showtopo` command to show how the system structure and how the GPUs are located and connected in this structure. For more details, the `rocm-bandwidth-test` can run benchmarks to show the effective link bandwidth between the components of the system. @@ -468,37 +468,37 @@ Alternatively, the source code can be downloaded and built from The output will list the available compute devices (CPUs and GPUs): -:::{figure-md} mi100-bandwidth-test-1 - -rocm-bandwidth-test output fragment on an 8*MI100 system listing devices. +```{figure} ../../data/how_to/tuning_guides/tuning004.png +:name: mi100-bandwidth-test-1 +:alt: rocm-bandwidth-test output fragment on an 8*MI100 system listing devices. `rocm-bandwidth-test` output fragment on an 8*MI100 system listing devices. -::: +``` The output will also show a matrix that contains a "1" if a device can communicate to another device (CPU and GPU) of the system and it will show the NUMA distance (similar to `rocm-smi`): -:::{figure-md} mi100-bandwidth-test-2 - -rocm-bandwidth-test output fragment on an 8*MI100 system showing inter-device access matrix. +```{figure} ../../data/how_to/tuning_guides/tuning005.png +:name: mi100-bandwidth-test-2 +:alt: rocm-bandwidth-test output fragment on an 8*MI100 system showing inter-device access matrix. `rocm-bandwidth-test` output fragment on an 8*MI100 system showing inter-device access matrix. -::: +``` -:::{figure-md} mi100-bandwidth-test-3 - -rocm-bandwidth-test output fragment on an 8*MI100 system showing inter-device NUMA distance. 
+```{figure} ../../data/how_to/tuning_guides/tuning006.png +:name: mi100-bandwidth-test-3 +:alt: rocm-bandwidth-test output fragment on an 8*MI100 system showing inter-device NUMA distance. `rocm-bandwidth-test` output fragment on an 8*MI100 system showing inter-device NUMA distance. -::: +``` The output also contains the measured bandwidth for unidirectional and bidirectional transfers between the devices (CPU and GPU): -:::{figure-md} mi100-bandwidth-test-4 - -rocm-bandwidth-test output fragment on an 8*MI100 system showing uni- and bidirectional bandwidths. +```{figure} ../../data/how_to/tuning_guides/tuning004.png +:name: mi100-bandwidth-test-4 +:alt: rocm-bandwidth-test output fragment on an 8*MI100 system showing uni- and bidirectional bandwidths. `rocm-bandwidth-test` output fragment on an 8*MI100 system showing uni- and bidirectional bandwidths. -::: +``` diff --git a/docs/how_to/tuning_guides/mi200.md b/docs/how_to/tuning_guides/mi200.md index 867b66da4..4fe1fe47a 100644 --- a/docs/how_to/tuning_guides/mi200.md +++ b/docs/how_to/tuning_guides/mi200.md @@ -8,14 +8,14 @@ is advised to configure the system for the best possible host configuration according to the "High Performance Computing (HPC) Tuning Guide for AMD EPYC 7003 Series Processors." -Configure the system BIOS settings as explained in {ref}`bios_settings` and +Configure the system BIOS settings as explained in {ref}`mi200_bios_settings` and enact the below given settings via the command line as explained in -{ref}`os_settings`: +{ref}`mi200_os_settings`: - Core C states - IOMMU (if needed) -(bios_settings)= +(mi200_bios_settings)= ### System BIOS Settings @@ -25,7 +25,7 @@ of system BIOS settings has been validated. These settings must be used for the qualification process and should be set as default values for the system BIOS. Analogous settings for other non-AMI System BIOS providers could be set similarly. 
For systems with Intel processors, some settings may not apply or be -available as listed in {numref}`mi200-bios`. +available as listed in the following table. ```{list-table} Recommended settings for the system BIOS in a GIGABYTE platform. :header-rows: 1 @@ -207,13 +207,13 @@ NBIO components. #### Memory Configuration -For setting the memory addressing modes (see {numref}`mi200-bios`), especially +For setting the memory addressing modes, especially the number of NUMA nodes per socket/processor (NPS), follow the guidance of the "High Performance Computing (HPC) Tuning Guide for AMD EPYC 7003 Series Processors" to provide the optimal configuration for host side computation. For most HPC workloads, NPS=4 is the recommended value. -(os_settings)= +(mi200_os_settings)= ### Operating System Settings @@ -343,11 +343,11 @@ installed. ## System Management For a complete guide on how to install/manage/uninstall ROCm on Linux, refer to -[Deploy ROCm on Linux](../../deploy/linux/index.md). For verifying that the +[Installing ROCm on Linux](../../tutorials/install/linux/index). For verifying that the installation was successful, refer to {ref}`verifying-kernel-mode-driver-installation` and -[Validation Tools](../../reference/validation_tools.md). Should verification -fail, consult the [System Debugging Guide](../system_debugging.md). +[Validation Tools](../../reference/compilers_tools/validation_tools). Should verification +fail, consult the [System Debugging Guide](../system_debugging). (mi200-hw-verification)= @@ -358,22 +358,22 @@ the GPU hardware, the `rocm-smi` command is available. It can show available GPUs in the system with their device ID and their respective firmware (or VBIOS) versions: -:::{figure-md} mi200-smi-showhw - -rocm-smi --showhw output on an 8*MI200 system. +```{figure} ../../data/how_to/tuning_guides/tuning008.png +:name: mi200-smi-showhw +:alt: rocm-smi --showhw output on an 8*MI200 system. `rocm-smi --showhw` output on an 8*MI200 system. 
-::: +``` To see the system structure, the localization of the GPUs in the system, and the fabric connections between the system components, use: -:::{figure-md} mi200-smi-showtopo - -rocm-smi --showtopo output on an 8*MI200 system. +```{figure} ../../data/how_to/tuning_guides/tuning009.png +:name: mi200-smi-showtopo +:alt: rocm-smi --showtopo output on an 8*MI200 system. `rocm-smi --showtopo` output on an 8*MI200 system. -::: +``` - The first block of the output shows the distance between the GPUs similar to what the `numactl` command outputs for the NUMA domains of a system. The @@ -397,19 +397,19 @@ lists specific details about the GPU devices, including but not limited to the number of compute units, width of the SIMD pipelines, memory information, and instruction set architecture: -:::{figure-md} mi200-rocminfo - -rocminfo output fragment on an 8*MI200 system. +```{figure} ../../data/how_to/tuning_guides/image010.png +:name: mi200-rocminfo +:alt: rocminfo output fragment on an 8*MI200 system. `rocminfo` output fragment on an 8*MI200 system. -::: +``` -For a complete list of architecture (LLVM target) names, refer to -[GPU OS Support](../../release/gpu_os_support.md). +For a complete list of architecture (LLVM target) names, refer to GPU OS Support for +[Linux](../../about/release/linux_support) and [Windows](../../about/release/windows_support). ### Testing Inter-device Bandwidth -{numref}`mi100-hw-verification` showed the `rocm-smi --showtopo` command to show +{ref}`mi200-hw-verification` showed the `rocm-smi --showtopo` command to show how the system structure and how the GPUs are located and connected in this structure. For more details, the `rocm-bandwidth-test` can run benchmarks to show the effective link bandwidth between the components of the system. 
@@ -452,30 +452,30 @@ Alternatively, the source code can be downloaded and built from The output will list the available compute devices (CPUs and GPUs), including their device ID and PCIe ID: -:::{figure-md} mi200-bandwidth-test-1 - -rocm-bandwidth-test output fragment on an 8*MI200 system listing devices. +```{figure} ../../data/how_to/tuning_guides/image011.png +:name: mi200-bandwidth-test-1 +:alt: rocm-bandwidth-test output fragment on an 8*MI200 system listing devices. `rocm-bandwidth-test` output fragment on an 8*MI200 system listing devices. -::: +``` The output will also show a matrix that contains a "1" if a device can communicate to another device (CPU and GPU) of the system and it will show the NUMA distance (similar to `rocm-smi`): -:::{figure-md} mi200-bandwidth-test-2 - -rocm-bandwidth-test output fragment on an 8*MI200 system showing inter-device access matrix and NUMA distances. +```{figure} ../../data/how_to/tuning_guides/image012.png +:name: mi200-bandwidth-test-2 +:alt: rocm-bandwidth-test output fragment on an 8*MI200 system showing inter-device access matrix and NUMA distances. `rocm-bandwidth-test` output fragment on an 8*MI200 system showing inter-device access matrix and NUMA distances. -::: +``` The output also contains the measured bandwidth for unidirectional and bidirectional transfers between the devices (CPU and GPU): -:::{figure-md} mi200-bandwidth-test-3 - -rocm-bandwidth-test output fragment on an 8*MI200 system showing uni- and bidirectional bandwidths. +```{figure} ../../data/how_to/tuning_guides/image013.png +:name: mi200-bandwidth-test-3 +:alt: rocm-bandwidth-test output fragment on an 8*MI200 system showing uni- and bidirectional bandwidths. `rocm-bandwidth-test` output fragment on an 8*MI200 system showing uni- and bidirectional bandwidths. 
-::: +``` diff --git a/docs/how_to/tuning_guides/w6000_v620.md b/docs/how_to/tuning_guides/w6000_v620.md index c3fd17c46..c4d53f550 100644 --- a/docs/how_to/tuning_guides/w6000_v620.md +++ b/docs/how_to/tuning_guides/w6000_v620.md @@ -5,7 +5,7 @@ This chapter reviews system settings that are required to configure the system for ROCm virtualization on RDNA2-based AMD Radeon™ PRO GPUs. Installing ROCm on Bare Metal follows the routine ROCm -[installation procedure](../../deploy/linux/index.md). +[installation procedure](../../tutorials/install/linux/index). To enable ROCm virtualization on V620, one has to setup Single Root I/O Virtualization (SR-IOV) in the BIOS via setting found in the following @@ -147,35 +147,35 @@ First, assign GPU virtual function (VF) to VM using the following steps. 3. In the **Virtual Machine Manager** GUI, select the **VM** and click **Open**. - :::{figure-md} virtmgr-machine-mgr + ```{figure} ../../data/how_to/tuning_guides/image014.png + :name: virtmgr-machine-mgr + :alt: Virtual Machine Manager - Virtual Machine Manager - - Virtual Machine Manager - ::: + Virtual Machine Manager + ``` 4. In the VM GUI, go to **Show Virtual Hardware Details > Add Hardware** to - configure hardware. + configure hardware. - :::{figure-md} virtmgr-hw-details + ```{figure} ../../data/how_to/tuning_guides/image015.png + :name: virtmgr-hw-details + :alt: Show Virtual Hardware Details - Show Virtual Hardware Details - - Virtual Machine Manager - ::: + Virtual Machine Manager + ``` 5. Go to **Add Hardware > PCI Host Device > VF** and click **Finish**. - :::{figure-md} virtmgr-vf-select + ```{figure} ../../data/how_to/tuning_guides/image016.png + :name: virtmgr-vf-select + :alt: VF Selection - VF Selection - - VF Selection - ::: + VF Selection + ``` Then start the VM. Finally install ROCm on the virtual machine (VM). For detailed instructions, -refer to the [ROCm Installation Guide](../../deploy/linux/index.md). 
For any +refer to the [ROCm Installation Guide](../../tutorials/install/index). For any issue encountered during installation, write to us [here](mailto:CloudGPUsupport@amd.com). diff --git a/docs/index.md b/docs/index.md index 3e6465ad4..831e120c3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,98 +1,73 @@ -# AMD ROCm™ Documentation +# AMD ROCm™ documentation -:::::{grid} 1 1 3 3 -:gutter: 1 - -::::{grid-item} -:::{dropdown} [What is ROCm?](rocm) -ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for -graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development -tools, and APIs that enable GPU programming from low-level kernel to end-user applications. -[more...](rocm) - -:::: - -::::{grid-item} -:::{dropdown} Deploy ROCm - -- {doc}`/deploy/linux/index` -- {doc}`/deploy/docker` - -::: -:::: - -::::{grid-item} -:::{dropdown} [Release Info](release) - -- [Release Notes](release) -- [GPU and OS Support](release/gpu_os_support) -- [Known Issues](https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue) -- [Compatibility](release/compatibility) -- [Licensing](release/licensing) - -::: -:::: - -::::: +Our documentation is divided into four main categories: ::::{grid} 1 2 2 2 :class-container: rocm-doc-grid :::{grid-item-card} :padding: 2 -[APIs and Reference](reference/all) +**[Tutorials](tutorials/index)** + +Instructional material ^^^ -- [Compilers and Development Tools](reference/compilers) -- [HIP](reference/hip) -- [OpenMP](reference/openmp/openmp) -- [Math Libraries](reference/gpu_libraries/math) -- [C++ Primitives Libraries](reference/gpu_libraries/c++_primitives) -- [Communication Libraries](reference/gpu_libraries/communication) -- [AI Libraries](reference/ai_tools) -- [Computer Vision](reference/computer_vision) -- [Management Tools](reference/management_tools) -- [Validation Tools](reference/validation_tools) +- [Installing ROCm](tutorials/install/index) +- [Installing 
Magma](tutorials/install/magma_install) +- [Installing PyTorch](tutorials/install/pytorch_install) +- [Installing TensorFlow](tutorials/install/tensorflow_install) +- [GitHub examples](https://github.com/amd/rocm-examples) +- [Artificial intelligence](rocm_ai/rocm_ai) ::: :::{grid-item-card} :padding: 2 -[Understand ROCm](understand/all) -^^^ +**[How-to](how_to/index)** -- [Compiler Disambiguation](understand/compiler_disambiguation) -- [Using CMake](understand/cmake_packages) -- [Linux Folder Structure Reorganization](understand/file_reorg) -- [GPU Isolation Techniques](understand/gpu_isolation) -- [GPU Architecture](understand/gpu_arch) - -::: - -:::{grid-item-card} -:padding: 2 -[How to Guides](how_to/all) +Task-oriented walkthroughs ^^^ - [System Tuning for Various Architectures](how_to/tuning_guides/index) - [GPU Aware MPI](how_to/gpu_aware_mpi) - [Setting up for Deep Learning with ROCm](how_to/deep_learning_rocm) - - [Magma Installation](how_to/magma_install/magma_install) - - [PyTorch Installation](how_to/pytorch_install/pytorch_install) - - [TensorFlow Installation](how_to/tensorflow_install/tensorflow_install) -- [System Level Debugging](how_to/system_debugging.md) +- [System Level Debugging](how_to/system_debugging) ::: :::{grid-item-card} :padding: 2 -[Tutorials & Examples](examples/all) +**[Reference](reference/index)** + +Collated information ^^^ -- [Examples](https://github.com/amd/rocm-examples) -- [ML, DL, and AI](examples/machine_learning/all) - - [](examples/machine_learning/pytorch_inception) - - [](examples/machine_learning/migraphx_optimization) +- [Libraries](reference/libraries/index) + - [Math libraries](reference/libraries/gpu_libraries/math) + - [C++ Primitives libraries](reference/libraries/gpu_libraries/c++_primitives) + - [Communication libraries](reference/libraries/gpu_libraries/communication) +- [Compilers & tools](reference/compilers_tools/index) + - [Computer Vision](reference/computer_vision) + - [Management 
Tools](reference/compilers_tools/management_tools) + - [Validation Tools](reference/compilers_tools/validation_tools) +- [HIP](reference/hip) +- [OpenMP](reference/openmp/openmp) ::: + +:::{grid-item-card} +:padding: 2 +**[Conceptual](conceptual/index)** + +Topic overviews and background information +^^^ + +- [Compiler Disambiguation](conceptual/compiler_disambiguation) +- [Using CMake](conceptual/cmake_packages) +- [Linux Folder Structure Reorganization](conceptual/file_reorg) +- [GPU Isolation Techniques](conceptual/gpu_isolation) +- [GPU Architecture](conceptual/gpu_arch) + +::: + :::: diff --git a/docs/license.md b/docs/license.md deleted file mode 100644 index 0d2deca68..000000000 --- a/docs/license.md +++ /dev/null @@ -1,6 +0,0 @@ -# License - -> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that contains documentation primarily. For other licensing information, see the [Licensing Terms page](./release/licensing). - -```{include} ../LICENSE -``` diff --git a/docs/reference/ai_tools.md b/docs/reference/compilers_tools/ai_tools.md similarity index 100% rename from docs/reference/ai_tools.md rename to docs/reference/compilers_tools/ai_tools.md diff --git a/docs/reference/compilers.md b/docs/reference/compilers_tools/compilers.md similarity index 90% rename from docs/reference/compilers.md rename to docs/reference/compilers_tools/compilers.md index 3ce6ea27a..ed53de8a4 100644 --- a/docs/reference/compilers.md +++ b/docs/reference/compilers_tools/compilers.md @@ -1,4 +1,4 @@ -# Compilers and Tools +# Compilers and tools :::::{grid} 1 1 2 2 :gutter: 1 @@ -13,12 +13,12 @@ inspection of execution state of AMD's commercially available GPU architectures. ::: -:::{grid-item-card} [ROCmCC](./rocmcc/rocmcc) +:::{grid-item-card} [ROCmCC](../rocmcc/rocmcc) ROCmCC is a Clang/LLVM-based compiler. 
It is optimized for high-performance computing on AMD GPUs and CPUs and supports various heterogeneous programming models such as HIP, OpenMP, and OpenCL. -- [Documentation](./rocmcc/rocmcc) +- [Documentation](../rocmcc/rocmcc) ::: @@ -50,4 +50,4 @@ Callback/Activity Library for Performance tracing AMD GPUs ## See Also -- [Compiler Disambiguation](../understand/compiler_disambiguation.md) +- [Compiler Disambiguation](../../conceptual/compiler_disambiguation.md) diff --git a/docs/reference/dev_tools.md b/docs/reference/compilers_tools/dev_tools.md similarity index 100% rename from docs/reference/dev_tools.md rename to docs/reference/compilers_tools/dev_tools.md diff --git a/docs/reference/compilers_tools/index.md b/docs/reference/compilers_tools/index.md new file mode 100644 index 000000000..dde300468 --- /dev/null +++ b/docs/reference/compilers_tools/index.md @@ -0,0 +1,3 @@ +# ROCm compilers and tools + +add links... diff --git a/docs/reference/management_tools.md b/docs/reference/compilers_tools/management_tools.md similarity index 100% rename from docs/reference/management_tools.md rename to docs/reference/compilers_tools/management_tools.md diff --git a/docs/reference/validation_tools.md b/docs/reference/compilers_tools/validation_tools.md similarity index 100% rename from docs/reference/validation_tools.md rename to docs/reference/compilers_tools/validation_tools.md diff --git a/docs/reference/all.md b/docs/reference/index.md similarity index 59% rename from docs/reference/all.md rename to docs/reference/index.md index 433047916..60348537f 100644 --- a/docs/reference/all.md +++ b/docs/reference/index.md @@ -1,4 +1,4 @@ -# All Reference Material +# Reference material ## ROCm Software Groups @@ -14,16 +14,16 @@ HIP is both AMD's GPU programming language extension and the GPU runtime. 
::: -:::{grid-item-card} [Math Libraries](./gpu_libraries/math) +:::{grid-item-card} [Math Libraries](./libraries/gpu_libraries/math) HIP Math Libraries support the following domains: -- [Linear Algebra Libraries](./gpu_libraries/linear_algebra) -- [Fast Fourier Transforms](./gpu_libraries/fft) -- [Random Numbers](./gpu_libraries/rand) +- [Linear Algebra Libraries](./libraries/gpu_libraries/linear_algebra) +- [Fast Fourier Transforms](./libraries/gpu_libraries/fft) +- [Random Numbers](./libraries/gpu_libraries/rand) ::: -:::{grid-item-card} [C++ Primitive Libraries](./gpu_libraries/c++_primitives) +:::{grid-item-card} [C++ Primitive Libraries](./libraries/gpu_libraries/c++_primitives) ROCm template libraries for C++ primitives and algorithms are as follows: - {doc}`rocPRIM ` @@ -33,14 +33,14 @@ ROCm template libraries for C++ primitives and algorithms are as follows: ::: -:::{grid-item-card} [Communication Libraries](gpu_libraries/communication) +:::{grid-item-card} [Communication Libraries](./libraries/gpu_libraries/communication) Inter and intra-node communication is supported by the following projects: - {doc}`RCCL ` ::: -:::{grid-item-card} [AI Libraries](./ai_tools) +:::{grid-item-card} [Artificial intelligence](../rocm_ai/rocm_ai) Libraries related to AI. - {doc}`MIOpen ` @@ -63,9 +63,9 @@ Computer vision related projects. ::: -:::{grid-item-card} [Compilers and Tools](compilers) +:::{grid-item-card} [Compilers and Tools](compilers_tools/index) -- [ROCmCC](/reference/rocmcc/rocmcc) +- [ROCmCC](./rocmcc/rocmcc) - {doc}`ROCdbgapi ` - {doc}`ROCgdb ` - {doc}`ROCProfiler ` @@ -73,7 +73,7 @@ Computer vision related projects. ::: -:::{grid-item-card} [Management Tools](management_tools) +:::{grid-item-card} [Management Tools](./compilers_tools/management_tools) - {doc}`AMD SMI ` - {doc}`ROCm SMI ` @@ -81,17 +81,17 @@ Computer vision related projects. 
::: -:::{grid-item-card} [Validation Tools](validation_tools) +:::{grid-item-card} [Validation Tools](./compilers_tools/validation_tools) - {doc}`ROCm Validation Suite ` - {doc}`TransferBench ` ::: -:::{grid-item-card} [GPU Architectures](gpu_arch) +:::{grid-item-card} GPU Architectures -- [AMD Instinct MI200](./gpu_arch/mi250.md) -- [AMD Instinct MI100](./gpu_arch/mi100.md) +- [AMD Instinct MI200](../conceptual/gpu_arch/mi250.md) +- [AMD Instinct MI100](../conceptual/gpu_arch/mi100.md) ::: diff --git a/docs/reference/gpu_libraries/c++_primitives.md b/docs/reference/libraries/gpu_libraries/c++_primitives.md similarity index 100% rename from docs/reference/gpu_libraries/c++_primitives.md rename to docs/reference/libraries/gpu_libraries/c++_primitives.md diff --git a/docs/reference/gpu_libraries/communication.md b/docs/reference/libraries/gpu_libraries/communication.md similarity index 100% rename from docs/reference/gpu_libraries/communication.md rename to docs/reference/libraries/gpu_libraries/communication.md diff --git a/docs/reference/gpu_libraries/fft.md b/docs/reference/libraries/gpu_libraries/fft.md similarity index 100% rename from docs/reference/gpu_libraries/fft.md rename to docs/reference/libraries/gpu_libraries/fft.md diff --git a/docs/reference/gpu_libraries/linear_algebra.md b/docs/reference/libraries/gpu_libraries/linear_algebra.md similarity index 100% rename from docs/reference/gpu_libraries/linear_algebra.md rename to docs/reference/libraries/gpu_libraries/linear_algebra.md diff --git a/docs/reference/gpu_libraries/math.md b/docs/reference/libraries/gpu_libraries/math.md similarity index 98% rename from docs/reference/gpu_libraries/math.md rename to docs/reference/libraries/gpu_libraries/math.md index be1cd7a6b..01f8d50d2 100644 --- a/docs/reference/gpu_libraries/math.md +++ b/docs/reference/libraries/gpu_libraries/math.md @@ -1,4 +1,4 @@ -# Math Libraries +# Math libraries AMD provides various math domain and support libraries as part of ROCm. 
diff --git a/docs/reference/gpu_libraries/rand.md b/docs/reference/libraries/gpu_libraries/rand.md similarity index 100% rename from docs/reference/gpu_libraries/rand.md rename to docs/reference/libraries/gpu_libraries/rand.md diff --git a/docs/reference/libraries/index.md b/docs/reference/libraries/index.md new file mode 100644 index 000000000..ed01495e6 --- /dev/null +++ b/docs/reference/libraries/index.md @@ -0,0 +1,8 @@ +# ROCm libraries + +add links... + +* Math +* C++ primitive +* Communication +* Artificial intelligence diff --git a/docs/reference/openmp/openmp.md b/docs/reference/openmp/openmp.md index f727c3056..7037de4e4 100644 --- a/docs/reference/openmp/openmp.md +++ b/docs/reference/openmp/openmp.md @@ -9,12 +9,14 @@ Along with host APIs, the OpenMP compilers support offloading code and data onto GPU devices. This document briefly describes the installation location of the OpenMP toolchain, example usage of device offloading, and usage of `rocprof` with OpenMP applications. The GPUs supported are the same as those supported by -this ROCm release. See the list of supported GPUs in {doc}`/release/gpu_os_support`. +this ROCm release. See the list of supported GPUs in {doc}`../../about/release/linux_support`. The ROCm OpenMP compiler is implemented using LLVM compiler technology. -{numref}`openmp-toolchain` illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code. +The following image illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code. 
-![OpenMP Toolchain](../../data/reference/openmp/openmp_toolchain.svg) +```{figure} ../../data/reference/openmp/openmp_toolchain.svg +:name: openmp-toolchain +``` ### Installation diff --git a/docs/reference/rocmcc/rocmcc.md b/docs/reference/rocmcc/rocmcc.md index 0aa14113a..6467a539e 100644 --- a/docs/reference/rocmcc/rocmcc.md +++ b/docs/reference/rocmcc/rocmcc.md @@ -497,7 +497,6 @@ offload-arch gfx906 -v The options are listed below: -:::{program} offload-arch :::{option} -h Prints the help message. ::: diff --git a/docs/reference/tools.md b/docs/reference/tools.md deleted file mode 100644 index d52348a2d..000000000 --- a/docs/reference/tools.md +++ /dev/null @@ -1 +0,0 @@ -# Management Tools diff --git a/docs/rocm_a-z.md b/docs/rocm_a-z.md new file mode 100644 index 000000000..53b62475e --- /dev/null +++ b/docs/rocm_a-z.md @@ -0,0 +1,38 @@ +# ROCm A-Z + +:::{table} +:name: rocm_a-z + +| ROCm product | Description | +| :---------------- | :------------ | +| AMD SMI | | +| Composable Kernel | | +| {doc}`HIP ` | AMD’s GPU programming language extension and the GPU runtime | +| hipBLAS | | +| hipCUB | | +| hipFFT | | +| {doc}`HIPIFY ` | Assists with porting applications from CUDA to HIP runtime | +| hipify-clang | A tool to translate CUDA source code into portable HIP C++ | +| hipify-perl | A tool to translate CUDA source code into portable HIP C++ | +| hipSOLVER | | +| hipSPARSE | | +| hipTensor | | +| MIGraphX | | +| MIOpen | | +| MIVisionX | | +| OpenMP | | +| RCCL | | +| rocAL | | +| ROCdbgapi | | +| ROCgdb | | +| ROCmCC | | +| ROCm Data Center Tool | | +| ROCm SMI | | +| ROCm Validation Suite | | +| rocPRIM | A header-only library for HIP parallel primitives | +| ROCProfiler | | +| rocThrust | A parallel algorithm library | +| ROCTracer | | +| TransferBench | | + +::: diff --git a/docs/examples/machine_learning/migraphx_optimization.md b/docs/rocm_ai/migraphx_optimization.md similarity index 98% rename from 
docs/examples/machine_learning/migraphx_optimization.md rename to docs/rocm_ai/migraphx_optimization.md index d4b8805d0..8b2777b41 100644 --- a/docs/examples/machine_learning/migraphx_optimization.md +++ b/docs/rocm_ai/migraphx_optimization.md @@ -327,12 +327,11 @@ To run generated `.mxr` files through `migraphx-driver`, use the following: ./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy ``` -Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example. +Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. -```{figure} ../../data/understand/deep_learning/image.018.png +```{figure} ../data/rocm_ai/image018.png :name: image018 ---- -align: center ---- +:align: center + Generating a `.mxr` File ``` diff --git a/docs/examples/machine_learning/pytorch_inception.md b/docs/rocm_ai/pytorch_inception.md similarity index 92% rename from docs/examples/machine_learning/pytorch_inception.md rename to docs/rocm_ai/pytorch_inception.md index f8730c07a..6aeb72e7f 100644 --- a/docs/examples/machine_learning/pytorch_inception.md +++ b/docs/rocm_ai/pytorch_inception.md @@ -10,10 +10,10 @@ Training also includes the choice of an optimization algorithm that reduces the ## Training Phases -Training occurs in multiple phases for every batch of training data. {numref}`TypesOfTrainingPhases` provides an explanation of the types of training phases. +Training occurs in multiple phases for every batch of training data. the following table provides an explanation of the types of training phases. :::{table} Types of Training Phases -:name: TypesOfTrainingPhases +:name: training-phases :widths: auto | Types of Phases | | | ----------------- | --- | @@ -23,10 +23,10 @@ Training occurs in multiple phases for every batch of training data. {numref}`Ty | Optimization Pass | The optimization algorithm updates the model parameters using the stored error gradients. 
| ::: -Training is different from inference, particularly from the hardware perspective. {numref}`TrainingVsInference` shows the contrast between training and inference. +Training is different from inference, particularly from the hardware perspective. The following table shows the contrast between training and inference. :::{table} Training vs. Inference -:name: TrainingVsInference +:name: training-inference :widths: auto | Training | Inference | | ----------- | ----------- | @@ -56,7 +56,7 @@ This example is adapted from the PyTorch research hub page on Inception v3[^torc Follow these steps: -1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](/how_to/pytorch_install/pytorch_install.md) for setting up a PyTorch environment on ROCm. +1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch_install) for setting up a PyTorch environment on ROCm. ```dockerfile docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest @@ -146,7 +146,7 @@ The previous section focused on downloading and using the Inception v3 model for Follow these steps: -1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](how_to/pytorch_install/pytorch_install.md) for setting up a PyTorch environment on ROCm. +1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch_install) for setting up a PyTorch environment on ROCm. ```dockerfile docker pull rocm/pytorch:latest @@ -461,10 +461,10 @@ Follow these steps: torch.save(model.state_dict(), "trained_inception_v3.pt") ``` -Plotting the train and test loss shows both metrics reducing over training epochs. This is demonstrated in {numref}`inceptionV3`. +Plotting the train and test loss shows both metrics reducing over training epochs. 
This is demonstrated in the following image. -```{figure} ../../data/understand/deep_learning/inception_v3.png -:name: inceptionV3 +```{figure} ../data/rocm_ai/inception_v3.png +:name: inception-v3 --- align: center --- @@ -741,7 +741,7 @@ To understand the code step by step, follow these steps: plt.show() ``` - ```{figure} ../../data/understand/deep_learning/mnist_1.png + ```{figure} ../data/rocm_ai/mnist_1.png --- align: center --- @@ -769,7 +769,7 @@ To understand the code step by step, follow these steps: plt.show() ``` - ```{figure} ../../data/understand/deep_learning/mnist_2.png + ```{figure} ../data/rocm_ai/mnist_2.png --- align: center --- @@ -895,7 +895,7 @@ To understand the code step by step, follow these steps: plt.show() ``` - ```{figure} ../../data/understand/deep_learning/mnist_3.png + ```{figure} ../data/rocm_ai/mnist_3.png --- align: center --- @@ -911,7 +911,7 @@ To understand the code step by step, follow these steps: plt.show() ``` - ```{figure} ../../data/understand/deep_learning/mnist_4.png + ```{figure} ../data/rocm_ai/mnist_4.png --- align: center --- @@ -946,7 +946,7 @@ To understand the code step by step, follow these steps: plt.show() ``` - ```{figure} ../../data/understand/deep_learning/mnist_5.png + ```{figure} ../data/rocm_ai/mnist_5.png --- align: center --- @@ -1115,7 +1115,7 @@ To prepare the data for training, follow these steps: print("Vectorized review", vectorize_text(first_review, first_label)) ``` - ```{figure} ../../data/understand/deep_learning/TextClassification_3.png + ```{figure} ../data/rocm_ai/TextClassification_3.png --- align: center --- @@ -1158,7 +1158,7 @@ To prepare the data for training, follow these steps: model.summary() ``` - ```{figure} ../../data/understand/deep_learning/TextClassification_4.png + ```{figure} ../data/rocm_ai/TextClassification_4.png --- align: center --- @@ -1178,7 +1178,7 @@ To prepare the data for training, follow these steps: history = 
model.fit(train_ds,validation_data=val_ds,epochs=epochs) ``` - ```{figure} ../../data/understand/deep_learning/TextClassification_5.png + ```{figure} ../data/rocm_ai/TextClassification_5.png --- align: center --- @@ -1224,9 +1224,9 @@ To prepare the data for training, follow these steps: plt.show() ``` - {numref}`TextClassification6` and {numref}`TextClassification7` illustrate the training and validation loss and the training and validation accuracy. + The following images illustrate the training and validation loss and the training and validation accuracy. - ```{figure} ../../data/understand/deep_learning/TextClassification_6.png + ```{figure} ../data/rocm_ai/TextClassification_6.png :name: TextClassification6 --- align: center @@ -1234,7 +1234,7 @@ To prepare the data for training, follow these steps: Training and Validation Loss ``` - ```{figure} ../../data/understand/deep_learning/TextClassification_7.png + ```{figure} ../data/rocm_ai/TextClassification_7.png :name: TextClassification7 --- align: center @@ -1271,15 +1271,3 @@ To prepare the data for training, follow these steps: export_model.predict(examples) ``` - -## References - -[^inception_arch]: C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," CoRR, p. abs/1512.00567, 2015 - -[^torch_vision]: PyTorch, \[Online\]. Available: [https://pytorch.org/vision/stable/index.html](https://pytorch.org/vision/stable/index.html) - -[^torch_vision_inception]: PyTorch, \[Online\]. Available: [https://pytorch.org/hub/pytorch_vision_inception_v3/](https://pytorch.org/hub/pytorch_vision_inception_v3/) - -[^Stanford_deep_learning]: Stanford, \[Online\]. Available: [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/) - -[^cross_entropy]: Wikipedia, \[Online\]. 
Available: [https://en.wikipedia.org/wiki/Cross_entropy](https://en.wikipedia.org/wiki/Cross_entropy) diff --git a/docs/examples/machine_learning/all.md b/docs/rocm_ai/rocm_ai.md similarity index 86% rename from docs/examples/machine_learning/all.md rename to docs/rocm_ai/rocm_ai.md index 7cad258cd..10f9e9179 100644 --- a/docs/examples/machine_learning/all.md +++ b/docs/rocm_ai/rocm_ai.md @@ -1,4 +1,4 @@ -# Machine Learning, Deep Learning, and Artificial Intelligence +# ROCm & artificial intelligence :::::{grid} 1 1 2 2 :gutter: 1 diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 78f47ef70..1b0e05160 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -6,270 +6,60 @@ defaults: root: index subtrees: - entries: - - file: rocm -- caption: Deploy ROCm - entries: - - file: deploy/linux/quick_start - title: Linux Quick Start - - file: deploy/linux/index - title: Linux Overview + - file: what_is_rocm + title: What is ROCm? subtrees: - - entries: - - file: deploy/linux/install_overview.md - title: Installation Overview - - file: deploy/linux/prerequisites - title: Prerequisites - - file: deploy/linux/os-native/index - subtrees: - - entries: - - file: deploy/linux/os-native/install - title: Installation - - file: deploy/linux/os-native/upgrade - title: Upgrade - - file: deploy/linux/os-native/uninstall - title: Uninstallation - - file: deploy/linux/os-native/package_manager_integration - - file: deploy/linux/installer/index - subtrees: - - entries: - - file: deploy/linux/installer/install - title: Installation - - file: deploy/linux/installer/upgrade - title: Upgrade - - file: deploy/linux/installer/uninstall - title: Uninstallation - - file: deploy/windows/quick_start - title: Windows Quick Start - - file: deploy/windows/index - title: Windows Overview + - entries: + - file: rocm_ai/rocm_ai.md + title: ROCm & AI + - file: tutorials/quick_start/index.md + title: Quick start subtrees: - - entries: - - file: deploy/windows/prerequisites - 
title: Prerequisites - - file: deploy/windows/gui/index - subtrees: - - entries: - - file: deploy/windows/gui/install - title: Installation - - file: deploy/windows/gui/upgrade - title: Upgrade - - file: deploy/windows/gui/uninstall - title: Uninstallation - - file: deploy/windows/cli/index - subtrees: - - entries: - - file: deploy/windows/cli/install - title: Installation - - file: deploy/windows/cli/upgrade - title: Upgrade - - file: deploy/windows/cli/uninstall - title: Uninstallation - - file: deploy/docker - title: Docker - -- caption: Release Info - entries: - - file: release - - file: CHANGELOG - title: Changelog - - file: release/gpu_os_support - - file: release/windows_support - - file: release/versions - - url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue - title: Known Issues - - file: release/compatibility + - entries: + - file: tutorials/quick_start/linux.md + title: Linux + - file: tutorials/quick_start/windows.md + title: Windows + - file: about/compatibility/index.md + title: Compatibility & support + - file: CHANGELOG.md + title: Release information subtrees: - - entries: - - file: release/user_kernel_space_compat_matrix - - file: release/docker_image_support_matrix - - file: release/3rd_party_support_matrix - - file: release/licensing + - entries: + - file: whats_new/whats_new.md + title: What's new? 
+ - file: about/release/release_history.md + title: Release history + - url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue + title: Known issues + - file: tutorials/index.md + title: Tutorials + subtrees: + - entries: + - file: tutorials/install/linux/index.md + title: Install ROCm (Linux) + - file: tutorials/install/windows/index.md + title: Install ROCm (Windows) + - file: how_to/index.md + title: How-to guides + - file: reference/index.md + title: Reference guides + - file: conceptual/index.md + title: Conceptual documentation + - file: rocm_a-z.md + title: ROCm A-Z + - file: contribute/index.md + title: Contributing + subtrees: + - entries: + - file: contribute/toolchain.md + title: Documentation tools + - file: contribute/building.md + title: Building documentation + - file: contribute/feedback.md + title: Providing feedback + - file: about/license.md + title: ROCm licensing -- caption: APIs and Reference - entries: - - file: reference/all - - file: reference/hip - subtrees: - - entries: - - title: HIP Runtime API - url: ${project:hip} - - title: HIPify - Port Your Code - url: ${project:hipify} - - file: reference/gpu_libraries/math - title: Math Libraries - subtrees: - - entries: - - file: reference/gpu_libraries/linear_algebra - subtrees: - - entries: - - title: rocBLAS - url: ${project:rocblas} - - title: hipBLAS - url: ${project:hipblas} - - title: hipBLASLt - url: ${project:hipblaslt} - - title: rocALUTION - url: ${project:rocalution} - - title: rocWMMA - url: ${project:rocwmma} - - title: rocSOLVER - url: ${project:rocsolver} - - title: hipSOLVER - url: ${project:hipsolver} - - title: rocSPARSE - url: ${project:rocsparse} - - title: hipSPARSE - url: ${project:hipsparse} - - title: hipSPARSELt - url: ${project:hipsparselt} - - file: reference/gpu_libraries/fft - subtrees: - - entries: - - title: rocFFT - url: ${project:rocfft} - - title: hipFFT - url: ${project:hipfft} - - file: reference/gpu_libraries/rand - subtrees: - - entries: - - 
title: rocRAND - url: ${project:rocrand} - - title: hipRAND - url: ${project:hiprand} - - file: reference/gpu_libraries/c++_primitives - title: C++ Primitive Libraries - subtrees: - - entries: - - title: rocPRIM - url: ${project:rocprim} - - entries: - - title: rocThrust - url: ${project:rocthrust} - - entries: - - title: hipCUB - url: ${project:hipcub} - - entries: - - title: hipTensor - url: ${project:hiptensor} - - file: reference/gpu_libraries/communication - title: Communication Libraries - subtrees: - - entries: - - title: RCCL - url: ${project:rccl} - - file: reference/ai_tools - title: AI Libraries - subtrees: - - entries: - - title: MIOpen - Machine Intelligence - url: ${project:miopen} - - title: Composable Kernel - url: ${project:composable_kernel} - - title: MIGraphX - Graph Optimization - url: ${project:amdmigraphx} - - file: reference/computer_vision - subtrees: - - entries: - - url: ${project:mivisionx} - title: MIVisionX - - entries: - - url: ${project:rocal} - title: rocAL - - file: reference/openmp/openmp - title: OpenMP - - file: reference/compilers - title: Compilers and Tools - subtrees: - - entries: - - file: reference/rocmcc/rocmcc - title: ROCmCC - - url: ${project:rocgdb} - title: ROCgdb - - url: ${project:rocprofiler} - title: ROCProfiler - - url: ${project:roctracer} - title: ROCTracer - - url: ${project:rocdbgapi} - title: ROCdbgapi - - file: reference/management_tools - title: Management Tools - subtrees: - - entries: - - url: https://rocm.docs.amd.com/projects/amdsmi/en/{branch}/ - title: AMD SMI - - url: https://rocm.docs.amd.com/projects/rocm_smi_lib/en/{branch}/ - title: ROCm SMI - - url: ${project:rdc} - title: ROCm Datacenter Tool - - file: reference/validation_tools - title: Validation Tools - subtrees: - - entries: - - url: ${project:rocmvalidationsuite} - title: RVS - - url: ${project:transferbench} - title: TransferBench -- caption: Understand ROCm - entries: - - file: understand/all.md - - title: Compiler Disambiguation - 
file: understand/compiler_disambiguation - - file: understand/cmake_packages - - file: understand/file_reorg - - file: understand/gpu_isolation - - file: understand/gpu_arch - subtrees: - - entries: - - file: understand/gpu_arch/mi250 - title: MI250 - - file: understand/gpu_arch/mi200_performance_counters - title: MI200 Performance Counters and Metrics - - file: understand/gpu_arch/mi100 - title: MI100 - - file: understand/using_gpu_sanitizer - title: Using GPU Sanitizer - - file: understand/More-about-how-ROCm-uses-PCIe-Atomics -- caption: How to Guides - entries: - - file: how_to/all - - title: Tuning Guides - file: how_to/tuning_guides/index.md - subtrees: - - entries: - - title: MI200 - file: how_to/tuning_guides/mi200.md - - title: MI100 - file: how_to/tuning_guides/mi100.md - - title: PRO W6000 & V620 - file: how_to/tuning_guides/w6000_v620.md - - file: how_to/deep_learning_rocm - subtrees: - - entries: - - file: how_to/magma_install/magma_install - - file: how_to/pytorch_install/pytorch_install - - file: how_to/tensorflow_install/tensorflow_install - - file: how_to/gpu_aware_mpi - - file: how_to/system_debugging -- caption: Tutorials & Examples - file: examples - entries: - - title: ROCm Examples - url: https://github.com/amd/rocm-examples - - title: Machine Learning - file: examples/machine_learning/all - subtrees: - - entries: - - file: examples/machine_learning/pytorch_inception - - file: examples/machine_learning/migraphx_optimization - -- caption: About - entries: - - file: about - - file: contributing - subtrees: - - entries: - - file: contribute/building.md - - file: contribute/feedback.md - - file: license.md diff --git a/docs/gpu_libraries.md b/docs/temp/gpu_libraries.md similarity index 100% rename from docs/gpu_libraries.md rename to docs/temp/gpu_libraries.md diff --git a/docs/kernel_userspace.md b/docs/temp/kernel_userspace.md similarity index 100% rename from docs/kernel_userspace.md rename to docs/temp/kernel_userspace.md diff --git 
a/docs/packaging_guidelines.md b/docs/temp/packaging_guidelines.md similarity index 100% rename from docs/packaging_guidelines.md rename to docs/temp/packaging_guidelines.md diff --git a/docs/examples/troubleshooting.md b/docs/temp/troubleshooting.md similarity index 85% rename from docs/examples/troubleshooting.md rename to docs/temp/troubleshooting.md index e56919059..bc34f4ff7 100644 --- a/docs/examples/troubleshooting.md +++ b/docs/temp/troubleshooting.md @@ -1,4 +1,3 @@ - # Troubleshooting **Q: What do I do if I get this error when trying to run PyTorch:** @@ -39,7 +38,7 @@ To implement a workaround, follow these steps: **Q: Why am I unable to access Docker or GPU in user accounts?** Ans: Ensure that the user is added to docker, video, and render Linux groups as -described in the ROCm Installation Guide at {ref}`setting_group_permissions`. +described in the ROCm Installation Guide at {ref}`linux_group_permissions`. **Q: Can I install PyTorch directly on bare metal?** @@ -50,7 +49,3 @@ Option 2: Install PyTorch Using Wheels Package in the section **Q: How do I profile PyTorch workloads?** Ans: Use the PyTorch Profiler to profile GPU kernels on ROCm. - ------- - -[^ROCm_issues]: AMD, "ROCm issues," \[Online\]. Available: [https://github.com/RadeonOpenCompute/ROCm/issues](https://github.com/RadeonOpenCompute/ROCm/issues) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md new file mode 100644 index 000000000..df26efb90 --- /dev/null +++ b/docs/tutorials/index.md @@ -0,0 +1,27 @@ +# ROCm tutorials + +:::::{grid} 1 1 2 2 +:gutter: 1 + +:::{grid-item-card} [Installing ROCm](./install/index.md) + +Learn how to install ROCm on Linux and Windows. + +::: + +:::{grid-item-card} [ROCm examples](https://github.com/amd/rocm-examples) +:link-type: url +Sample code demonstrating the HIP API and ROCm-accelerated domain libraries. 
+ +::: + +:::{grid-item-card} [Artificial intelligence](../rocm_ai/rocm_ai) + +Detailed walkthroughs of specific artificial intelligence use cases using ROCm acceleration. + +- [Implementing Inception V3 on ROCm with PyTorch](../rocm_ai/pytorch_inception) +- [Optimizing Inference with MIGraphX](../rocm_ai/migraphx_optimization) + +::: + +::::: diff --git a/docs/deploy/docker.md b/docs/tutorials/install/docker.md similarity index 97% rename from docs/deploy/docker.md rename to docs/tutorials/install/docker.md index ade97332e..2c2119c27 100644 --- a/docs/deploy/docker.md +++ b/docs/tutorials/install/docker.md @@ -4,7 +4,7 @@ Docker containers share the kernel with the host operating system, therefore the ROCm kernel-mode driver must be installed on the host. Please refer to -{ref}`using-the-package-manager` on installing `amdgpu-dkms`. The other +{ref}`linux_install_methods` on installing `amdgpu-dkms`. The other user-space parts (like the HIP-runtime or math libraries) of the ROCm stack will be loaded from the container image and don't need to be installed to the host. diff --git a/docs/tutorials/install/index.md b/docs/tutorials/install/index.md new file mode 100644 index 000000000..f74550b74 --- /dev/null +++ b/docs/tutorials/install/index.md @@ -0,0 +1,22 @@ +# Installing ROCm + +Our installation guides are designed to walk you through a ROCm installation in detail. If you want to get up and running quickly, try our [quick-start guides](../quick_start/index). + +:::::{grid} 1 1 2 2 +:gutter: 1 + +:::{grid-item-card} Linux installation guide +:link: ./linux/index +:link-type: doc +Install ROCm on Linux. + +::: + +:::{grid-item-card} Windows installation guide +:link: ./windows/index +:link-type: doc +Install ROCm on Windows.
+ +::: + +::::: diff --git a/docs/deploy/linux/index.md b/docs/tutorials/install/linux/index.md similarity index 82% rename from docs/deploy/linux/index.md rename to docs/tutorials/install/linux/index.md index 49080b0d5..181196573 100644 --- a/docs/deploy/linux/index.md +++ b/docs/tutorials/install/linux/index.md @@ -1,6 +1,6 @@ -# Deploy ROCm on Linux +# Install ROCm on Linux -Start with {doc}`/deploy/linux/quick_start` or follow the detailed +Start with {doc}`../../quick_start/linux` or follow the detailed instructions below. ## Prepare to Install @@ -26,6 +26,8 @@ Standard Packages vs Multi-Version Packages :::: +(linux_install_methods)= + ## Choose your install method ::::{grid} 1 1 2 2 @@ -50,4 +52,4 @@ manager. ## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../about/release/linux_support` diff --git a/docs/deploy/linux/install_overview.md b/docs/tutorials/install/linux/install_overview.md similarity index 94% rename from docs/deploy/linux/install_overview.md rename to docs/tutorials/install/linux/install_overview.md index 848eb1254..ac1b25f60 100644 --- a/docs/deploy/linux/install_overview.md +++ b/docs/tutorials/install/linux/install_overview.md @@ -1,7 +1,7 @@ # ROCm Installation Options (Linux) Users installing ROCm must choose between various installation options. A new -user should follow the [Quick Start guide](./quick_start). +user should follow the [Quick Start guide](../../quick_start/linux). ## Package Manager versus AMDGPU Installer? 
@@ -65,7 +65,7 @@ multi-version ROCm installation types: ```{figure-md} install-types - + ROCm Installation Types ``` diff --git a/docs/deploy/linux/installer/index.md b/docs/tutorials/install/linux/installer/index.md similarity index 89% rename from docs/deploy/linux/installer/index.md rename to docs/tutorials/install/linux/installer/index.md index e8e43764f..8bdc70d54 100644 --- a/docs/deploy/linux/installer/index.md +++ b/docs/tutorials/install/linux/installer/index.md @@ -28,4 +28,4 @@ Steps for removing ROCm packages, libraries and tools. ## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../../about/release/linux_support` diff --git a/docs/deploy/linux/installer/install.md b/docs/tutorials/install/linux/installer/install.md similarity index 100% rename from docs/deploy/linux/installer/install.md rename to docs/tutorials/install/linux/installer/install.md diff --git a/docs/deploy/linux/installer/uninstall.md b/docs/tutorials/install/linux/installer/uninstall.md similarity index 100% rename from docs/deploy/linux/installer/uninstall.md rename to docs/tutorials/install/linux/installer/uninstall.md diff --git a/docs/deploy/linux/installer/upgrade.md b/docs/tutorials/install/linux/installer/upgrade.md similarity index 100% rename from docs/deploy/linux/installer/upgrade.md rename to docs/tutorials/install/linux/installer/upgrade.md diff --git a/docs/deploy/linux/os-native/index.md b/docs/tutorials/install/linux/os-native/index.md similarity index 91% rename from docs/deploy/linux/os-native/index.md rename to docs/tutorials/install/linux/os-native/index.md index e65e3aa21..8a992a409 100644 --- a/docs/deploy/linux/os-native/index.md +++ b/docs/tutorials/install/linux/os-native/index.md @@ -35,4 +35,4 @@ Information about packages. 
## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../../about/release/linux_support` diff --git a/docs/deploy/linux/os-native/install.md b/docs/tutorials/install/linux/os-native/install.md similarity index 100% rename from docs/deploy/linux/os-native/install.md rename to docs/tutorials/install/linux/os-native/install.md diff --git a/docs/deploy/linux/os-native/package_manager_integration.md b/docs/tutorials/install/linux/os-native/package_manager_integration.md similarity index 90% rename from docs/deploy/linux/os-native/package_manager_integration.md rename to docs/tutorials/install/linux/os-native/package_manager_integration.md index 97ddbfe94..9a0e32df4 100644 --- a/docs/deploy/linux/os-native/package_manager_integration.md +++ b/docs/tutorials/install/linux/os-native/package_manager_integration.md @@ -19,14 +19,14 @@ All meta-packages exist in both versioned and non-versioned forms. - Non-versioned packages – For a single-version installation of the ROCm stack - Versioned packages – For multi-version installations of the ROCm stack -```{figure-md} package-naming - - +```{figure} ../../../../data/tutorials/install/linux/linux002.png +:name: package-naming +:align: center ROCm Release Package Naming ``` -{numref}`package-naming` demonstrates the single and multi-version ROCm packages' naming +The preceding image demonstrates the single and multi-version ROCm packages' naming structure, including examples for various Linux distributions. See terms below: _Module_ - It is the part of the package that represents the name of the ROCm @@ -53,7 +53,7 @@ valid only for rpm packages. ## Components of ROCm Programming Models -{numref}`meta-packages` demonstrates the high-level layered architecture of ROCm +The following image demonstrates the high-level layered architecture of ROCm programming models and their meta-packages. All meta-packages are a combination of required packages and libraries. @@ -64,9 +64,8 @@ of required packages and libraries. 
- `rocm-hip-sdk` contains runtime components to deploy and execute HIP applications. -```{figure-md} meta-packages - - +```{figure} ../../../../data/tutorials/install/linux/linux003.png +:name: meta-packages ROCm Meta Packages ``` @@ -100,9 +99,8 @@ This section discusses the available meta-packages and their packages. The following image visualizes the meta-packages and their associated packages in a ROCm programming model. -```{figure-md} assoc-packages - - +```{figure} ../../../../data/tutorials/install/linux/linux004.png +:name: assoc-packages Associated Packages ``` @@ -112,7 +110,7 @@ Associated Packages - Meta-packages and associated packages are represented in the same color. ```{note} -{numref}`assoc-packages` is for informational purposes only, as the individual +The preceding image is for informational purposes only, as the individual packages in a meta-package are subject to change. Install meta-packages, and not individual packages, to avoid conflicts. ``` diff --git a/docs/deploy/linux/os-native/uninstall.md b/docs/tutorials/install/linux/os-native/uninstall.md similarity index 97% rename from docs/deploy/linux/os-native/uninstall.md rename to docs/tutorials/install/linux/os-native/uninstall.md index 2764baf39..0924f371f 100644 --- a/docs/deploy/linux/os-native/uninstall.md +++ b/docs/tutorials/install/linux/os-native/uninstall.md @@ -3,7 +3,7 @@ This section describes how to uninstall ROCm with the Linux distribution's package manager. This method should be used if ROCm was installed via the package manager. If the installer script was used for installation, then it should be -used for uninstallation too, refer to {doc}`/deploy/linux/installer/uninstall`. +used for uninstallation too, refer to {doc}`../installer/uninstall`. 
::::::{tab-set} :::::{tab-item} Ubuntu diff --git a/docs/deploy/linux/os-native/upgrade.md b/docs/tutorials/install/linux/os-native/upgrade.md similarity index 100% rename from docs/deploy/linux/os-native/upgrade.md rename to docs/tutorials/install/linux/os-native/upgrade.md diff --git a/docs/deploy/linux/prerequisites.md b/docs/tutorials/install/linux/prerequisites.md similarity index 97% rename from docs/deploy/linux/prerequisites.md rename to docs/tutorials/install/linux/prerequisites.md index 52aea8a8d..606f7c7ee 100644 --- a/docs/deploy/linux/prerequisites.md +++ b/docs/tutorials/install/linux/prerequisites.md @@ -24,7 +24,7 @@ Verify the Linux distribution using the following steps: uname -m && cat /etc/*release ``` -2. Confirm that the obtained Linux distribution information matches with those listed in {ref}`supported_distributions`. +2. Confirm that the obtained Linux distribution information matches with those listed in {ref}`linux_support`. **Example:** Running the command above on an Ubuntu system results in the following output: @@ -57,7 +57,7 @@ Verify the kernel version using the following steps: ``` 2. Confirm that the obtained kernel version information matches with system - requirements as listed in {ref}`supported_distributions`. + requirements as listed in {ref}`linux_support`. 
## Additional package repositories @@ -181,6 +181,8 @@ sudo zypper install kernel-default-devel ::: :::: +(linux_group_permissions)= + ## Setting Permissions for Groups This section provides steps to add any current user to a video group to access diff --git a/docs/how_to/magma_install/magma_install.md b/docs/tutorials/install/magma_install.md similarity index 100% rename from docs/how_to/magma_install/magma_install.md rename to docs/tutorials/install/magma_install.md diff --git a/docs/how_to/pytorch_install/pytorch_install.md b/docs/tutorials/install/pytorch_install.md similarity index 95% rename from docs/how_to/pytorch_install/pytorch_install.md rename to docs/tutorials/install/pytorch_install.md index 512e47698..ff505b39e 100644 --- a/docs/how_to/pytorch_install/pytorch_install.md +++ b/docs/tutorials/install/pytorch_install.md @@ -2,7 +2,7 @@ ## PyTorch -PyTorch is an open source Machine Learning Python library, primarily +PyTorch is an open-source machine learning Python library, primarily differentiated by Tensor computing with GPU acceleration and a type-based automatic differentiation. Other advanced features include: @@ -15,8 +15,8 @@ automatic differentiation. Other advanced features include: ### Installing PyTorch To install ROCm on bare metal, refer to the sections -[GPU and OS Support (Linux)](../../release/gpu_os_support.md) and -[Compatibility](../../release/compatibility.md) for hardware, software and +[GPU and OS Support (Linux)](../../about/release/linux_support) and +[Compatibility](../../about/compatibility/index) for hardware, software and 3rd-party framework compatibility between ROCm and PyTorch. The recommended option to get a PyTorch environment is through Docker. However, installing the PyTorch wheels package on bare metal is also supported. @@ -60,13 +60,12 @@ Follow these steps: PyTorch supports the ROCm platform by providing tested wheels packages. 
To access this feature, refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) -and choose the "ROCm" compute platform. {numref}`Installation-Matrix-from-Pytorch` is a matrix from that illustrates the installation compatibility between ROCm and the PyTorch build. +and choose the "ROCm" compute platform. The following image is a matrix from that illustrates the installation compatibility between ROCm and the PyTorch build. + +```{figure} ../../data/tutorials/install/magma_install/magma006.png +:name: installation-matrix-pytorch +:align: center -```{figure} ../../data/how_to/magma_install/image.006.png -:name: Installation-Matrix-from-Pytorch ---- -align: center ---- Installation Matrix from Pytorch ``` @@ -80,8 +79,7 @@ To install PyTorch using the wheels package, follow these installation steps: or b. Download a base OS Docker image and install ROCm following the - installation directions in the section - [Installation](../../deploy/linux/install.md). ROCm 5.2 is installed in + installation directions in the [Installation](../../tutorials/install/linux/index) section. ROCm 5.2 is installed in this example, as supported by the installation matrix from . diff --git a/docs/how_to/tensorflow_install/tensorflow_install.md b/docs/tutorials/install/tensorflow_install.md similarity index 97% rename from docs/how_to/tensorflow_install/tensorflow_install.md rename to docs/tutorials/install/tensorflow_install.md index df0d12547..b30c46f2a 100644 --- a/docs/how_to/tensorflow_install/tensorflow_install.md +++ b/docs/tutorials/install/tensorflow_install.md @@ -2,7 +2,7 @@ ## TensorFlow -TensorFlow is an open source library for solving Machine Learning, +TensorFlow is an open-source library for solving Machine Learning, Deep Learning, and Artificial Intelligence problems. It can be used to solve many problems across different sectors and industries but primarily focuses on training and inference in neural networks. 
It is one of the most popular and @@ -16,7 +16,7 @@ The following sections contain options for installing TensorFlow. #### Option 1: Install TensorFlow Using Docker Image To install ROCm on bare metal, follow the section -[Installation (Linux)](../../deploy/linux/install.md). The recommended option to +[Installation (Linux)](../../tutorials/install/linux/os-native/install). The recommended option to get a TensorFlow environment is through Docker. Using Docker provides portability and access to a prebuilt Docker container that diff --git a/docs/deploy/windows/cli/index.md b/docs/tutorials/install/windows/cli/index.md similarity index 88% rename from docs/deploy/windows/cli/index.md rename to docs/tutorials/install/windows/cli/index.md index f351c36de..27aab41b1 100644 --- a/docs/deploy/windows/cli/index.md +++ b/docs/tutorials/install/windows/cli/index.md @@ -28,4 +28,4 @@ Steps for removing ROCm packages and libraries. ## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../../about/release/windows_support` diff --git a/docs/deploy/windows/cli/install.md b/docs/tutorials/install/windows/cli/install.md similarity index 97% rename from docs/deploy/windows/cli/install.md rename to docs/tutorials/install/windows/cli/install.md index 88f677ca9..2a4f6deae 100644 --- a/docs/deploy/windows/cli/install.md +++ b/docs/tutorials/install/windows/cli/install.md @@ -13,11 +13,10 @@ compatible GPU is required. Please see the supported GPU guide for more details. The command line installer is the same executable which is used by the graphical front-end. Download the installer from the [HIP-SDK download page](https://www.amd.com/en/developer/rocm-hub/hip-sdk.html). -The options supported by the command line interface are summarized in -{numref}`hip-sdk-cli-options`. +The options supported by the command line interface are summarized in the following table. 
```{table} HIP SDK Command Line Options -:name: hip-sdk-cli-options +:name: hip-sdk-cli-install | **Install Option** | **Description** | |:------------------:|:---------------:| | `-install` | Command used to install packages, both driver and applications. No output to the screen. | diff --git a/docs/deploy/windows/cli/uninstall.md b/docs/tutorials/install/windows/cli/uninstall.md similarity index 97% rename from docs/deploy/windows/cli/uninstall.md rename to docs/tutorials/install/windows/cli/uninstall.md index 0af305704..8b07933f3 100644 --- a/docs/deploy/windows/cli/uninstall.md +++ b/docs/tutorials/install/windows/cli/uninstall.md @@ -6,10 +6,10 @@ The steps to uninstall the HIP SDK for Windows are described in this document. The command line installer is the same executable which is used by the graphical front-end. The options supported by the command line interface are summarized in -{numref}`hip-sdk-cli-options`. +the following table. ```{table} HIP SDK Command Line Options -:name: hip-sdk-cli-options +:name: hip-sdk-cli-uninstall | **Install Option** | **Description** | |:------------------:|:---------------:| | `-install` | Command used to install packages, both driver and applications. No output to the screen. | diff --git a/docs/deploy/windows/cli/upgrade.md b/docs/tutorials/install/windows/cli/upgrade.md similarity index 76% rename from docs/deploy/windows/cli/upgrade.md rename to docs/tutorials/install/windows/cli/upgrade.md index e5cedcb7e..9a1a6e9d0 100644 --- a/docs/deploy/windows/cli/upgrade.md +++ b/docs/tutorials/install/windows/cli/upgrade.md @@ -6,8 +6,8 @@ The steps to uninstall the HIP SDK for Windows are described in this document. 
To upgrade an existing installation of the HIP SDK without preserving the previous version, first uninstall it, then install the new version following the -instructions in {doc}`/deploy/windows/cli/uninstall` and -{doc}`/deploy/windows/cli/install` using the old and new installers +instructions in {doc}`./uninstall` and +{doc}`./install` using the old and new installers respectively. To upgrade by installing both versions side-by-side, just run the installer of diff --git a/docs/deploy/windows/gui/index.md b/docs/tutorials/install/windows/gui/index.md similarity index 88% rename from docs/deploy/windows/gui/index.md rename to docs/tutorials/install/windows/gui/index.md index 55a3113f9..da0c73dfa 100644 --- a/docs/deploy/windows/gui/index.md +++ b/docs/tutorials/install/windows/gui/index.md @@ -28,4 +28,4 @@ Steps for removing ROCm packages and libraries. ## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../../about/release/windows_support` diff --git a/docs/deploy/windows/gui/install.md b/docs/tutorials/install/windows/gui/install.md similarity index 79% rename from docs/deploy/windows/gui/install.md rename to docs/tutorials/install/windows/gui/install.md index 18b8b6727..6abe67cd0 100644 --- a/docs/deploy/windows/gui/install.md +++ b/docs/tutorials/install/windows/gui/install.md @@ -17,11 +17,10 @@ Download the installer from the ### Launching the installer -To launch the AMD HIP SDK Installer, click the **Setup** icon shown in -{numref}`setup-icon`. +To launch the AMD HIP SDK Installer, click the **Setup** icon shown in the following image. -```{figure} /data/deploy/windows/000-setup-icon.png -:name: setup-icon +```{figure} ../../../../data/tutorials/install/windows/000-setup-icon.png +:name: setup-icon-install :alt: Icon with AMD arrow logo and User Access Control Shield overlayed. Setup Icon ``` @@ -29,15 +28,15 @@ Setup Icon The installer requires Administrator Privileges, so you may be greeted with a User Access Control (UAC) pop-up. Click Yes. 
-```{figure} /data/deploy/windows/001-uac-dark.png -:name: uac-dark +```{figure} ../../../../data/tutorials/install/windows/001-uac-dark.png +:name: uac-dark-install :class: only-dark :alt: User Access Control pop-up User Access Control pop-up ``` -```{figure} /data/deploy/windows/001-uac-light.png -:name: uac-light +```{figure} ../../../../data/tutorials/install/windows/001-uac-light.png +:name: uac-light-install :class: only-light :alt: User Access Control pop-up User Access Control pop-up @@ -45,20 +44,19 @@ User Access Control pop-up The installer executable will temporarily extract installer packages to `C:\AMD` which it will remove after installation completes. This extraction is signified -by the "Initializing install" window in {numref}`init-install`. +by the "Initializing install" window in the following image. -```{figure} /data/deploy/windows/002-initializing.png +```{figure} ../../../../data/tutorials/install/windows/002-initializing.png :name: init-install :alt: Window with AMD arrow logo, futuristic background and progress counter. Installer initialization window ``` -The installer will then detect your system configuration as per -{numref}`detecting-system-components` to decide, which installable components +The installer will then detect your system configuration to determine which installable components are applicable to your system. -```{figure} /data/deploy/windows/003-detecting-system-config.png -:name: detecting-system-components +```{figure} ../../../../data/tutorials/install/windows/003-detecting-system-config.png +:name: detect-sys-components :alt: Window with AMD arrow logo, futuristic background and activity indicator. Installer initialization window. ``` @@ -67,10 +65,10 @@ Installer initialization window. When the installer launches, it displays a window that lets the user customize the installation. By default, all components are selected for installation. 
-Refer to {numref}`installer-window` for an instance when the Select All option +Refer to the following image for an instance when the Select All option is turned on. -```{figure} /data/deploy/windows/004-installer-window.png +```{figure} ../../../../data/tutorials/install/windows/004-installer-window.png :name: installer-window :alt: Window with AMD arrow logo, futuristic background and activity indicator. Installer initialization window. @@ -78,7 +76,7 @@ Installer initialization window. #### HIP SDK Installer -The HIP SDK installation options are listed in {numref}`hip-sdk-options`. +The HIP SDK installation options are listed in the following table. ```{table} HIP SDK Components for Installation :name: hip-sdk-options @@ -106,15 +104,14 @@ convenient. #### AMD Display Driver The HIP SDK installer bundles an AMD Radeon Software PRO 23.10 installer. The -supported install options are summarized by -{numref}`display-driver-install-options`: +supported install options are summarized in the following table: ```{table} AMD Display Driver Install Options :name: display-driver-install-options | **Install Option** | **Description** | |:------------------:|:---------------:| | Install Location | Location on disk to store driver files. | -| Install Type | The breadth of components to be installed. Refer to {numref}`display-driver-install-types` for details. | +| Install Type | The breadth of components to be installed. | | Factory Reset (Optional) | A Factory Reset will remove all prior versions of AMD HIP SDK and drivers. You will not be able to roll back to previously installed drivers. | ``` @@ -134,10 +131,9 @@ Display Driver. ### Installing Components -Please wait for the installation to complete during as shown in -{numref}`install-progress`. +Please wait for the installation to complete during as shown in the following image. 
-```{figure} /data/deploy/windows/012-install-progress.png +```{figure} ../../../../data/tutorials/install/windows/012-install-progress.png :name: install-progress :alt: Window with AMD arrow logo, futuristic background and progress meter. Installation Progress @@ -146,10 +142,9 @@ Installation Progress ### Installation Complete Once the installation is complete, the installer window may prompt you for a -system restart. Click **Restart** at the lower right corner, shown in -{numref}`install-complete` +system restart. Click **Restart** at the lower right corner, shown in the following image. -```{figure} /data/deploy/windows/013-install-complete.png +```{figure} ../../../../data/tutorials/install/windows/013-install-complete.png :name: install-complete :alt: Window with AMD arrow logo, futuristic background and completion notice. Installation Complete diff --git a/docs/deploy/windows/gui/uninstall.md b/docs/tutorials/install/windows/gui/uninstall.md similarity index 85% rename from docs/deploy/windows/gui/uninstall.md rename to docs/tutorials/install/windows/gui/uninstall.md index e557d02ae..954801979 100644 --- a/docs/deploy/windows/gui/uninstall.md +++ b/docs/tutorials/install/windows/gui/uninstall.md @@ -12,14 +12,14 @@ Uninstallation of the HIP SDK components can be done through the Windows Settings app. Navigate to "Apps > Installed apps", click the "..." on the far right next to the component to uninstall, and click "Uninstall". -```{figure} /data/deploy/windows/014-uninstall-dark.png +```{figure} ../../../../data/tutorials/install/windows/014-uninstall-dark.png :name: uninstall-dark :class: only-dark :alt: Installed apps section of the Setting app showing installed HIP SDK components. 
Removing the SDK via the Setting app ``` -```{figure} /data/deploy/windows/014-uninstall-light.png +```{figure} ../../../../data/tutorials/install/windows/014-uninstall-light.png :name: uninstall-light :class: only-light :alt: Installed apps section of the Setting app showing installed HIP SDK components. diff --git a/docs/deploy/windows/gui/upgrade.md b/docs/tutorials/install/windows/gui/upgrade.md similarity index 100% rename from docs/deploy/windows/gui/upgrade.md rename to docs/tutorials/install/windows/gui/upgrade.md diff --git a/docs/deploy/windows/index.md b/docs/tutorials/install/windows/index.md similarity index 85% rename from docs/deploy/windows/index.md rename to docs/tutorials/install/windows/index.md index 3268bfa2e..49c3de9b3 100644 --- a/docs/deploy/windows/index.md +++ b/docs/tutorials/install/windows/index.md @@ -1,6 +1,6 @@ # Install ROCm (HIP SDK) on Windows -Start with {doc}`/deploy/windows/quick_start` or follow the detailed +Start with {doc}`../../quick_start/windows` or follow the detailed instructions below. ## Prepare to Install @@ -52,7 +52,7 @@ Learn how to use ROCm with descriptive examples for novice to intermediate users ::: :::{grid-item-card} Windows App Deployment Guidelines -:link: ../../understand/windows-app-deployment-guidelines +:link: ../../../conceptual/windows-app-deployment-guidelines :link-type: doc Discusses strategies on how to bundle HIP libraries with an end user application. 
@@ -62,4 +62,4 @@ Discusses strategies on how to bundle HIP libraries with an end user application ## See Also -- {doc}`/release/gpu_os_support` +- {doc}`../../../about/release/windows_support` diff --git a/docs/deploy/windows/prerequisites.md b/docs/tutorials/install/windows/prerequisites.md similarity index 86% rename from docs/deploy/windows/prerequisites.md rename to docs/tutorials/install/windows/prerequisites.md index fc17f5c2c..3140668de 100644 --- a/docs/deploy/windows/prerequisites.md +++ b/docs/tutorials/install/windows/prerequisites.md @@ -40,14 +40,14 @@ Verify the Windows Edition using the following steps: 1. Open the Setting app. - ```{figure} /data/deploy/windows/000-settings-dark.png + ```{figure} ../../../data/tutorials/install/windows/000-settings-dark.png :name: settings-dark :class: only-dark :alt: Gear icon of the Windows Settings app Windows Settings app icon ``` - ```{figure} /data/deploy/windows/000-settings-light.png + ```{figure} ../../../data/tutorials/install/windows/000-settings-light.png :name: settings-light :class: only-light :alt: Gear icon of the Windows Settings app @@ -56,14 +56,14 @@ Verify the Windows Edition using the following steps: 2. Navigate to **System > About**. 
- ```{figure} /data/deploy/windows/001-about-dark.png + ```{figure} ../../../data/tutorials/install/windows/001-about-dark.png :name: about-dark :class: only-dark :alt: Settings app panel showing Device and OS information Settings > About page ``` - ```{figure} /data/deploy/windows/001-about-light.png + ```{figure} ../../../data/tutorials/install/windows/001-about-light.png :name: about-light :class: only-light :alt: Settings app panel showing Device and OS information diff --git a/docs/tutorials/quick_start/index.md b/docs/tutorials/quick_start/index.md new file mode 100644 index 000000000..9cd20a3ec --- /dev/null +++ b/docs/tutorials/quick_start/index.md @@ -0,0 +1,16 @@ +# ROCm quick-start guides + +Quick-start guides are designed to get you up and running quickly. They do not go into detail or help +with troubleshooting. For more in-depth installation guides, see [Installing ROCm](../install/index.md). + +:::::{grid} 1 1 2 2 +:gutter: 1 + +:::{grid-item-card} [Linux quick-start guide](linux.md) +::: + +:::{grid-item-card} [Windows quick-start guide](windows.md) +:link-type: url +::: + +::::: diff --git a/docs/deploy/linux/quick_start.md b/docs/tutorials/quick_start/linux.md similarity index 100% rename from docs/deploy/linux/quick_start.md rename to docs/tutorials/quick_start/linux.md diff --git a/docs/deploy/windows/quick_start.md b/docs/tutorials/quick_start/windows.md similarity index 78% rename from docs/deploy/windows/quick_start.md rename to docs/tutorials/quick_start/windows.md index cfe8ba312..8c6b87ec7 100644 --- a/docs/deploy/windows/quick_start.md +++ b/docs/tutorials/quick_start/windows.md @@ -17,10 +17,9 @@ Download the installer from the ### Launching the installer -To launch the AMD HIP SDK Installer, click the **Setup** icon shown in -{numref}`setup-icon`. +To launch the AMD HIP SDK Installer, click the **Setup** icon shown in the following image. 
-```{figure} /data/deploy/windows/000-setup-icon.png +```{figure} ../../data/tutorials/install/windows/000-setup-icon.png :name: setup-icon :alt: Icon with AMD arrow logo and User Access Control Shield overlayed. Setup Icon @@ -29,14 +28,14 @@ Setup Icon The installer requires Administrator Privileges, so you may be greeted with a User Access Control (UAC) pop-up. Click Yes. -```{figure} /data/deploy/windows/001-uac-dark.png +```{figure} ../../data/tutorials/install/windows/001-uac-dark.png :name: uac-dark :class: only-dark :alt: User Access Control pop-up User Access Control pop-up ``` -```{figure} /data/deploy/windows/001-uac-light.png +```{figure} ../../data/tutorials/install/windows/001-uac-light.png :name: uac-light :class: only-light :alt: User Access Control pop-up @@ -45,19 +44,18 @@ User Access Control pop-up The installer executable will temporarily extract installer packages to `C:\AMD` which it will remove after installation completes. This extraction is signified -by the "Initializing install" window in {numref}`init-install`. +by the "Initializing install" window in the following image. -```{figure} /data/deploy/windows/002-initializing.png -:name: init-install +```{figure} ../../data/tutorials/install/windows/002-initializing.png +:name: init-win-install :alt: Window with AMD arrow logo, futuristic background and progress counter. Installer initialization window ``` -The installer will then detect your system configuration as per -{numref}`detecting-system-components` to decide, which installable components +The installer will then detect your system configuration to determine which installable components are applicable to your system. -```{figure} /data/deploy/windows/003-detecting-system-config.png +```{figure} ../../data/tutorials/install/windows/003-detecting-system-config.png :name: detecting-system-components :alt: Window with AMD arrow logo, futuristic background and activity indicator. Installer initialization window. 
@@ -67,21 +65,21 @@ Installer initialization window. When the installer launches, it displays a window that lets the user customize the installation. By default, all components are selected for installation. -Refer to {numref}`installer-window` for an instance when the Select All option +Refer to the following image for an instance when the Select All option is turned on. -```{figure} /data/deploy/windows/004-installer-window.png -:name: installer-window +```{figure} ../../data/tutorials/install/windows/004-installer-window.png +:name: install-window :alt: Window with AMD arrow logo, futuristic background and activity indicator. Installer initialization window. ``` #### HIP SDK Installer -The HIP SDK installation options are listed in {numref}`hip-sdk-options`. +The HIP SDK installation options are listed in the following table. ```{table} HIP SDK Components for Installation -:name: hip-sdk-options +:name: hip-sdk-options-win | **HIP Components** | **Install Type** | **Additional Options** | |:------------------:|:----------------:|:----------------------:| | HIP SDK Core | 5.5.0 | Install location | @@ -106,20 +104,19 @@ convenient. #### AMD Display Driver The HIP SDK installer bundles an AMD Radeon Software PRO 23.10 installer. The -supported install options are summarized by -{numref}`display-driver-install-options`: +supported install options are summarized in the following table: ```{table} AMD Display Driver Install Options -:name: display-driver-install-options +:name: display-driver-install-win | **Install Option** | **Description** | |:------------------:|:---------------:| | Install Location | Location on disk to store driver files. | -| Install Type | The breadth of components to be installed. Refer to {numref}`display-driver-install-types` for details. | +| Install Type | The breadth of components to be installed. | | Factory Reset (Optional) | A Factory Reset will remove all prior versions of AMD HIP SDK and drivers. 
You will not be able to roll back to previously installed drivers. | ``` ```{table} AMD Display Driver Install Types -:name: display-driver-install-types +:name: display-driver-win-types | **Install Type** | **Description** | |:----------------:|:---------------:| | Full Install | Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking the performance on your graphics hardware. | @@ -134,11 +131,10 @@ Display Driver. ### Installing Components -Please wait for the installation to complete during as shown in -{numref}`install-progress`. +Please wait for the installation to complete during as shown in the following image. -```{figure} /data/deploy/windows/012-install-progress.png -:name: install-progress +```{figure} ../../data/tutorials/install/windows/012-install-progress.png +:name: install-progress-win :alt: Window with AMD arrow logo, futuristic background and progress meter. Installation Progress ``` @@ -146,11 +142,10 @@ Installation Progress ### Installation Complete Once the installation is complete, the installer window may prompt you for a -system restart. Click **Restart** at the lower right corner, shown in -{numref}`install-complete` +system restart. Click **Restart** at the lower right corner, shown in the following image. -```{figure} /data/deploy/windows/013-install-complete.png -:name: install-complete +```{figure} ../../data/tutorials/install/windows/013-install-complete.png +:name: install-complete-win :alt: Window with AMD arrow logo, futuristic background and completion notice. Installation Complete ``` @@ -172,15 +167,15 @@ Uninstallation of the HIP SDK components can be done through the Windows Settings app. Navigate to "Apps > Installed apps", click the "..." on the far right next to the component to uninstall, and click "Uninstall". 
-```{figure} /data/deploy/windows/014-uninstall-dark.png -:name: uninstall-dark +```{figure} ../../data/tutorials/install/windows/014-uninstall-dark.png +:name: uninstall-dark-win :class: only-dark :alt: Installed apps section of the Setting app showing installed HIP SDK components. Removing the SDK via the Setting app ``` -```{figure} /data/deploy/windows/014-uninstall-light.png -:name: uninstall-light +```{figure} ../../data/tutorials/install/windows/014-uninstall-light.png +:name: uninstall-light-win :class: only-light :alt: Installed apps section of the Setting app showing installed HIP SDK components. Removing the SDK via the Setting app diff --git a/docs/what_is_rocm.md b/docs/what_is_rocm.md new file mode 100644 index 000000000..47fa78657 --- /dev/null +++ b/docs/what_is_rocm.md @@ -0,0 +1,20 @@ +# What is ROCm? + +ROCm is an open-source stack, composed primarily of open-source software, designed for +graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development +tools, and APIs that enable GPU programming from low-level kernel to end-user applications. + +With ROCm, you can customize your GPU software to meet your specific needs. You can develop, +collaborate, test, and deploy your applications in a free, open source, integrated, and secure software +ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC), +artificial intelligence (AI), scientific computing, and computer aided design (CAD). + +ROCm is powered by AMD’s +[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP), +an open-source software C++ GPU programming environment and its corresponding runtime. HIP +allows ROCm developers to create portable applications on different platforms by deploying code on a +range of platforms, from dedicated gaming GPUs to exascale HPC clusters. 
+
+ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open
+source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
+(ML) frameworks, such as PyTorch and TensorFlow.
diff --git a/docs/whats_new/rocm_on_windows.md b/docs/whats_new/rocm_on_windows.md
new file mode 100644
index 000000000..ab251307c
--- /dev/null
+++ b/docs/whats_new/rocm_on_windows.md
@@ -0,0 +1,89 @@
+# ROCm on Windows
+
+Starting with ROCm 5.5, the HIP SDK brings a subset of ROCm to developers on Windows.
+The collection of features enabled on Windows is referred to as the HIP SDK.
+These features allow developers to use the HIP runtime, HIP math libraries
+and HIP Primitive libraries. The following table shows the differences
+between Windows and Linux releases.
+
+|Component|Linux|Windows|
+|---------|-----|-------|
+|Driver|Radeon Software for Linux |AMD Software Pro Edition|
+|Compiler|`hipcc`/`amdclang++`|`hipcc`/`clang++`|
+|Debugger|`rocgdb`|no debugger available|
+|Profiler|`rocprof`|[Radeon GPU Profiler](https://gpuopen.com/rgp/)|
+|Porting Tools|HIPIFY|Coming Soon|
+|Runtime|HIP (Open Sourced)|HIP (closed source)|
+|Math Libraries|Supported|Supported|
+|Primitives Libraries|Supported|Supported|
+|Communication Libraries|Supported|Not Available|
+|AI Libraries|MIOpen, MIGraphX|Not Available|
+|System Management|`rocm-smi-lib`, RDC, `rocminfo`|`amdsmi`, `hipInfo`|
+|AI Frameworks|PyTorch, TensorFlow, etc.|Not Available|
+|CMake HIP Language|Enabled|Unsupported|
+|Visual Studio| Not applicable| Plugin Available|
+|HIP Ray Tracing| Supported|Supported|
+
+AMD is continuing to invest in Windows support and AMD plans to release enhanced
+features in subsequent revisions.
+
+```{note}
+The 5.5 Windows Installer collectively groups the Math and Primitives
+libraries.
+```
+
+```{note}
+GPU support on Windows and Linux may differ. You must refer to
+Windows and Linux GPU support tables separately.
+```
+
+```{note}
+HIP Ray Tracing is not distributed via ROCm in Linux.
+```
+
+## ROCm release versioning
+
+Linux OS releases set the canonical version numbers for ROCm. Windows will
+follow Linux version numbers as Windows releases are based on Linux ROCm
+releases. However, not all Linux ROCm releases will have a corresponding Windows
+release. The following table shows the ROCm releases on Windows and Linux. Releases
+with both Windows and Linux are referred to as a joint release. Releases with
+only Linux support are referred to as a skipped release from the Windows
+perspective.
+
+|Release version|Linux|Windows|
+|---------------|-----|-------|
+|5.5|✅|✅|
+|5.6|✅|❌|
+
+ROCm Linux releases are versioned with following the Major.Minor.Patch
+version number system. Windows releases will only be versioned with Major.Minor.
+
+In general, Windows releases will trail Linux releases. Software developers that
+wish to support both Linux and Windows using a single ROCm version should
+refrain from upgrading ROCm unless there is a joint release.
+
+## Windows Documentation implications
+
+The ROCm documentation website contains both Windows and Linux documentation.
+Just below each article title, a convenient article information section states
+whether the page applies to Linux only, Windows only or both OSes. To find the
+exact Windows documentation for a release of the HIP SDK, please view the ROCm documentation with the same
+Major.Minor version number while ignoring the Patch version. The Patch version
+only matters for Linux releases. For convenience,
+Windows documentation will continue to be included in the overall ROCm
+documentation for the skipped Windows releases.
+
+Windows release notes will contain only information pertinent to Windows.
+The software developer must read all the previous ROCm release notes (including)
+skipped ROCm versions on Windows for information on all the changes present in
+the Windows release.
+
+## Windows Builds from Source
+
+Not all source code required to build Windows from source is available under a
+permissive open source license. Build instructions on Windows is only provided
+for projects that can be built from source on Windows using a toolchain that
+has closed source build prerequisites. The ROCm manifest file is not valid for
+Windows. AMD does not release a manifest or tag our components in Windows.
+Users may use corresponding Linux tags to build on Windows.
diff --git a/docs/rocm.md b/docs/whats_new/whats_new.md
similarity index 73%
rename from docs/rocm.md
rename to docs/whats_new/whats_new.md
index 454e4ee4b..66a0df98b 100644
--- a/docs/rocm.md
+++ b/docs/whats_new/whats_new.md
@@ -1,25 +1,8 @@
-# What is ROCm?
+# What's new in ROCm?

-ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for
-graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
-tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
+ROCm is now supported on Windows.

-With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
-collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software
-ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
-artificial intelligence (AI), scientific computing, and computer aided design (CAD).
-
-ROCm is powered by AMD’s
-[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
-an OSS C++ GPU programming environment and its corresponding runtime. HIP allows ROCm
-developers to create portable applications on different platforms by deploying code on a range of
-platforms, from dedicated gaming GPUs to exascale HPC clusters.
-
-ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary OSS
-compilers, debuggers, and libraries.
ROCm is fully integrated into machine learning (ML) frameworks,
-such as PyTorch and TensorFlow.
-
-## ROCm on Windows
+## Windows support

 Starting with ROCm 5.5, the HIP SDK brings a subset of ROCm to developers on Windows.
 The collection of features enabled on Windows is referred to as the HIP SDK.
@@ -62,7 +45,7 @@ Windows and Linux GPU support tables separately.
 HIP Ray Tracing is not distributed via ROCm in Linux.
 ```

-### ROCm release versioning
+## ROCm release versioning

 Linux OS releases set the canonical version numbers for ROCm. Windows will
 follow Linux version numbers as Windows releases are based on Linux ROCm
@@ -84,7 +67,7 @@ In general, Windows releases will trail Linux releases. Software developers that
 wish to support both Linux and Windows using a single ROCm version should
 refrain from upgrading ROCm unless there is a joint release.

-### Windows Documentation implications
+## Windows Documentation implications

 The ROCm documentation website contains both Windows and Linux documentation.
 Just below each article title, a convenient article information section states
@@ -100,7 +83,7 @@ The software developer must read all the previous ROCm release notes (including)
 skipped ROCm versions on Windows for information on all the changes present in
 the Windows release.

-### Windows Builds from Source
+## Windows Builds from Source

 Not all source code required to build Windows from source is available under a
 permissive open source license.
Build instructions on Windows is only provided diff --git a/tools/autotag/templates/rocm_changes/5.2.0.md b/tools/autotag/templates/rocm_changes/5.2.0.md index fd72bfb13..838c9961e 100644 --- a/tools/autotag/templates/rocm_changes/5.2.0.md +++ b/tools/autotag/templates/rocm_changes/5.2.0.md @@ -289,7 +289,7 @@ This release introduces a new ROCm C++ library for accelerating mixed precision rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed. For more information, refer to -[Communication Libraries](../../../../docs/reference/gpu_libraries/communication.md). +[Communication Libraries](../../../../docs/reference/libraries/gpu_libraries/communication.md). #### OpenMP Enhancements in This Release