ROCm A-Z page & link cleanup (#2450)
2
.github/CODEOWNERS
vendored
@@ -1 +1 @@
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus @LisaDelaney
3
.gitignore
vendored
@@ -13,6 +13,7 @@ _doxygen/
_readthedocs/

# avoid duplicating contributing.md due to conf.py
CHANGELOG.md
docs/contributing.md
docs/release.md
docs/CHANGELOG.md
docs/about/release-notes.md
@@ -1,5 +1,7 @@
config:
  default: true
  MD004:
    style: asterisk
  MD013: false
  MD026:
    punctuation: '.,;:!'
@@ -11,6 +13,6 @@ config:
  MD051: false
ignores:
  - CHANGELOG.md
  - docs/CHANGELOG.md
  - "{,docs/}{RELEASE,release}.md"
  - docs/about/release/release_notes.md
  - tools/autotag/templates/**/*.md
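The ignore entry `"{,docs/}{RELEASE,release}.md"` packs several paths into one pattern. The following is a minimal sketch of how such a pattern expands, assuming the linter's glob matcher follows the usual brace-expansion rules. `expand_braces` is a hypothetical helper written for illustration, not part of markdownlint, and it does not handle nested braces.

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand every {a,b,...} group in a glob pattern (no nested braces)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a plain path or glob.
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        # Substitute one alternative, then expand any remaining groups.
        results.extend(expand_braces(head + option + tail))
    return results


paths = expand_braces("{,docs/}{RELEASE,release}.md")
print(paths)
```

Under that assumption the single entry covers `RELEASE.md` and `release.md` both at the repository root and under `docs/`.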
|
||||
|
||||
@@ -1,3 +1,8 @@
GEMM
autogenerated
cuFFT
NVCC
CPP
ROCProfiler
ROCTracer
ROCdbgapi
1803
CHANGELOG.md
@@ -35,16 +35,16 @@ guide on writing and formatting on GitHub as a starting point.
|
||||
ROCm documentation adds additional requirements to Markdown and RST based files
|
||||
as follows:
|
||||
|
||||
- Level one headers are only used for page titles. There must be only one level
|
||||
* Level one headers are only used for page titles. There must be only one level
|
||||
1 header per file for both Markdown and Restructured Text.
|
||||
- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
|
||||
* Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
|
||||
our automated GitHub action on a Pull Request (PR).
|
||||
See the {doc}`rocm-docs-core linting user guide <rocm-docs-core:user_guide/linting>` for more details.
|
||||
|
||||
## Filenames and folder structure
|
||||
|
||||
Please use snake case (all lower case letters and underscores instead of spaces)
|
||||
for file names. For example, `example_file_name.md`.
|
||||
Please use kebab-case (all lower case letters and dashes instead of spaces)
|
||||
for file names. For example, `example-file-name.md`.
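The kebab-case rule can be applied mechanically when renaming existing files. The following is a minimal sketch; `to_kebab_case` is a hypothetical helper name, not part of rocm-docs-core, and it assumes that only the final path component should change while directories and the extension are left alone.

```python
import re
from pathlib import PurePosixPath


def to_kebab_case(filename: str) -> str:
    """Lower-case the file stem and replace underscores or spaces with dashes."""
    path = PurePosixPath(filename)
    # Collapse runs of underscores/whitespace into a single dash.
    stem = re.sub(r"[\s_]+", "-", path.stem.strip()).lower()
    # Keep the parent directories and the extension unchanged.
    return str(path.with_name(stem + path.suffix))


print(to_kebab_case("example_file_name.md"))  # example-file-name.md
```

Links and anchors that point at a renamed file still have to be updated separately, since this sketch only computes the new name.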
|
||||
Our documentation follows Pitchfork for folder structure.
|
||||
All documentation is in `/docs` except for special files like
|
||||
the contributing guide in the `/` folder. All images used in the documentation are
|
||||
|
||||
@@ -27,7 +27,7 @@ ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
### Fixed Defects
|
||||
|
||||
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
|
||||
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
- Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||
* *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
|
||||
* Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
* Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
* Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||
|
||||
5129
docs/CHANGELOG.md
Normal file
@@ -4,7 +4,7 @@ ROCm™ supports various 3rd party libraries and frameworks. Supported versions
|
||||
are tested and known to work. Non-supported versions of 3rd parties may also
|
||||
work, but aren't tested.
|
||||
|
||||
## Deep Learning
|
||||
## Deep learning
|
||||
|
||||
ROCm releases support the most recent and two prior releases of PyTorch and
|
||||
TensorFlow
|
||||
@@ -19,7 +19,7 @@ TensorFlow
|
||||
| 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11 | 2.5.4 |
|
||||
| 5.6 | 1.11, 1.12.1, 1.13.1 | 2.12 | 2.5.4 |
|
||||
|
||||
(communication_libraries)=
|
||||
(communication-libraries)=
|
||||
|
||||
## Communication libraries
|
||||
|
||||
@@ -45,8 +45,8 @@ UCC version | ROCm 5.5 and older | ROCm 5.6 and newer |
|
||||
ROCm releases provide algorithm libraries with interfaces compatible with
|
||||
contemporary CUDA / NVIDIA HPC SDK alternatives.
|
||||
|
||||
- Thrust → rocThrust
|
||||
- CUB → hipCUB
|
||||
* Thrust → rocThrust
|
||||
* CUB → hipCUB
|
||||
|
||||
| ROCm | Thrust / CUB | HPC SDK |
|
||||
|:------|:------------:|:-------:|
|
||||
@@ -59,4 +59,4 @@ contemporary CUDA / NVIDIA HPC SDK alternatives.
|
||||
| 5.6 | 1.17.2 | 22.9 |
|
||||
|
||||
For the latest documentation of these libraries, refer to the
|
||||
[associated documentation](../../reference/libraries/gpu_libraries/c++_primitives).
|
||||
[associated documentation](../../reference/libraries/gpu-libraries/c++primitives).
|
||||
@@ -3,27 +3,27 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} User space & Kernel Fusion Driver
|
||||
:::{grid-item-card}
|
||||
**[User space & kernel fusion driver](./user-kernel-space-compat-matrix.md)**
|
||||
|
||||
Forward and backward compatibility of ROCm user space components and the
|
||||
kernel space Kernel Fusion Driver (KFD).
|
||||
|
||||
- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Docker Image Support
|
||||
:::{grid-item-card}
|
||||
**[Docker image support](./docker-image-support-matrix.md)**
|
||||
|
||||
ROCm releases several Docker container images.
|
||||
|
||||
- [Docker Image Support Matrix](./docker_image_support_matrix)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} 3rd Party Support
|
||||
:::{grid-item-card}
|
||||
**[Third-party support](./3rd-party-support-matrix.md)**
|
||||
|
||||
Several 3rd party libraries ship with ROCm enablement, and several ROCm
|
||||
components provide interfaces compatible with 3rd party solutions.
|
||||
|
||||
- [Third party support matrix](./3rd_party_support_matrix)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# GPU and OS support (Linux)
|
||||
|
||||
(linux_support)=
|
||||
(linux-support)=
|
||||
|
||||
## Supported Linux Distributions
|
||||
|
||||
@@ -23,8 +23,8 @@ AMD ROCm™ Platform supports the following Linux distributions.
|
||||
|
||||
:::{versionadded} 5.6
|
||||
|
||||
- RHEL 8.8 and 9.2 support is added.
|
||||
- SLES 15 SP5 support is added
|
||||
* RHEL 8.8 and 9.2 support is added.
|
||||
* SLES 15 SP5 support is added
|
||||
|
||||
:::
|
||||
|
||||
@@ -43,9 +43,9 @@ AMD ROCm™ Platform supports the following Linux distributions.
|
||||
|
||||
::::
|
||||
|
||||
- ✅: **Supported** - AMD performs full testing of all ROCm components on distro
|
||||
✅: **Supported** - AMD performs full testing of all ROCm components on distro
|
||||
GA image.
|
||||
- ❌: **Unsupported** - AMD no longer performs builds and testing on these
|
||||
❌: **Unsupported** - AMD no longer performs builds and testing on these
|
||||
previously supported distro GA images.
|
||||
|
||||
## Virtualization Support
|
||||
@@ -110,10 +110,10 @@ Use Driver Shipped with ROCm
|
||||
|
||||
### Support Status
|
||||
|
||||
- ✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
the corresponding ROCm product.
|
||||
- ⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
- ❌: **Unsupported** - This configuration is not enabled in our software
|
||||
⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
❌: **Unsupported** - This configuration is not enabled in our software
|
||||
distributions.
|
||||
|
||||
## CPU Support
|
||||
@@ -1,6 +1,6 @@
|
||||
# GPU and OS Support (Windows)
|
||||
|
||||
(supported_skus)=
|
||||
(windows-support)=
|
||||
|
||||
## Supported SKUs
|
||||
|
||||
@@ -62,16 +62,16 @@ on this table, the GPU is not officially supported by AMD.
|
||||
ROCm components are described in the [Reference material](../../reference/index). Support
|
||||
on Windows is provided with two levels of enablement.
|
||||
|
||||
- **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only.
|
||||
- **HIP SDK**: Runtime plus additional components refer to [Libraries](../../reference/libraries/index).
|
||||
Some [math libraries](../../reference/libraries/gpu_libraries/math) are Linux exclusive, please check the library details.
|
||||
* **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only.
|
||||
* **HIP SDK**: Runtime plus additional components refer to [Libraries](../../reference/libraries/index).
|
||||
Some [math libraries](../../reference/libraries/gpu-libraries/math) are Linux exclusive, please check the library details.
|
||||
|
||||
### Support Status
|
||||
|
||||
- ✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
the corresponding ROCm product.
|
||||
- ⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
- ❌: **Unsupported** - This configuration is not enabled in our software
|
||||
⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
❌: **Unsupported** - This configuration is not enabled in our software
|
||||
distributions.
|
||||
|
||||
## CPU Support
|
||||
33
docs/about/release-notes.md
Normal file
@@ -0,0 +1,33 @@
|
||||
# Release Notes
|
||||
<!-- Do not edit this file! This file is autogenerated with -->
|
||||
<!-- tools/autotag/tag_script.py -->
|
||||
|
||||
<!-- Disable lints since this is an auto-generated file. -->
|
||||
<!-- markdownlint-disable blanks-around-headers -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
<!-- markdownlint-disable no-blanks-blockquote -->
|
||||
<!-- markdownlint-disable ul-indent -->
|
||||
<!-- markdownlint-disable no-trailing-spaces -->
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
|
||||
The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.6.1
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
|
||||
### What's New in This Release
|
||||
|
||||
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
## HIP 5.6.1 (for ROCm 5.6.1)
|
||||
|
||||
### Fixed Defects
|
||||
|
||||
* *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
|
||||
* Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
* Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
* Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||
@@ -1,583 +0,0 @@
|
||||
# Release Notes
|
||||
<!-- Do not edit this file! This file is autogenerated with -->
|
||||
<!-- tools/autotag/tag_script.py -->
|
||||
|
||||
<!-- Disable lints since this is an auto-generated file. -->
|
||||
<!-- markdownlint-disable blanks-around-headers -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
<!-- markdownlint-disable no-blanks-blockquote -->
|
||||
<!-- markdownlint-disable ul-indent -->
|
||||
<!-- markdownlint-disable no-trailing-spaces -->
|
||||
<!-- markdownlint-disable commands-show-output -->
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
|
||||
The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.6.0
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
<!-- markdownlint-disable header-increment -->
|
||||
#### Release Highlights
|
||||
|
||||
ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include:
|
||||
|
||||
- New documentation portal at https://rocm.docs.amd.com
|
||||
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
|
||||
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
|
||||
- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
|
||||
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
|
||||
|
||||
#### OS and GPU Support Changes
|
||||
|
||||
- SLES15 SP5 support was added this release. SLES15 SP3 support was dropped.
|
||||
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will be entering the maintenance mode starting Q3 2023. This will be aligned with ROCm 5.7 GA release date.
|
||||
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
|
||||
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (End of Maintenance [EOM]) (will be aligned with the closest ROCm release)
|
||||
- Bug fixes during the maintenance will be made to the next ROCm point release
|
||||
- Bug fixes will not be back ported to older ROCm releases for this SKU
|
||||
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
|
||||
|
||||
#### AMDSMI CLI 23.0.0.4
|
||||
|
||||
##### Added
|
||||
|
||||
- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
|
||||
|
||||
- Package: amd-smi-lib
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Not all Error Correction Code (ECC) fields are currently supported
|
||||
|
||||
- RHEL 8 & SLES 15 have extra install steps
|
||||
|
||||
#### Kernel Modules (DKMS)
|
||||
|
||||
##### Fixes
|
||||
|
||||
- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test, as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
|
||||
|
||||
#### HIP 5.6 (For ROCm 5.6)
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Consolidation of hipamd, rocclr and OpenCL projects in clr
|
||||
- Optimized lock for graph global capture mode
|
||||
|
||||
##### Added
|
||||
|
||||
- Added hipRTC support for amd_hip_fp16
|
||||
- Added hipStreamGetDevice implementation to get the device associated with the stream
|
||||
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
|
||||
- hipArrayGetInfo for getting information about the specified array
|
||||
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
|
||||
- hipArray3DGetDescriptor to get 3D array descriptor
|
||||
|
||||
##### Changed
|
||||
|
||||
- hipMallocAsync to return success for zero size allocation to match hipMalloc
|
||||
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
|
||||
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
|
||||
- Removed hipBusBandwidth and hipCommander samples from hip-tests
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed regression in hipMemCpyParam3D when offset is applied
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Limited testing on xnack+ configuration
|
||||
- Multiple HIP tests failures (gpuvm fault or hangs)
|
||||
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
|
||||
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
|
||||
|
||||
##### Upcoming changes in future release
|
||||
|
||||
- Removal of gcnarch from hipDeviceProp_t structure
|
||||
- Addition of new fields in hipDeviceProp_t structure
|
||||
- maxTexture1D
|
||||
- maxTexture2D
|
||||
- maxTexture1DLayered
|
||||
- maxTexture2DLayered
|
||||
- sharedMemPerMultiprocessor
|
||||
- deviceOverlap
|
||||
- asyncEngineCount
|
||||
- surfaceAlignment
|
||||
- unifiedAddressing
|
||||
- computePreemptionSupported
|
||||
- uuid
|
||||
- Removal of deprecated code
|
||||
- hip-hcc codes from hip code tree
|
||||
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
|
||||
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
|
||||
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
|
||||
|
||||
#### ROCgdb-13 (For ROCm 5.6.0)
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved performance when handling the end of a process with a large number of threads.
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- On certain configurations, ROCgdb can show the following warning message:
|
||||
|
||||
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
|
||||
|
||||
This does not affect ROCgdb's functionality.
|
||||
|
||||
#### ROCprofiler (For ROCm 5.6.0)
|
||||
|
||||
In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
|
||||
ROCm 5.5 are split into separate files. The `rocmtools` files that were
|
||||
deprecated in ROCm 5.5 have been removed.
|
||||
|
||||
| ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler.so.1`           | `lib/librocprofiler.so.2`              |
|
||||
|
||||
The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
$ rocprof …
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV1` API do the following:
|
||||
|
||||
```C
// main.c
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
int main() {
  // Use the rocprofilerV1 API
  return 0;
}
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
|
||||
|
||||
The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
$ rocprofv2 …
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV2` API do the following:
|
||||
|
||||
```C
// main.c
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
int main() {
  // Use the rocprofilerV2 API
  return 0;
}
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved Test Suite
|
||||
|
||||
##### Added
|
||||
|
||||
- 'end_time' needs to be disabled in roctx_trace.txt
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocprof in ROCm/5.4.0 GPU selector broken.
|
||||
- rocprof in ROCm/5.4.1 fails to generate kernel info.
|
||||
- rocprof clobbers LD_PRELOAD.
|
||||
|
||||
### Library Changes in ROCm 5.6.0
|
||||
|
||||
| Library | Version |
|---------|---------|
| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |
|
||||
|
||||
#### hipBLAS 1.0.0
|
||||
|
||||
hipBLAS 1.0.0 for ROCm 5.6.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing
|
||||
|
||||
##### Removed
|
||||
|
||||
- removed support for deprecated hipblasInt8Datatype_t enum
|
||||
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions
|
||||
|
||||
##### Deprecated
|
||||
|
||||
- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
|
||||
out-of-place functionality
|
||||
|
||||
#### hipCUB 2.13.1
|
||||
|
||||
hipCUB 2.13.1 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
|
||||
|
||||
##### Changed
|
||||
|
||||
- CUB backend references CUB and Thrust version 1.17.2.
|
||||
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
|
||||
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
|
||||
|
||||
#### hipFFT 1.0.12
|
||||
|
||||
hipFFT 1.0.12 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
|
||||
|
||||
#### hipSOLVER 1.8.0
|
||||
|
||||
hipSOLVER 1.8.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added compatibility API with hipsolverRf prefix
|
||||
|
||||
#### hipSPARSE 2.3.6
|
||||
|
||||
hipSPARSE 2.3.6 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added SpGEMM algorithms
|
||||
|
||||
##### Changed
|
||||
|
||||
- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE
|
||||
|
||||
#### MIOpen 2.19.0
|
||||
|
||||
MIOpen 2.19.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- ROCm 5.5 support for gfx1101 (Navi32)
|
||||
|
||||
##### Changed
|
||||
|
||||
- Tuning results for MLIR on ROCm 5.5
|
||||
- Bumping MLIR commit to 5.5.0 release tag
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fix 3d convolution Host API bug
|
||||
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 5.6.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- Compatibility with NCCL 2.15.5
|
||||
- Unit test executable renamed to rccl-UnitTests
|
||||
|
||||
##### Added
|
||||
|
||||
- HW-topology aware binary tree implementation
|
||||
- Experimental support for MSCCL
|
||||
- New unit tests for hipGraph support
|
||||
- NPKit integration
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocm-smi ID conversion
|
||||
- Support for HIP_VISIBLE_DEVICES for unit tests
|
||||
- Support for p2p transfers to non (HIP) visible devices
|
||||
|
||||
##### Removed
|
||||
|
||||
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
|
||||
|
||||
#### rocALUTION 2.1.9
|
||||
|
||||
rocALUTION 2.1.9 for ROCm 5.6.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed synchronization issues in level 1 routines
|
||||
|
||||
#### rocBLAS 3.0.0
|
||||
|
||||
rocBLAS 3.0.0 for ROCm 5.6.0
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
|
||||
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.
|
||||
|
||||
##### Added
|
||||
|
||||
- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
|
||||
|
||||
##### Deprecated
|
||||
|
||||
- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
|
||||
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
|
||||
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
|
||||
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
|
||||
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
|
||||
|
||||
##### Removed
|
||||
|
||||
- The is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
|
||||
- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
|
||||
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
|
||||
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.
|
||||
|
||||
##### Dependencies
|
||||
|
||||
- build only dependency on python joblib added as used by Tensile build
|
||||
- fix for cmake install on some OS when performed by install.sh -d --cmake_install
|
||||
|
||||
##### Fixed
|
||||
|
||||
- make trsm offset calculations 64 bit safe
|
||||
|
||||
##### Changed
|
||||
|
||||
- refactor rotg test code
|
||||
|
||||
#### rocFFT 1.0.23
|
||||
|
||||
rocFFT 1.0.23 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
|
||||
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
|
||||
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Replaced std::complex with hipComplex data types for data generator.
|
||||
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
|
||||
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.
|
||||
|
||||
#### rocm-cmake 0.9.0
|
||||
|
||||
rocm-cmake 0.9.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added the option ROCM_HEADER_WRAPPER_WERROR
|
||||
- Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
|
||||
- Configure-time CMake option sets the default for the C macro.
|
||||
|
||||
#### rocPRIM 2.13.0
|
||||
|
||||
rocPRIM 2.13.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- New block level `radix_rank` primitive.
|
||||
- New block level `radix_rank_match` primitive.
|
||||
- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
|
||||
- Improved the performance of `device_merge_sort`.
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
|
||||
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.
|
||||
|
||||
#### rocRAND 2.10.17
|
||||
|
||||
rocRAND 2.10.17 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
|
||||
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
|
||||
- experimental HIP-CPU feature
|
||||
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
|
||||
|
||||
##### Changed
|
||||
|
||||
- Python 2.7 is no longer officially supported.
|
||||
|
||||
#### rocSOLVER 3.22.0
|
||||
|
||||
rocSOLVER 3.22.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- LU refactorization for sparse matrices
|
||||
- CSRRF_ANALYSIS
|
||||
- CSRRF_SUMLU
|
||||
- CSRRF_SPLITLU
|
||||
- CSRRF_REFACTLU
|
||||
- Linear system solver for sparse matrices
|
||||
- CSRRF_SOLVE
|
||||
- Added type `rocsolver_rfinfo` for use with sparse matrix routines
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved the performance of BDSQR and GESVD when singular vectors are requested
|
||||
|
||||
##### Fixed
|
||||
|
||||
- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
|
||||
|
||||
#### rocSPARSE 2.5.2
|
||||
|
||||
rocSPARSE 2.5.2 for ROCm 5.6.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed a memory leak in csritsv
|
||||
- Fixed a bug in csrsm and bsrsm
|
||||
|
||||
#### rocThrust 2.18.0
|
||||
|
||||
rocThrust 2.18.0 for ROCm 5.6.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
|
||||
|
||||
#### rocWMMA 1.1.0
|
||||
|
||||
rocWMMA 1.1.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
|
||||
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
|
||||
- Added performance gemm samples for half, single and double precision
|
||||
- Added rocWMMA cmake versioning
|
||||
- Added vectorized support in coordinate transforms
|
||||
- Included ROCm SMI for runtime clock rate detection
|
||||
- Added fragment transforms for transpose and change data layout
|
||||
|
||||
##### Changed
|
||||
|
||||
- Default to GPU rocBLAS validation against rocWMMA
|
||||
- Re-enabled int8 gemm tests on gfx9
|
||||
- Upgraded to C++17
|
||||
- Restructured unit test folder for consistency
|
||||
- Consolidated rocWMMA samples common code
|
||||
|
||||
#### Tensile 4.37.0
|
||||
|
||||
Tensile 4.37.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added user driven tuning API
|
||||
- Added decision tree fallback feature
|
||||
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
|
||||
- DirectToVgpr support for fp16 and Int8 with TN orientation
|
||||
- Added new test cases for various functions
|
||||
- Added SingleBuffer algorithm for ZGEMM/CGEMM
|
||||
- Added joblib for parallel map calls
|
||||
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
|
||||
- Added asmcap check for MIArchVgpr
|
||||
- Added support for MFMA + LocalSplitU
|
||||
- Added frequency, power, and temperature data to the output
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved the performance of GlobalSplitU with SingleBuffer algorithm
|
||||
- Reduced the running time of the extended and pre_checkin tests
|
||||
- Optimized the Tailloop section of the assembly kernel
|
||||
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
|
||||
- Improved the performance of the second kernel of MultipleBuffer algorithm
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated custom kernels with 64-bit offsets
|
||||
- Adapted 64-bit offset arguments for assembly kernels
|
||||
- Improved temporary register re-use to reduce max sgpr usage
|
||||
- Removed some restrictions on VectorWidth and DirectToVgpr
|
||||
- Updated the dependency requirements for Tensile
|
||||
- Changed the range of AssertSummationElementMultiple
|
||||
- Modified the error messages for more clarity
|
||||
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
|
||||
- Removed dummy vgpr for vectorStaticRemainder
|
||||
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
|
||||
- Removed qReg parameter from vectorStaticRemainder
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
|
||||
- 64-bit offset parameters for post kernels
|
||||
- Fixed gfx908 CI test failures
|
||||
- Fixed offset calculation to prevent overflow for large offsets
|
||||
- Fixed issues when BufferLoad and BufferStore are equal to zero
|
||||
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
|
||||
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
|
||||
- Fixed the memory access error related to StaggerU + large stride
|
||||
- Fixed ZGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed DGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed ASEM + GSU + NoTailLoop opt mismatch
|
||||
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
|
||||
- Fixed ASEM + GSU + TailLoop inner unroll
|
||||
@@ -4,33 +4,27 @@ The following sections cover inferencing and introduces MIGraphX.
|
||||
|
||||
## Inference
|
||||
|
||||
The inference is where capabilities learned during Deep Learning training are put to work. It refers to using a fully trained neural network to make conclusions (predictions) on unseen data that the model has never interacted with before. Deep Learning inferencing is achieved by feeding new data, such as new images, to the network, giving the Deep Neural Network a chance to classify the image.
|
||||
The inference is where capabilities learned during deep-learning training are put to work. It refers to using a fully trained neural network to make conclusions (predictions) on unseen data that the model has never interacted with before. Deep-learning inferencing is achieved by feeding new data, such as new images, to the network, giving the Deep Neural Network a chance to classify the image.
|
||||
|
||||
Taking our previous example of MNIST, the DNN can be fed new images of handwritten digit images, allowing the neural network to classify digits. A fully trained DNN should make accurate predictions about what an image represents, and inference cannot happen without training.
|
||||
|
||||
## MIGraphX Introduction
|
||||
|
||||
MIGraphX is a graph compiler focused on accelerating the Machine Learning inference that can target AMD GPUs and CPUs. MIGraphX accelerates the Machine Learning models by leveraging several graph-level transformations and optimizations. These optimizations include:
|
||||
MIGraphX is a graph compiler focused on accelerating the machine-learning inference that can target AMD GPUs and CPUs. MIGraphX accelerates the machine-learning models by leveraging several graph-level transformations and optimizations. These optimizations include:
|
||||
|
||||
- Operator fusion
|
||||
|
||||
- Arithmetic simplifications
|
||||
|
||||
- Dead-code elimination
|
||||
|
||||
- Common subexpression elimination (CSE)
|
||||
|
||||
- Constant propagation
|
||||
* Operator fusion
|
||||
* Arithmetic simplifications
|
||||
* Dead-code elimination
|
||||
* Common subexpression elimination (CSE)
|
||||
* Constant propagation
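As a toy illustration of what a graph-level optimization does, the sketch below folds constant sub-expressions in a tiny expression tree. Constant folding is a close relative of the constant propagation listed above. This is not MIGraphX code: the tuple-based tree, the `OPS` table, and `fold_constants` are hypothetical names chosen only for this example.

```python
import operator

# Hypothetical operator table for the toy expression tree.
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(expr):
    """Replace operator nodes whose operands are all numbers with their value."""
    if not isinstance(expr, tuple):
        return expr  # a number or a variable name, nothing to fold
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # both operands known at compile time
    return (op, left, right)


# ("mul", "x", ("add", 2, 3)) stands for x * (2 + 3)
print(fold_constants(("mul", "x", ("add", 2, 3))))
```

The inner `add` node has only constant operands, so it is replaced by its value before any kernel would be generated; the outer `mul` still depends on `x` and is kept.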
|
||||
|
||||
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
|
||||
|
||||
MIGraphX provides easy-to-use APIs in C++ and Python to import machine-learning models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
|
||||
|
||||
- Number of arguments
|
||||
|
||||
- Type of arguments
|
||||
|
||||
- Shape of arguments
|
||||
* Number of arguments
|
||||
* Type of arguments
|
||||
* Shape of arguments
|
||||
|
||||
After optimization passes, all these operators get mapped to different kernels on GPUs or CPUs.
|
||||
|
||||
@@ -54,11 +48,11 @@ The header files and libraries are installed under `/opt/rocm-\<version\>`, wher
|
||||
|
||||
There are two ways to build the MIGraphX sources.
|
||||
|
||||
- [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
|
||||
* [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
|
||||
|
||||
or
|
||||
|
||||
- [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
|
||||
* [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
|
||||
|
||||
For detailed steps on building from source and installing dependencies, refer to the following `README` file:
|
||||
|
||||
@@ -329,7 +323,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
|
||||
|
||||
Alternatively, you can use MIGraphX's C++ or Python API to generate an `.mxr` file.
|
||||
|
||||
```{figure} ../data/rocm_ai/image018.png
|
||||
```{figure} ../data/rocm-ai/image018.png
|
||||
:name: image018
|
||||
:align: center
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
# Inception V3 with PyTorch
|
||||
|
||||
## Deep Learning Training
|
||||
## Deep learning training
|
||||
|
||||
Deep Learning models are designed to capture the complexity of the problem and the underlying data. These models are "deep," comprising multiple component layers. Training is finding the best parameters for each model layer to achieve a well-defined objective.
|
||||
Deep-learning models are designed to capture the complexity of the problem and the underlying data. These models are "deep," comprising multiple component layers. Training is finding the best parameters for each model layer to achieve a well-defined objective.
|
||||
|
||||
The training data consists of input features in supervised learning, similar to what the learned model is expected to see during the evaluation or inference phase. The target output is also included, which serves to teach the model. A loss metric is defined as part of training that evaluates the model's performance during the training process.
|
||||
|
||||
@@ -56,7 +56,7 @@ This example is adapted from the PyTorch research hub page on Inception v3[^torc
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch_install) for setting up a PyTorch environment on ROCm.
|
||||
1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch-install) for setting up a PyTorch environment on ROCm.
|
||||
|
||||
```dockerfile
|
||||
docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
|
||||
@@ -146,7 +146,7 @@ The previous section focused on downloading and using the Inception v3 model for
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch_install) for setting up a PyTorch environment on ROCm.
|
||||
1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](../tutorials/install/pytorch-install) for setting up a PyTorch environment on ROCm.
|
||||
|
||||
```dockerfile
|
||||
docker pull rocm/pytorch:latest
|
||||
@@ -463,7 +463,7 @@ torch.save(model.state_dict(), "trained_inception_v3.pt")
|
||||
|
||||
Plotting the train and test loss shows both metrics reducing over training epochs. This is demonstrated in the following image.
|
||||
|
||||
```{figure} ../data/rocm_ai/inception_v3.png
|
||||
```{figure} ../data/rocm-ai/inception-v3.png
|
||||
:name: inception-v3
|
||||
---
|
||||
align: center
|
||||
@@ -741,7 +741,7 @@ To understand the code step by step, follow these steps:
|
||||
plt.show()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/mnist_1.png
|
||||
```{figure} ../data/rocm-ai/mnist-1.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -769,13 +769,13 @@ To understand the code step by step, follow these steps:
|
||||
plt.show()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/mnist_2.png
|
||||
```{figure} ../data/rocm-ai/mnist-2.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
```
|
||||
|
||||
The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. Deep Learning consists of chaining together simple layers. Most layers, such as `tf.keras.layers.Dense`, have parameters that are learned during training.
|
||||
The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. Deep learning consists of chaining together simple layers. Most layers, such as `tf.keras.layers.Dense`, have parameters that are learned during training.
|
||||
|
||||
```py
|
||||
model = tf.keras.Sequential([
|
||||
@@ -785,9 +785,9 @@ To understand the code step by step, follow these steps:
|
||||
])
|
||||
```
|
||||
|
||||
- The first layer in this network `tf.keras.layers.Flatten` transforms the format of the images from a two-dimensional array (of 28 x 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.
|
||||
* The first layer in this network `tf.keras.layers.Flatten` transforms the format of the images from a two-dimensional array (of 28 x 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.
|
||||
|
||||
- After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are densely connected or fully connected neural layers. The first Dense layer has 128 nodes (or neurons). The second (and last) layer returns a logits array with a length of 10. Each node contains a score that indicates the current image belongs to one of the 10 classes.
|
||||
* After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are densely connected or fully connected neural layers. The first Dense layer has 128 nodes (or neurons). The second (and last) layer returns a logits array with a length of 10. Each node contains a score that indicates the current image belongs to one of the 10 classes.
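The reshaping done by the flatten step can be sketched in plain Python, without TensorFlow. This is only an illustration of the 28 x 28 to 784 transformation described above; the nested-list `image` and its placeholder pixel values are assumptions made for the example.

```python
# A 28 x 28 "image" as nested lists; the pixel values are placeholders.
image = [[row * 28 + col for col in range(28)] for row in range(28)]

# Unstack the rows and line them up end to end, as a flatten step does.
flat = [pixel for row in image for pixel in row]

print(len(image), len(image[0]), len(flat))
```

No value is changed or learned here; the data is only rearranged, which is why the layer has no trainable parameters.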
|
||||
|
||||
12. You must add the Loss function, Metrics, and Optimizer at the time of model compilation.
|
||||
|
||||
@@ -797,11 +797,11 @@ To understand the code step by step, follow these steps:
|
||||
metrics=['accuracy'])
|
||||
```
|
||||
|
||||
- Loss function —This measures how accurate the model is during training when you are looking to minimize this function to "steer" the model in the right direction.
|
||||
* Loss function —This measures how accurate the model is during training when you are looking to minimize this function to "steer" the model in the right direction.
|
||||
|
||||
- Optimizer —This is how the model is updated based on the data it sees and its loss function.
|
||||
* Optimizer —This is how the model is updated based on the data it sees and its loss function.
|
||||
|
||||
- Metrics —This is used to monitor the training and testing steps.
|
||||
* Metrics —This is used to monitor the training and testing steps.
|
||||
|
||||
The following example uses accuracy, the fraction of the correctly classified images.
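The accuracy metric itself is simple enough to sketch in plain Python. The helper below is a hypothetical stand-in for illustration, not the Keras implementation, and the label values are made up.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical labels for five images; four of the five predictions match.
print(accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9]))
```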
|
||||
|
||||
@@ -895,7 +895,7 @@ To understand the code step by step, follow these steps:
|
||||
plt.show()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/mnist_3.png
|
||||
```{figure} ../data/rocm-ai/mnist-3.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -911,7 +911,7 @@ To understand the code step by step, follow these steps:
|
||||
plt.show()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/mnist_4.png
|
||||
```{figure} ../data/rocm-ai/mnist-4.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -946,7 +946,7 @@ To understand the code step by step, follow these steps:
|
||||
plt.show()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/mnist_5.png
|
||||
```{figure} ../data/rocm-ai/mnist-5.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -988,7 +988,7 @@ Follow these steps:
|
||||
cache_subdir='')
|
||||
```
|
||||
|
||||
```py
|
||||
```bash
|
||||
Downloading data from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
|
||||
84131840/84125825 [==============================] – 1s 0us/step
|
||||
84149932/84125825 [==============================] – 1s 0us/step
|
||||
@@ -1115,7 +1115,7 @@ To prepare the data for training, follow these steps:
|
||||
print("Vectorized review", vectorize_text(first_review, first_label))
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/TextClassification_3.png
|
||||
```{figure} ../data/rocm-ai/TextClassification-3.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -1158,7 +1158,7 @@ To prepare the data for training, follow these steps:
|
||||
model.summary()
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/TextClassification_4.png
|
||||
```{figure} ../data/rocm-ai/TextClassification-4.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -1178,7 +1178,7 @@ To prepare the data for training, follow these steps:
|
||||
history = model.fit(train_ds,validation_data=val_ds,epochs=epochs)
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/TextClassification_5.png
|
||||
```{figure} ../data/rocm-ai/TextClassification-5.png
|
||||
---
|
||||
align: center
|
||||
---
|
||||
@@ -1226,7 +1226,7 @@ To prepare the data for training, follow these steps:
|
||||
|
||||
The following images illustrate the training and validation loss and the training and validation accuracy.
|
||||
|
||||
```{figure} ../data/rocm_ai/TextClassification_6.png
|
||||
```{figure} ../data/rocm-ai/TextClassification-6.png
|
||||
:name: TextClassification6
|
||||
---
|
||||
align: center
|
||||
@@ -1234,7 +1234,7 @@ To prepare the data for training, follow these steps:
|
||||
Training and Validation Loss
|
||||
```
|
||||
|
||||
```{figure} ../data/rocm_ai/TextClassification_7.png
|
||||
```{figure} ../data/rocm-ai/TextClassification-7.png
|
||||
:name: TextClassification7
|
||||
---
|
||||
align: center
|
||||
@@ -20,10 +20,10 @@ Finding Dependencies
|
||||
|
||||
In short, CMake supports finding dependencies in two ways:
|
||||
|
||||
- In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to
|
||||
* In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to
|
||||
find the component in typical install locations and layouts. CMake ships a
|
||||
few dozen such scripts, but users and projects may ship them as well.
|
||||
- In Config mode, it locates a file named ``<packagename>-config.cmake`` or
|
||||
* In Config mode, it locates a file named ``<packagename>-config.cmake`` or
|
||||
``<PackageName>Config.cmake`` which describes the installed component in all
|
||||
regards needed to consume it.
|
||||
|
||||
@@ -88,8 +88,8 @@ from the new location (`/opt/rocm-<ver>/include`) as shown in the example below.
|
||||
#include <hip/hip_runtime.h>
|
||||
```
|
||||
|
||||
- Starting at ROCm 5.2 release, the deprecation for backward compatibility wrapper header files is: `#pragma` message announcing `#warning`.
|
||||
- Starting from ROCm 6.0 (tentatively) backward compatibility for wrapper header files will be removed, and the `#pragma` message will be announcing `#error`.
|
||||
* Starting at ROCm 5.2 release, the deprecation for backward compatibility wrapper header files is: `#pragma` message announcing `#warning`.
|
||||
* Starting from ROCm 6.0 (tentatively) backward compatibility for wrapper header files will be removed, and the `#pragma` message will be announcing `#error`.
|
||||
|
||||
### Executable Files
|
||||
|
||||
@@ -5,23 +5,25 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} AMD Instinct MI200
|
||||
:::{grid-item-card}
|
||||
**[AMD Instinct MI250](./gpu-arch/mi250.md)**
|
||||
|
||||
Review hardware aspects of the AMD Instinct™ MI250
|
||||
accelerators and the CDNA™ 2 architecture that is the foundation of these GPUs.
|
||||
|
||||
- [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
|
||||
- [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
- [Guide](./gpu_arch/mi250.md)
|
||||
* [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
|
||||
* [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} AMD Instinct MI100
|
||||
:::{grid-item-card}
|
||||
**[AMD Instinct MI100](./gpu-arch/mi100.md)**
|
||||
|
||||
Review hardware aspects of the AMD Instinct™ MI100
|
||||
accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs.
|
||||
|
||||
- [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
|
||||
- [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
- [Guide](./gpu_arch/mi100.md)
|
||||
* [Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
|
||||
* [Whitepaper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
|
||||
:::
|
||||
|
||||
@@ -29,18 +31,18 @@ accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs
|
||||
|
||||
## ISA Documentation
|
||||
|
||||
- [AMD Instinct MI200/CDNA2 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
|
||||
- [AMD Instinct MI100/CDNA1 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
|
||||
- [AMD Instinct MI50/Vega 7nm Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
|
||||
- [AMD Instinct MI25/Vega Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
|
||||
- [AMD RDNA3 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf)
|
||||
- [AMD RDNA2 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
|
||||
- [AMD RDNA Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna-shader-instruction-set-architecture.pdf)
|
||||
- [AMD GCN3 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
|
||||
* [AMD Instinct MI200/CDNA2 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
|
||||
* [AMD Instinct MI100/CDNA1 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
|
||||
* [AMD Instinct MI50/Vega 7nm Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
|
||||
* [AMD Instinct MI25/Vega Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
|
||||
* [AMD RDNA3 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf)
|
||||
* [AMD RDNA2 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
|
||||
* [AMD RDNA Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/rdna-shader-instruction-set-architecture.pdf)
|
||||
* [AMD GCN3 Instruction Set Architecture](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
|
||||
|
||||
## White Papers
|
||||
|
||||
- [AMD CDNA™ 2 Architecture White Paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
- [AMD CDNA Architecture White Paper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
- [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)
|
||||
- [AMD RDNA Architecture White Paper](https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
|
||||
* [AMD CDNA™ 2 Architecture White Paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
* [AMD CDNA Architecture White Paper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
* [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)
|
||||
* [AMD RDNA Architecture White Paper](https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
|
||||
@@ -17,7 +17,7 @@ available to connect the processors plus one PCIe Gen 4 x16 link per processor
|
||||
can attach additional I/O devices such as the host adapters for the network
|
||||
fabric.
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image004.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image004.png
|
||||
:name: mi100-arch
|
||||
:alt: Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.
|
||||
:align: center
|
||||
@@ -43,7 +43,7 @@ computing (HPC) and AI & machine learning (ML) that run on everything from
|
||||
individual servers to the world's largest exascale supercomputers. The overall
|
||||
system architecture is designed for extreme scalability and compute performance.
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image005.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image005.png
|
||||
:name: mi100-block
|
||||
:alt: Structure of the AMD Instinct accelerator (MI100 generation).
|
||||
:align: center
|
||||
@@ -72,7 +72,7 @@ instructions of 16 data elements per instruction. This enables the CU to process
|
||||
Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS
|
||||
(`4 [SIMD units] x 16 [elements per instruction] x 120 [CU] x 1.5 [GHz]`).
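As a quick check, the product in the formula above can be evaluated directly. The sketch below only reproduces that arithmetic; the variable names are illustrative and not part of any ROCm API.

```python
# Theoretical FP64 vector peak for MI100, using the factors quoted above.
simd_units_per_cu = 4          # SIMD units per compute unit
elements_per_instruction = 16  # data elements per SIMD instruction
compute_units = 120            # compute units on the accelerator
clock_ghz = 1.5                # clock frequency in GHz

# Multiplying by the clock in GHz yields a result in GFLOPS.
peak_gflops = simd_units_per_cu * elements_per_instruction * compute_units * clock_ghz
peak_tflops = peak_gflops / 1000.0

print(f"{peak_tflops:.2f} TFLOPS")
```

The exact product is 11.52 TFLOPS, which the text rounds to 11.5 TFLOPS.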
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image006.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image006.png
|
||||
:name: mi100-gcd
|
||||
:alt: Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture.
|
||||
:align: center
|
||||
@@ -7,7 +7,7 @@ accelerators and the CDNA™ 2 architecture that is the foundation of these GPUs
|
||||
|
||||
The micro-architecture of the AMD Instinct MI250 accelerators is based on the
|
||||
AMD CDNA 2 architecture that targets compute applications such as HPC,
|
||||
artificial intelligence (AI), and Machine Learning (ML) and that run on
|
||||
artificial intelligence (AI), and machine learning (ML) and that run on
|
||||
everything from individual servers to the world’s largest exascale
|
||||
supercomputers. The overall system architecture is designed for extreme
|
||||
scalability and compute performance.
|
||||
@@ -38,7 +38,7 @@ execution units (also called matrix cores), which are geared toward executing
|
||||
matrix operations like matrix-matrix multiplications. For FP64, the peak
|
||||
performance of these units amounts to 90.5 TFLOPS.
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image001.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image001.png
|
||||
:name: mi250-gcd
|
||||
:alt: Structure of a single GCD in the AMD Instinct MI250 accelerator.
|
||||
:align: center
|
||||
@@ -97,7 +97,7 @@ is being retired in each clock cycle. The third column lists the theoretical
|
||||
peak performance of the OAM module. The theoretical aggregated peak memory
|
||||
bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD).
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image002.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image002.png
|
||||
:name: mi250-perf
|
||||
:alt: Dual-GCD architecture of the AMD Instinct MI250 accelerators.
|
||||
:align: center
|
||||
@@ -122,7 +122,7 @@ GCD can attach to the AMD EPYC processor directly or via an optional PCIe switch
|
||||
. Note that some platforms may offer an x8 interface to the GCDs, which reduces
|
||||
the available host-to-GPU bandwidth.
|
||||
|
||||
```{figure} ../../data/conceptual/gpu_arch/image003.png
|
||||
```{figure} ../../data/conceptual/gpu-arch/image003.png
|
||||
:name: mi250-block
|
||||
:alt: Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor.
|
||||
:align: center
|
||||
@@ -3,42 +3,42 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Compiler Nomencalture
|
||||
:link: ./compiler_disambiguation
|
||||
:link-type: doc
|
||||
:::{grid-item-card}
|
||||
**[Compiler Nomenclature](./compiler-disambiguation.md)**
|
||||
|
||||
ROCm ships multiple compilers of varying origins and purposes. This article
|
||||
disambiguates compiler naming used throughout the documentation.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Using CMake
|
||||
:link: ./cmake_packages
|
||||
:link-type: doc
|
||||
:::{grid-item-card}
|
||||
**[Using CMake](./cmake-packages.rst)**
|
||||
|
||||
ROCm components ship with first-party CMake support. This article details how that
|
||||
support works and how to use it.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Linux Folder Structure Reorganization
|
||||
:link: ./file_reorg
|
||||
:link-type: doc
|
||||
:::{grid-item-card}
|
||||
**[Linux Folder Structure Reorganization](./file-reorg.md)**
|
||||
|
||||
ROCm™ packages have adopted the Linux foundation file system hierarchy standard
|
||||
to ensure ROCm components follow open source conventions for Linux-based
|
||||
distributions.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU Isolation Techniques
|
||||
:link: ./gpu_isolation
|
||||
:link-type: doc
|
||||
:::{grid-item-card}
|
||||
**[GPU Isolation Techniques](./gpu-isolation.md)**
|
||||
|
||||
Restricting the access of applications to a subset of GPUs, also known as isolating GPUs,
|
||||
allows users to hide GPU resources from programs.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU Architectures
|
||||
:link: ./gpu_arch
|
||||
:link-type: doc
|
||||
:::{grid-item-card}
|
||||
**[GPU Architectures](./gpu-arch.md)**
|
||||
|
||||
AMD documentation around architectural details from both the CDNA and RDNA
|
||||
product lines.
|
||||
|
||||
|
||||
@@ -13,12 +13,12 @@ The address sanitizer process begins by compiling the application of interest wi
|
||||
|
||||
Recommendations for doing this are:
|
||||
|
||||
+ Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
|
||||
+ Add the following options to the existing compiler and linker options:
|
||||
+ `-fsanitize=address` - enables instrumentation
|
||||
+ `-shared-libsan` - use shared version of runtime
|
||||
+ `-g` - add debug info for improved reporting
|
||||
+ Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
|
||||
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
|
||||
* Add the following options to the existing compiler and linker options:
|
||||
* `-fsanitize=address` - enables instrumentation
|
||||
* `-shared-libsan` - use shared version of runtime
|
||||
* `-g` - add debug info for improved reporting
|
||||
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
|
||||
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
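The options listed above can be combined into a single compile command. The sketch below assembles one in Python so the pieces are easy to see. The helper `asan_compile_command`, the source file name `app.hip`, and the output name are assumptions for illustration; a real build will need its existing compiler and linker options as well.

```python
import shlex


def asan_compile_command(sources, output="a.out", offload_arch="gfx90a:xnack+"):
    """Build an amdclang++ command line using the options listed above."""
    return [
        "amdclang++",
        "-fsanitize=address",  # enables instrumentation
        "-shared-libsan",      # use shared version of runtime
        "-g",                  # add debug info for improved reporting
        f"--offload-arch={offload_arch}",  # xnack+ requested explicitly
        *sources,
        "-o",
        output,
    ]


cmd = asan_compile_command(["app.hip"])  # app.hip is a placeholder file name
print(shlex.join(cmd))
```

Keeping the flags in one helper makes it harder to instrument some translation units and forget others, which the text notes would reduce the ability to detect addressing errors.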
|
||||
|
||||
It is not an error to compile some files without address sanitizer instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the Address Sanitizer runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
|
||||
@@ -29,9 +29,9 @@ When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code a
|
||||
|
||||
There are a few options if the compile time becomes unacceptable:
|
||||
|
||||
+ Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the address sanitizer process.
|
||||
+ Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
|
||||
+ Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
|
||||
* Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the address sanitizer process.
|
||||
* Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
|
||||
* Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
|
||||
|
||||
## Use AMD Supplied Address Sanitizer Instrumented Libraries
|
||||
|
||||
@@ -47,33 +47,33 @@ When adjusting an application build to add instrumentation, linking against thes
|
||||
|
||||
Here are a few recommendations to consider before running an address sanitizer instrumented heterogeneous application.
|
||||
|
||||
+ Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
|
||||
+ Ensure XNACK is enabled
|
||||
+ For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
|
||||
+ For `gfx906` (MI-50) or `gfx908` (MI-100), set the environment variable `HSA_XNACK=1` and also ensure the amdgpu kernel module is loaded with module argument `noretry=0`.
|
||||
* Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
|
||||
* Ensure XNACK is enabled
|
||||
* For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
|
||||
* For `gfx906` (MI-50) or `gfx908` (MI-100), set the environment variable `HSA_XNACK=1` and also ensure the amdgpu kernel module is loaded with module argument `noretry=0`.
|
||||
This requirement is due to the fact that the XNACK setting for these GPUs is system-wide.
|
||||
|
||||
+ Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
|
||||
* Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
|
||||
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
|
||||
|
||||
+ Ensure that the application depends on the address sanitizer runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
|
||||
* Ensure that the application depends on the address sanitizer runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
|
||||
If it does not appear, when executed the application will quickly output an address sanitizer error that looks like:
|
||||
|
||||
```bash
|
||||
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
|
||||
```
|
||||
|
||||
+ Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
|
||||
* Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
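
The `readelf` check from the list above can be scripted. This is a sketch under stated assumptions: the helper names `needed_libraries` and `depends_on_asan` are hypothetical, and the sample text imitates the usual `readelf -d` layout rather than output captured from a real ROCm application.

```python
import re
import subprocess
from typing import List

ASAN_RUNTIME = "libclang_rt.asan-x86_64.so"  # runtime name given in the text


def needed_libraries(readelf_output: str) -> List[str]:
    """Extract library names from the NEEDED entries of `readelf -d` output."""
    pattern = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(.+?)\]")
    return pattern.findall(readelf_output)


def depends_on_asan(application: str) -> bool:
    """Run `readelf -d <application>` and look for the ASan runtime."""
    result = subprocess.run(
        ["readelf", "-d", application],
        capture_output=True,
        text=True,
        check=True,
    )
    return ASAN_RUNTIME in needed_libraries(result.stdout)


# Illustrative input only, not captured from a real build.
sample = """
 0x0000000000000001 (NEEDED)  Shared library: [libclang_rt.asan-x86_64.so]
 0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
"""
print(ASAN_RUNTIME in needed_libraries(sample))
```

Calling `depends_on_asan` on a built binary before launching it catches the missing-runtime case earlier than the runtime error shown above.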
|
||||
|
||||
The environment variable `ASAN_OPTIONS` can be used to adjust the runtime behavior of the ASAN runtime itself. There are more than a hundred "flags" that can be adjusted (see an old list at [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)) but the default settings are correct and should be used in most cases. Note that these options only affect the host ASAN runtime. The device runtime currently supports only the default settings for the few relevant options.
|
||||
|
||||
There are two `ASAN_OPTIONS` flags of particular note.
|
||||
|
||||
+ `halt_on_error=0/1 default 1`.
|
||||
* `halt_on_error=0/1 default 1`.
|
||||
|
||||
This tells the ASAN runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional address sanitizer instrumented libraries are not compiled with this option; if an error is detected within one of them while `halt_on_error` is set to 0, more undefined behavior will occur.
|
||||
|
||||
+ `detect_leaks=0/1 default 1`.
|
||||
* `detect_leaks=0/1 default 1`.
|
||||
This option directs the address sanitizer runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
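
As a sketch of how the two flags above are passed, the snippet below builds the option strings for a run. The helper name `sanitizer_options` is hypothetical and the suppression file path is the placeholder from the text; the flag names are those described above, and multiple flags are joined with `:`.

```python
import os
from typing import Dict


def sanitizer_options(flags: Dict[str, object]) -> str:
    """Join flag=value pairs with ':' for ASAN_OPTIONS or LSAN_OPTIONS."""
    return ":".join(f"{name}={value}" for name, value in flags.items())


# Continue after errors and turn leak detection off. Continuing is only
# meaningful if the code was built with -fsanitize-recover=address.
asan_options = sanitizer_options({"halt_on_error": 0, "detect_leaks": 0})

# Alternative to detect_leaks=0: keep leak detection but use a suppression
# file and silence the suppression report. The path is a placeholder.
lsan_options = sanitizer_options({
    "suppressions": "/path/to/suppression/file",
    "print_suppressions": 0,
})

# Copy of the current environment with the ASan settings applied.
env = dict(os.environ, ASAN_OPTIONS=asan_options)

print(asan_options)
print(lsan_options)
```

The `env` dictionary can then be handed to `subprocess.run(..., env=env)` when launching the instrumented application.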
|
||||
|
||||
## Runtime Overhead
|
||||
@@ -216,10 +216,10 @@ $ rocgdb <path to application>
|
||||
|
||||
## Known Issues with Using GPU Sanitizer
|
||||
|
||||
+ Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
|
||||
* Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
|
||||
|
||||
+ Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
|
||||
* Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
|
||||
|
||||
+ Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
|
||||
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
|
||||
|
||||
+ It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
|
||||
* It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
|
||||
39
docs/conf.py
@@ -9,8 +9,8 @@ import shutil
|
||||
from rocm_docs import ROCmDocs
|
||||
|
||||
|
||||
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
|
||||
shutil.copy2('../RELEASE.md','./release.md')
|
||||
shutil.copy2('../CONTRIBUTING.md','./contribute/index.md')
|
||||
shutil.copy2('../RELEASE.md','./about/release-notes.md')
|
||||
# Keep capitalization due to similar linking on GitHub's markdown preview.
|
||||
shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
|
||||
|
||||
@@ -35,37 +35,40 @@ article_pages = [
|
||||
"date":"2023-07-27"
|
||||
},
|
||||
|
||||
{"file":"tutorials/quick_start/windows", "os":["windows"]},
|
||||
{"file":"tutorials/quick_start/linux", "os":["linux"]},
|
||||
{"file":"tutorials/quick-start/windows", "os":["windows"]},
|
||||
{"file":"tutorials/quick-start/linux", "os":["linux"]},
|
||||
|
||||
{"file":"tutorials/install/linux/index", "os":["linux"]},
|
||||
{"file":"tutorials/install/linux/install_overview", "os":["linux"]},
|
||||
{"file":"tutorials/install/linux/install-options", "os":["linux"]},
|
||||
{"file":"tutorials/install/linux/prerequisites", "os":["linux"]},
|
||||
|
||||
{"file":"tutorials/install/docker", "os":["linux"]},
|
||||
{"file":"tutorials/install/magma_install", "os":["linux"]},
|
||||
{"file":"tutorials/install/pytorch_install", "os":["linux"]},
|
||||
{"file":"tutorials/install/tensorflow_install", "os":["linux"]},
|
||||
{"file":"tutorials/install/magma-install", "os":["linux"]},
|
||||
{"file":"tutorials/install/pytorch-install", "os":["linux"]},
|
||||
{"file":"tutorials/install/tensorflow-install", "os":["linux"]},
|
||||
|
||||
{"file":"tutorials/install/windows/index", "os":["windows"]},
|
||||
{"file":"tutorials/install/windows/prerequisites", "os":["windows"]},
|
||||
{"file":"tutorials/install/windows/cli/index", "os":["windows"]},
|
||||
{"file":"tutorials/install/windows/gui/index", "os":["windows"]},
|
||||
|
||||
{"file":"about/release/linux_support", "os":["linux"]},
|
||||
{"file":"about/release/windows_support", "os":["windows"]},
|
||||
{"file":"about/compatibility/linux-support", "os":["linux"]},
|
||||
{"file":"about/compatibility/windows-support", "os":["windows"]},
|
||||
|
||||
{"file":"about/compatibility/docker_image_support_matrix", "os":["linux"]},
|
||||
{"file":"about/compatibility/docker-image-support-matrix", "os":["linux"]},
|
||||
|
||||
{"file":"reference/libraries/gpu_libraries/communication", "os":["linux"]},
|
||||
{"file":"reference/compilers_tools/index", "os":["linux"]},
|
||||
{"file":"reference/computer_vision", "os":["linux"]},
|
||||
{"file":"reference/libraries/gpu-libraries/index", "os":["linux"]},
|
||||
{"file":"reference/compilers-tools/index", "os":["linux"]},
|
||||
{"file":"reference/index", "os":["linux"]},
|
||||
|
||||
{"file":"how_to/deep_learning_rocm", "os":["linux"]},
|
||||
{"file":"how_to/gpu_aware_mpi", "os":["linux"]},
|
||||
{"file":"how_to/system_debugging", "os":["linux"]},
|
||||
{"file":"how-to/deep-learning-rocm", "os":["linux"]},
|
||||
{"file":"how-to/gpu-aware-mpi", "os":["linux"]},
|
||||
{"file":"how-to/system-debugging", "os":["linux"]},
|
||||
{"file":"how-to/index", "os":["linux", "windows"]},
|
||||
|
||||
{"file":"rocm-ai", "os":["linux", "windows"]},
|
||||
{"file":"rocm-a-z", "os":["linux", "windows"]},
|
||||
|
||||
{"file":"rocm_ai/rocm_ai", "os":["linux"]},
|
||||
]
|
||||
|
||||
external_toc_path = "./sphinx/_toc.yml"
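
Because every `file` entry in `article_pages` has to follow the renamed paths, a small check can flag entries that no longer match a file on disk. The following is a sketch only: `missing_pages` is a hypothetical helper, not part of `rocm_docs`, and it assumes each entry maps to `<file>.md` or `<file>.rst` under the docs directory.

```python
from pathlib import Path
from typing import Dict, List


def missing_pages(pages: List[Dict], docs_root: str = ".") -> List[str]:
    """Return the `file` values that have no matching .md or .rst source."""
    root = Path(docs_root)
    missing = []
    for page in pages:
        stem = page["file"]
        candidates = [root / f"{stem}{ext}" for ext in (".md", ".rst")]
        if not any(candidate.is_file() for candidate in candidates):
            missing.append(stem)
    return missing


# Two entries copied from the list above, checked against the current directory.
pages = [
    {"file": "how-to/index", "os": ["linux", "windows"]},
    {"file": "rocm_ai/rocm_ai", "os": ["linux"]},
]
print(missing_pages(pages, docs_root="."))
```

Run from the `docs` directory, any name it prints is an entry whose path was probably missed during a rename.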
|
||||
|
||||
@@ -10,12 +10,12 @@ When opening a PR to the `develop` branch on GitHub, the page corresponding to
|
||||
the PR (`https://github.com/RadeonOpenCompute/ROCm/pull/<pr_number>`) will have
|
||||
a summary at the bottom. This requires the user be logged in to GitHub.
|
||||
|
||||
- There, click `Show all checks` and `Details` of the Read the Docs pipeline. It
|
||||
* There, click `Show all checks` and `Details` of the Read the Docs pipeline. It
|
||||
will take you to a URL of the form
|
||||
`https://readthedocs.com/projects/advanced-micro-devices-rocm/builds/<some_build_num>/`
|
||||
- The list of commands shown are the exact ones used by CI to produce a render
|
||||
* The list of commands shown are the exact ones used by CI to produce a render
|
||||
of the documentation.
|
||||
- There, click on the small blue link `View docs` (which is not the same as the
|
||||
* There, click on the small blue link `View docs` (which is not the same as the
|
||||
bigger button with the same text). It will take you to the built HTML site with
|
||||
a URL of the form
|
||||
`https://advanced-micro-devices-demo--<pr_number>.com.readthedocs.build/projects/alpha/en/<pr_number>/`.
|
||||
@@ -24,7 +24,7 @@ a summary at the bottom. This requires the user be logged in to GitHub.
|
||||
|
||||
Python versions known to build documentation:
|
||||
|
||||
- 3.8
|
||||
* 3.8
|
||||
|
||||
To build the docs locally using Python Virtual Environment (`venv`), execute the
|
||||
following commands from the project root:
|
||||
@@ -54,9 +54,8 @@ resulting website show up on a locally-served web server.
|
||||
### Configuring VS Code
|
||||
|
||||
1. Install the following extensions:
|
||||
|
||||
- Python `(ms-python.python)`
|
||||
- Live Server `(ritwickdey.LiveServer)`
|
||||
* Python `(ms-python.python)`
|
||||
* Live Server `(ritwickdey.LiveServer)`
|
||||
|
||||
2. Add the following entries in `.vscode/settings.json`
|
||||
|
||||
@@ -69,11 +68,11 @@ resulting website show up on a locally-served web server.
|
||||
```
|
||||
|
||||
The settings above are used for the following reasons:
|
||||
- `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
|
||||
* `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
|
||||
alongside the `tasks.json` command.
|
||||
- `liveServer.settings.wait`: Tells live server to wait with the update to give time for Sphinx to
|
||||
* `liveServer.settings.wait`: Tells live server to wait with the update to give time for Sphinx to
|
||||
regenerate site contents and not refresh before all is done. (Empirical value)
|
||||
- `python.terminal.activateEnvInCurrentTerminal`: Automatic virtual environment activation is a nice touch,
|
||||
* `python.terminal.activateEnvInCurrentTerminal`: Automatic virtual environment activation is a nice touch,
|
||||
should you want to build the site from the integrated terminal.
|
||||
|
||||
3. Add the following tasks in `.vscode/tasks.json`
|
||||
@@ -145,21 +144,21 @@ resulting website show up on a locally-served web server.
|
||||
|
||||
4. Configure Python virtual environment (`venv`)
|
||||
|
||||
- From the Command Palette, run `Python: Create Environment`
|
||||
- Select `venv` environment and the `docs/sphinx/requirements.txt` file.
|
||||
* From the Command Palette, run `Python: Create Environment`
|
||||
* Select `venv` environment and the `docs/sphinx/requirements.txt` file.
|
||||
_(Simply pressing enter while hovering over the file from the drop down is
|
||||
insufficient, one has to select the radio button with the 'Space' key if
|
||||
using the keyboard.)_
|
||||
|
||||
5. Build the docs
|
||||
|
||||
- Launch the default build Task using either:
|
||||
- a hotkey _(default is `Ctrl+Shift+B`)_ or
|
||||
- by issuing the `Tasks: Run Build Task` from the Command Palette.
|
||||
* Launch the default build Task using either:
|
||||
* a hotkey _(default is `Ctrl+Shift+B`)_ or
|
||||
* by issuing the `Tasks: Run Build Task` from the Command Palette.
|
||||
|
||||
6. Open the live preview
|
||||
|
||||
- Navigate to the output of the site within VS Code, right-click on
|
||||
* Navigate to the output of the site within VS Code, right-click on
|
||||
`.vscode/build/html/index.html` and select `Open with Live Server`. The
|
||||
contents should update on every rebuild without having to refresh the
|
||||
browser.
|
||||
|
||||
@@ -35,16 +35,16 @@ guide on writing and formatting on GitHub as a starting point.
|
||||
ROCm documentation adds additional requirements to Markdown and RST based files
|
||||
as follows:
|
||||
|
||||
- Level one headers are only used for page titles. There must be only one level
|
||||
* Level one headers are only used for page titles. There must be only one level
|
||||
1 header per file for both Markdown and Restructured Text.
|
||||
- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
|
||||
* Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
|
||||
our automated GitHub action on a Pull Request (PR).
|
||||
See the {doc}`rocm-docs-core linting user guide <rocm-docs-core:user_guide/linting>` for more details.
|
||||
|
||||
## Filenames and folder structure
|
||||
|
||||
Please use snake case (all lower case letters and underscores instead of spaces)
|
||||
for file names. For example, `example_file_name.md`.
|
||||
Please use kebab-case (all lower case letters and dashes instead of spaces)
|
||||
for file names. For example, `example-file-name.md`.
|
||||
Our documentation follows Pitchfork for folder structure.
|
||||
All documentation is in `/docs` except for special files like
|
||||
the contributing guide in the `/` folder. All images used in the documentation are
|
||||
@@ -52,8 +52,8 @@ placed in the `/docs/data` folder.
|
||||
|
||||
## Language and Style
|
||||
|
||||
Adopt Microsoft C++ docs guidelines for
|
||||
[Voice and tone](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
|
||||
Adopt Microsoft CPP-Docs guidelines for
|
||||
[Voice and Tone](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
|
||||
|
||||
ROCm documentation templates will be made public shortly. ROCm templates dictate
|
||||
the recommended structure and flow of the documentation. Guidelines on how to
|
||||
@@ -69,5 +69,3 @@ Raise issues in `rocm-docs-core` for any formatting concerns and changes request
|
||||
For more topics, such as submitting feedback and ways to build documentation,
|
||||
see the [Contributing Section](https://rocm.docs.amd.com/en/latest/contributing.html)
|
||||
at [rocm.docs.amd.com](https://rocm.docs.amd.com)
|
||||
|
||||
To learn more about how our documentation is built, refer to the [ROCm toolchain](toolchain.md).
|
||||
|
||||
@@ -9,7 +9,7 @@ project that applies customization for our documentation. This
|
||||
project is the tool most ROCm repositories use as part of the documentation
|
||||
build. It is also available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
|
||||
|
||||
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation <rocm-docs-core:index>`.
|
||||
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
|
||||
|
||||
## Sphinx
|
||||
|
||||
|
||||
|
20
docs/how-to/deep-learning-rocm.md
Normal file
@@ -0,0 +1,20 @@
|
||||
# Deep learning guide
|
||||
|
||||
The following sections cover the different framework installations for ROCm and
|
||||
deep-learning applications. The following image provides
|
||||
the sequential flow for the use of each framework. Refer to the ROCm Compatible
|
||||
Frameworks Release Notes for each framework's most current release notes at
|
||||
[Third party support](../about/compatibility/3rd-party-support-matrix.md).
|
||||
|
||||
```{figure} ../data/tutorials/install/magma-install/magma005.png
|
||||
:name: rocm-compat-frameworks-chart
|
||||
:align: center
|
||||
|
||||
ROCm Compatible Frameworks Flowchart
|
||||
```
|
||||
|
||||
## Frameworks Installation
|
||||
|
||||
* [How to Install PyTorch?](../tutorials/install/pytorch-install)
|
||||
* [How to Install TensorFlow?](../tutorials/install/tensorflow-install)
|
||||
* [How to Install Magma?](../tutorials/install/magma-install)
|
||||
@@ -72,7 +72,7 @@ make -j $(nproc)
|
||||
make -j $(nproc) install
|
||||
```
|
||||
|
||||
The [communication libraries tables](#communication_libraries)
|
||||
The [communication libraries tables](#communication-libraries)
|
||||
document the compatibility of UCX versions with ROCm versions.
|
||||
|
||||
## Install Open MPI
|
||||
@@ -148,7 +148,7 @@ larger than 67MB, an effective utilization of about 150GB/sec is achieved, which
|
||||
corresponds to 75% of the peak transfer bandwidth of 200GB/sec for that
|
||||
connection:
|
||||
|
||||
:::{figure} ../data/how_to/gpu_enabled_mpi_1.png
|
||||
:::{figure} ../data/how-to/gpu-enabled-mpi-1.png
|
||||
:name: mpi-bandwidth
|
||||
:alt: OSU execution showing transfer bandwidth increasing alongside payload inc.
|
||||
Inter-GPU bandwidth with various payload sizes.
|
||||
@@ -161,7 +161,7 @@ Unified Collective Communication Library (UCC) component in Open MPI.
|
||||
For this, the UCC library has to be configured and compiled with ROCm
|
||||
support.
|
||||
|
||||
Please note the compatibility [tables](#communication_libraries)
|
||||
Please note the compatibility [tables](#communication-libraries)
|
||||
for UCC versions with the various ROCm versions.
|
||||
|
||||
An example for configuring UCC and Open MPI with ROCm support
|
||||
34
docs/how-to/index.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# All How-To Material
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Tuning Guides](./tuning-guides/index.md)**
|
||||
|
||||
Use case-specific system setup and tuning guides.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Deep-learning guide](./deep-learning-rocm.md)**
|
||||
|
||||
Installation of various deep learning frameworks and applications.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[GPU-Enabled MPI](./gpu-aware-mpi.md)**
|
||||
|
||||
This chapter exemplifies how to set up Open MPI with the ROCm platform.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[System Debugging Guide](./system-debugging.md)**
|
||||
|
||||
Useful commands to debug misbehaving ROCm installations.
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||