Compare commits

137 Commits

Author SHA1 Message Date
Sam Wu
7aada9a2ae Update documentation requirements 2024-09-16 10:13:02 -08:00
Sam Wu
1d0b86431c Update documentation requirements 2024-06-06 16:58:51 -06:00
Sam Wu
2c7bccfad0 Disable epub builds 2024-05-02 09:14:55 -06:00
Sam Wu
3ec5d174d8 Fix RTD config 2024-05-02 08:55:58 -06:00
Sam Wu
5fe10b4fa5 Update documentation requirements 2024-05-01 16:59:40 -06:00
Sam Wu
b0931d5236 Update documentation requirements 2024-05-01 16:54:12 -06:00
Young Hui - AMD
2d25b75117 update SLES 15.4 prerequisites into 5.7.1 (#2979)
* update SLES 15.4 prerequisites

* match SLES prerequisite URLs

* update wordlist

* fix spelling
2024-04-01 17:49:36 -04:00
Sam Wu
6027186e46 docs(versions.md): Add 5.6.1 to versions list (#2825) 2024-01-22 15:17:20 -07:00
Mátyás Aradi
580e218d71 Fix installation links in ROCm 5.7.1 documentation (#2822)
* Fix ROCm repository link to keep version 5.7.1 instead of latest

* Fix RHEL versions
2024-01-18 09:30:05 -07:00
Sam Wu
f64d81bde2 docs(release-history.md): Add 5.7.1 to release history (#2665) 2023-11-29 15:45:40 -07:00
Saad Rahim (AMD)
f53b99990c 7900 XT support (#2661) 2023-11-21 09:44:49 -07:00
Sam Wu
eb6a4912ab Fix conflict when merging roc-5.7.x into docs/5.7.1 2023-10-25 13:16:39 -06:00
Sam Wu
db38fbd3a0 Merge roc-5.7.x updates into docs/5.7.1 branch (#2576)
* Update GPU Support on Linux (#2572)

Update docs with information in the AMD blog post announcing support for some RDNA3 Radeon GPUs on Linux.

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

* Making GPU and OS support page titles consistent between Win and Linux (#2575)

* Update LLVM ASan documentation (#2529)

---------

Co-authored-by: Houssem MENHOUR <husmen@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-10-25 13:08:30 -06:00
Sam Wu
549b23b521 Add Roopa's changes to gpu sanitizer doc (#2607)
* Add Roopa's changes to gpu sanitizer doc

* Markdown linting fixes
2023-10-25 13:02:28 -06:00
danpetreamd
37db70c914 fixed typo: correct path to direct rendering interface (DRI) devices is /dev/dri/renderD*. (#2593) 2023-10-24 10:11:00 -06:00
Jithun Nair
244c6a6823 Fix openmp documentation (#2598) 2023-10-23 13:03:54 -06:00
dsclear-amd
ce82a047bf Issue reporting templates roc 5.7.x (#2586)
* Adds GitHub issue templates for reporting problems, and feature requests.

* Adds issue reporting templates for logging bugs, and requesting features.

* Removed duplicate ISSUE_TEMPLATE directory.
2023-10-20 11:38:16 -06:00
Sam Wu
b61a54e4f3 Update LLVM ASan documentation (#2529) 2023-10-17 16:51:51 -06:00
Saad Rahim (AMD)
227e135f5a Making GPU and OS support page titles consistent between Win and Linux (#2575) 2023-10-17 16:51:14 -06:00
Houssem MENHOUR
1e9a1ca55a Update GPU Support on Linux (#2572)
Update docs with information in the AMD blog post announcing support for some RDNA3 Radeon GPUs on Linux.

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-10-17 16:13:05 -06:00
Saad Rahim (AMD)
20f3c28345 Fixing cut and paste for RDNA3 architecture of 7900 (#2574) 2023-10-17 11:34:49 -06:00
Saad Rahim (AMD)
ef93b5e176 Adding 7900 XTX and W7900 to compatibility matrix (#2573) 2023-10-17 11:16:41 -06:00
Saad Rahim (AMD)
72d4da7da0 Typo in graphical workstation setting (#2569) 2023-10-16 09:56:02 -06:00
Sam Wu
aac49cef23 Regenerate changelog with AMDMIGraphX (#2544) 2023-10-16 09:48:10 -06:00
Saad Rahim (AMD)
69b8117726 Fixing links to Radeon Software for Linux install (#2568) 2023-10-16 09:35:17 -06:00
Sam Wu
9ac4a7b194 Fix typo (#2567) 2023-10-16 09:34:29 -06:00
Saad Rahim (AMD)
00163edd45 radeon software for linux announcement (#2566) 2023-10-16 09:13:28 -06:00
Nara
80fd791421 Add Radeon install instructions for Linux (#2565) 2023-10-16 09:12:17 -06:00
Saad Rahim (AMD)
f65ab4ce27 Adding UB 22.04 container to docker support matrix (#2564) 2023-10-16 07:09:08 -06:00
Sam Wu
365b31728d Update doc reqs for 5.7.1 (#2558)
* Update doc reqs

rocm-docs-core==0.26.0

* Update release notes
2023-10-13 17:12:49 -06:00
Sam Wu
b6c71018a6 Disable epub format in rtd yaml config (#2557)
Because rubric is not supported

ValueError: <container: <rubric...><container...>> is not in list
2023-10-13 16:51:16 -06:00
Sam Wu
54177e8b96 Update rtd conf.py for 5.7.1 (#2556) 2023-10-13 16:41:19 -06:00
Saad Rahim (AMD)
74f4f86c92 5.7.1 Release Notes (#2550)
* 5.7.1 Release Notes

* Run script for 5.7.1 release notes

* Update CHANGELOG header

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-10-13 16:11:48 -06:00
Nara
74d8f95afb ROCm 5.7.1 Linux install and compatibility updates (#2547) 2023-10-13 15:16:14 -06:00
Saad Rahim (AMD)
50ad3847e5 Docker Image Support table updates (#2545) 2023-10-12 14:00:30 -06:00
Lisa
444efec642 Docker support updates (#2541) 2023-10-11 11:35:10 -06:00
Lisa
7d22b96c5d remove image (#2505) 2023-10-06 15:39:53 -06:00
Saad Rahim (AMD)
35122729b8 Release notes fix (#2513) 2023-09-28 09:24:16 -06:00
Sam Wu
c98da4a11a Remove extra line in package_manager_integration.md (#2508) 2023-09-27 16:01:22 -06:00
Saad Rahim (AMD)
14e0fae0fe Fix Changelog (#2501) 2023-09-26 11:05:18 -06:00
Sam Wu
13bea6bf4e disable spellcheck for license 2023-09-21 13:24:01 -06:00
Sam Wu
7a5f2eb508 add alt licensing for footer link 2023-09-21 13:14:52 -06:00
Sam Wu
fac4843569 Fixes for roc-5.7.x branch (#2486)
* Update Release Note Tables for 5.6.1 and 5.7.0 (#2478)

* add changelog table for 5.6.1

* update 5.7.0 changelog table

* specify svg size

* do not use xelatex

* set fontpkg

* fix typo in conf.py

* fix typo

* Update openmp.md

* rm 404 img
2023-09-20 11:49:47 -06:00
Nara
80d8eb84ef Fix incorrect LLVM target for RX 7600 in Windows Support page (#2483) 2023-09-20 07:04:05 -06:00
Saad Rahim (AMD)
c2a4257103 Feedback 5.7 (#2476)
* update relative link to llvm asan guide

remove docs dir from path

* Minor typo and update on supported OSes

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-18 15:25:52 -06:00
zhang2amd
fdc2f51b25 Update default.xml for 5.7 (#2471)
Update version to 5.7
Added a few new projects.
2023-09-15 18:12:30 -06:00
Sam Wu
23aa1eec20 Adjust 5.7.0 highlights (#2473)
* adjust 5.7.0 highlights

* adjust important highlights phrasing
2023-09-15 17:31:47 -06:00
Sam Wu
0bcf8c03e1 Small update to wording for release note reference to ASan user guide (#2470) 2023-09-15 17:09:32 -06:00
Sam Wu
a3b2bc3395 add announcement (#2472) 2023-09-15 17:09:12 -06:00
Saad Rahim (AMD)
5c07070e73 5.7 install instructions (#2467)
* Update install instructions to 5.7

* RTG additions to install instructions

* update install instructions for multi version

---------

Co-authored-by: Máté Ferenc Nagy-Egri <mate@streamhpc.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-15 11:56:23 -06:00
Sam Wu
c9630d82da HIP 5.7.0 Release Notes (#2468)
* add links to asan

* add HIP 5.7.0 release notes
2023-09-15 11:56:01 -06:00
Saad Rahim (AMD)
3974c5c1a1 Version bump in nav bar (#2465) 2023-09-15 10:32:47 -06:00
Saad Rahim (AMD)
3348de77d1 5.7 support tables (#2463) 2023-09-15 10:22:15 -06:00
Roopa Malavally
1ae743b22a Create 5.7.0.md (#2452)
* site restructure phase 1 - file reorganization (#2428)

* Update README.md (#2440)

Fix link to CHANGELOG.md

* Create 5.7.0.md

Release notes for ROCm 5.7.0

* Update 5.7.0.md

* Update 5.7.0.md

Added release highlights for ROCm v5.7

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* update markdown formatting 5.7.0.md and add links

* update RELEASE.md for 5.7.0

* add 5.7.0 release notes to CHANGELOG

* resolve rebase conflict

* Revert "site restructure phase 1 - file reorganization (#2428)"

This reverts commit d04797d1c8.

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Vishal Rao <vishalrao@gmail.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-15 09:05:09 -06:00
Nara
e8c2065d7c Added notes for incompatibilities with certain TensorFlow versions. (#2435)
* Added notes for incompatibilities with certain TensorFlow versions.

* Small improvements
2023-09-13 15:55:33 -06:00
Sam Wu
14402ad410 Release notes for 5.7.0 (#2374) 2023-09-13 15:55:00 -06:00
dependabot[bot]
3535c43d4e Bump rocm-docs-core from 0.23.0 to 0.24.0 in /docs/sphinx (#2438)
* Bump rocm-docs-core from 0.23.0 to 0.24.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update requirements.in

* Update requirements.txt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-07 16:27:43 -06:00
Paul R. C. Kent
75eed2ee3e Fix RHEL9 installer links (#2426)
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-09-06 11:23:01 -06:00
Saad Rahim (AMD)
0c3915923f Merge pull request #2434 from RadeonOpenCompute/merge-5.6.1
Merge 5.6.1 to develop
2023-09-06 11:16:52 -06:00
Saad Rahim (AMD)
d3049169de Merge branch 'develop' into merge-5.6.1 2023-09-05 16:19:10 -06:00
Sam Wu
6c0419fb0d Add hipSPARSELt and hipTensor to Projects and licenses (#2431)
* add hipsparselt

* add hiptensor to toc and licenses

* alphabetize licenses

* update rocm-docs-core to 0.23.0
2023-09-05 15:57:10 -06:00
srawat
996064950d OpenMP updates (#2404)
* Added deleted sections to openmp.md and other improvements

* Update CONTRIBUTING.md

* Update _toc.yml.in

* OpenMP updates for 5.7

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update CONTRIBUTING.md

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-01 17:28:32 -06:00
dependabot[bot]
77e2424f36 Bump rocm-docs-core from 0.21.0 to 0.22.0 in /docs/sphinx (#2427)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.21.0 to 0.22.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/v0.22.0/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.21.0...v0.22.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-31 17:15:33 -06:00
Sam Wu
62c0afd5ba add hiptensor to list of libs (#2414) 2023-08-31 14:18:57 -06:00
Roopa Malavally
d0953efad0 Update rocmcc.md (#2424)
Fixed https://ontrack-internal.amd.com/browse/SWDEV-407505?src=confmacro
2023-08-31 10:10:11 -06:00
searlmc1
f73d941657 Update using_gpu_sanitizer.md (#2423)
Update AMD supplied libs section
2023-08-31 09:33:12 -06:00
Máté Ferenc Nagy-Egri
ddbe4cd38f Update Linux install instructions for 5.6.1 2023-08-30 07:08:50 -06:00
Sam Wu
7e097ce72a Update conf.py 2023-08-29 17:04:47 -06:00
Saad Rahim
f3d3929f11 Updating version number to 5.6.1 2023-08-29 16:56:11 -06:00
Nara
084ed7f4cb docs: fix missing '--append' flag in install instructions (#2411) 2023-08-29 16:53:28 -06:00
Saad Rahim (AMD)
7482a8b261 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#2419) (#2420)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 16:08:48 -06:00
dependabot[bot]
f414c30836 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#2419)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 15:58:59 -06:00
Saad Rahim (AMD)
bf8f0ccc65 Updating the manifest file (#2417) 2023-08-29 15:07:13 -06:00
Sam Wu
ed8251872f 5.6.1 Release notes (#2416)
* 5.6.1 rel notes

* update rtd config
2023-08-29 15:04:53 -06:00
Sam Wu
8c01bfbb6e Change OpenMP Image Syntax and Update RTD config (#2400)
* update rtd config

* use standard markdown syntax for openmp svg

* fix rtd config
2023-08-25 10:47:32 -06:00
Lisa
b963f7fa05 404 updates (#2406)
add 404 page image

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-24 17:35:44 -06:00
Sam Wu
5b0d7bcebd fix RTD build failing on pdflatex and linting deadlock (#2398)
* docs(openmp.md): specify width and height for openmp toolchain svg

* fix linting
2023-08-23 10:54:28 -06:00
Saad Rahim
eef2937171 Merge pull request #2392 from RadeonOpenCompute/roc-5.6.x
Merging ROCm 5.6.x to develop
2023-08-21 16:27:40 -06:00
Sam Wu
52d59937d1 Update linting.yml 2023-08-21 16:17:59 -06:00
Sam Wu
ee72fbac97 Update linting.yml
remove roc**
to avoid triggering twice
2023-08-21 16:09:59 -06:00
Saad Rahim
5a33e54265 Removing duplicated concurency 2023-08-21 15:47:08 -06:00
Saad Rahim
ef248c087c Merge branch 'develop' into roc-5.6.x 2023-08-21 15:45:29 -06:00
Sam Wu
017d9717e0 build: concurrency for linting to prevent deadlock (#2394) 2023-08-21 15:44:51 -06:00
Saad Rahim
445432da13 Merge branch 'develop' into roc-5.6.x 2023-08-21 15:11:36 -06:00
Lisa
f6c439b56b Updating the What is ROCm page and related content (#2386) 2023-08-18 14:16:17 -06:00
Nara
c3e8e15e51 doc: Update version in install guide to 5.6 (#2387) 2023-08-18 13:57:45 -06:00
Nara
20ae555e61 doc: Update version in install guide to 5.6 (#2387) 2023-08-18 07:26:49 -06:00
Sam Wu
fa16caba4a Add License page (#2371)
* fix typo

* add license page

* move license in toc

* Update license.md

* improve phrasing for license

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-08-17 08:44:51 -06:00
Saad Rahim
7c6dede59d Window updates (#2365)
* Changing SKU to Edition

* Installation phrasing

* Adding the app deployment guide

* Fixing links

* Update docs/understand/windows-app-deployment-guidelines.md

---------

Co-authored-by: Sam Wu <sjwu@ualberta.ca>
2023-08-16 16:32:54 -06:00
Lisa
4813f1f37d language cleanup of ROCm docs (#2380)
* remove 'the'

* fix linking for GitHub Known Issues in nav tree

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2023-08-15 09:32:30 -06:00
Mátyás Aradi
261530f5f7 Fix caption typo for MI100 (#2375) 2023-08-10 08:44:45 -06:00
Roopa Malavally
d11c566fb2 Create using_gpu_sanitizer.md (#2338)
* Create using_gpu_sanitizer.md

* Created GPU Sanitizer File and Title

* add technical terms to wordlist and fix spelling

* spelling
---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: b-sumner <brian.sumner@amd.com>
2023-08-09 14:53:28 -06:00
Sam Wu
14153b9540 fix typos and add links to rocm-docs-core user and developer guides in contributing section (#2372) 2023-08-09 14:02:05 -06:00
dependabot[bot]
43601a0755 Bump certifi from 2022.12.7 to 2023.7.22 in /docs/sphinx (#2369)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-08 09:30:57 -06:00
dependabot[bot]
c3b2062c51 Bump pygments from 2.14.0 to 2.15.0 in /docs/sphinx (#2368)
Bumps [pygments](https://github.com/pygments/pygments) from 2.14.0 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.14.0...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-04 17:31:27 -06:00
dependabot[bot]
cced9a7955 Bump cryptography from 41.0.0 to 41.0.3 in /docs/sphinx (#2367)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.0 to 41.0.3.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.0...41.0.3)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-04 17:27:40 -06:00
Sam Wu
df0ee5a0ae add version to html title 2023-08-04 17:18:41 -06:00
srawat
3bfce9c570 corrected typo in contributing.md (#2334)
* Added deleted sections to openmp.md and other improvements

* Update CONTRIBUTING.md

* add example of snake case

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-04 12:46:13 -06:00
Sam Wu
45505e4912 ROCm Version page (#2331)
* add ROCm versions page

* add release dates from github tags

* fix versions list table

* fix dates

* update version page title
2023-08-01 12:09:50 -06:00
Nagy-Egri Máté Ferenc
d9376ebfc7 Use linting from rocm-docs-core (#2207)
* Linting from rocm-docs-core

* Give name to doc linting CI job

* Shorter job name
2023-07-31 10:52:45 -06:00
dependabot[bot]
31fcc9aafb Bump rocm-docs-core from 0.19.0 to 0.20.0 in /docs/sphinx (#2351)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-31 08:45:32 -06:00
Saad Rahim
6fb7b9f3b5 GPU support clarification (#2350) 2023-07-27 17:42:24 -06:00
Saad Rahim
bd553f263b GPU support clarification (#2350) 2023-07-27 17:41:41 -06:00
Saad Rahim
7f8eede7d1 linting fix 2023-07-27 16:30:18 -06:00
Saad Rahim
0741268fd5 Updating GPU support list 2023-07-27 16:30:18 -06:00
Saad Rahim
61dd65f29f Merge pull request #2349 from saadrahim/windows_additional_gpus
Windows additional GPUs
2023-07-27 16:26:30 -06:00
Saad Rahim
343693ed6f linting fix 2023-07-27 16:02:54 -06:00
Saad Rahim
3c27919a9c Updating GPU support list 2023-07-27 15:51:19 -06:00
Saad Rahim
ea1f2498f7 Merge remote-tracking branch 'origin/docs/5.6.0' into windows_additional_gpus 2023-07-27 15:38:43 -06:00
Sam Wu
4ab3787abe Merge pull request #2345 from RadeonOpenCompute/docs/5.5.1
Docs/5.5.1 Sync into 5.6
2023-07-27 13:32:02 -06:00
Saad Rahim
ebd44bb372 Merge pull request #2344 from RadeonOpenCompute/docs/5.6.0
Sync 5.6 branches
2023-07-27 13:20:39 -06:00
srawat
253f69b445 Adding openmp image (#2323)
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-07-25 11:05:09 -06:00
Sam Wu
5f546d44b3 Update Toolchain and Contributing Guides (#2315)
* spell out HPC acronym in explanation doc

* update toolchain docs

order in importance descending

* update Contributing guide

add discussions

update formatting and grammar

* separate contributing section for readability

* fix formatting for mdl

* fix spelling
2023-07-25 10:29:45 -06:00
dependabot[bot]
a9ae111741 Bump rocm-docs-core from 0.18.3 to 0.19.0 in /docs/sphinx (#2320)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.3 to 0.19.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.3...v0.19.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-12 09:29:05 -06:00
Edgar Gabriel
2721042eac gpu-aware MPI changes (#2311)
- simplify the configure arguments of UCX to only provide
flags absolutely required

- add the UCC compatibility matrix to the docs
2023-07-06 09:17:56 -06:00
Sam Wu
26935408e0 Add configurations for PDF output on Read the Docs (#2305)
* add configurations for pdf output on rtd

* set date for wip release notes

* add copyright to pdf
2023-07-04 21:29:31 -06:00
Sam Wu
372a257eed Changelog updates for 5.6.0 (#2306)
* remove typos in changelog

* add 5.6 release notes

* add amd smi changes for 5.6.0
2023-06-30 09:27:39 -06:00
Sam Wu
12bc633320 Links for Reference pages (#2307)
* reorg toc to match all ref material page

* add links to docs, github, and changelogs
2023-06-29 16:55:48 -06:00
Rahul Garg
c71d83207e Update backward incompatible planned changes in 5.5 (#2279)
* Update backward incompatible planned changes

* add planned changes to changelog

* update rocm-docs-core to v0.18.3

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-29 10:36:31 -06:00
Sam Wu
cd1ec676f0 fix or remove broken links (#2281) 2023-06-28 16:34:38 -06:00
dependabot[bot]
d2884f482a Bump rocm-docs-core from 0.18.1 to 0.18.2 in /docs/sphinx (#2293)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.1 to 0.18.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.1...v0.18.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 16:16:33 -06:00
dependabot[bot]
dce4d58348 Bump rocm-docs-core from 0.18.0 to 0.18.1 in /docs/sphinx (#2280)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 17:33:02 -06:00
dependabot[bot]
9eb46f8230 Bump rocm-docs-core from 0.17.2 to 0.18.0 in /docs/sphinx (#2278)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.2 to 0.18.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.2...v0.18.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 16:32:12 -06:00
srawat
73986668bb MI200 performance counters and OpenMP fixes 2023-06-27 08:17:35 -06:00
dependabot[bot]
6c179479f1 Bump rocm-docs-core from 0.17.1 to 0.17.2 in /docs/sphinx (#2276)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.1 to 0.17.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.1...v0.17.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 19:54:06 -06:00
dependabot[bot]
5b726ec96c Bump rocm-docs-core from 0.17.0 to 0.17.1 in /docs/sphinx (#2275)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.0 to 0.17.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.0...v0.17.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 16:37:42 -06:00
dependabot[bot]
e72f0dedde Bump rocm-docs-core from 0.16.0 to 0.17.0 in /docs/sphinx (#2273)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.16.0 to 0.17.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 15:35:54 -06:00
Ehud Sharlin
57e2253828 ROCm FHS Reorganization, Backward Compatibility, and Versioning - rev (#2255) 2023-06-26 14:07:02 -06:00
dependabot[bot]
233d3632b8 Bump rocm-docs-core from 0.15.0 to 0.16.0 in /docs/sphinx (#2262)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.15.0 to 0.16.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 21:37:05 -06:00
Sam Wu
bbfb18b5de fix rocm_smi_lib link in toc (#2260) 2023-06-21 20:22:48 -06:00
dependabot[bot]
66dd6c9467 Bump requests from 2.28.1 to 2.31.0 in /docs/sphinx (#2217)
Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.31.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.31.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 12:38:35 -06:00
dependabot[bot]
503809b74a Bump rocm-docs-core from 0.14.0 to 0.15.0 in /docs/sphinx (#2257)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.14.0 to 0.15.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 11:40:37 -06:00
srawat
9bc32154d8 Swati develop (#2245)
* Added deleted sections to openmp.md and other improvements

* Update openmp.md

Tagged `ICV`

* Solving indiscrepencies in openmp.md

There are apparently differences in the published document and information conveyed by the Dev. Fixed it.

* add new words to wordlist

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-20 10:52:55 -06:00
dependabot[bot]
0da29b73cb Bump rocm-docs-core from 0.13.4 to 0.14.0 in /docs/sphinx (#2249)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.4 to 0.14.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.4...v0.14.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-16 07:17:53 -06:00
dependabot[bot]
69580ef397 Bump cryptography from 40.0.2 to 41.0.0 in /docs/sphinx (#2218)
Bumps [cryptography](https://github.com/pyca/cryptography) from 40.0.2 to 41.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/40.0.2...41.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-14 16:46:26 -06:00
Saad Rahim
7762a8d874 Fixing HIP link (#2236) 2023-06-14 16:45:08 -06:00
Sam Wu
2ec3e537a4 Update Links (#2240)
* update link to PCIe Gen 4 pdf

* fix broken links

* remove references to broken links

* fix spelling of data center
2023-06-14 07:05:06 -06:00
71 changed files with 4633 additions and 2802 deletions

@@ -0,0 +1,76 @@
name: Issue Report
description: File a report for something not working correctly.
title: "[Issue]: "
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to fill out this report!
On a Linux system, you can acquire your OS, CPU, GPU, and ROCm version (for filling out this report) with the following commands:
echo "OS:" && cat /etc/os-release | grep -E "^(NAME=|VERSION=)";
echo "CPU: " && cat /proc/cpuinfo | grep "model name" | sort --unique;
echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
echo "ROCm in /opt:" && ls -1 /opt | grep -E "rocm-";
- type: textarea
attributes:
label: Problem Description
description: Describe the issue you encountered.
placeholder: "The steps to reproduce can be included here, or in the dedicated section further below."
validations:
required: true
- type: input
attributes:
label: Operating System
description: What is the name and version number of the OS?
placeholder: "e.g. Ubuntu 22.04.3 LTS (Jammy Jellyfish)"
validations:
required: true
- type: input
attributes:
label: CPU
description: What CPU did you encounter the issue on?
placeholder: "e.g. AMD Ryzen 9 5900HX with Radeon Graphics"
validations:
required: true
- type: input
attributes:
label: GPU
description: What GPU(s) did you encounter the issue on?
placeholder: "e.g. MI200"
validations:
required: true
- type: input
attributes:
label: ROCm Version
description: What version(s) of ROCm did you encounter the issue on?
placeholder: "e.g. 5.7.0"
validations:
required: true
- type: input
attributes:
label: ROCm Component
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
placeholder: "e.g. rocBLAS"
- type: textarea
attributes:
label: Steps to Reproduce
description: (Optional) Detailed steps to reproduce the issue.
placeholder: Please also include what you expected to happen, and what actually did, at the failing step(s).
validations:
required: false
- type: textarea
attributes:
label: Output of /opt/rocm/bin/rocminfo --support
description: The output of rocminfo --support will help to better address the problem.
placeholder: |
ROCk module is loaded
=====================
HSA System Attributes
=====================
[...]
validations:
required: true
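
The `cat /etc/os-release | grep -E "^(NAME=|VERSION=)"` command in the template's introductory note can also be mirrored in a script when collecting the same details programmatically. The following is a minimal sketch under stated assumptions: the function name `os_name_and_version` and the sample text are hypothetical and are not part of the template.

```python
def os_name_and_version(os_release_text: str) -> dict:
    """Return the NAME and VERSION fields from os-release style text."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        # Keep only the two keys that the template's grep pattern selects.
        if sep and key in ("NAME", "VERSION"):
            fields[key] = value.strip().strip('"')
    return fields


# Hypothetical sample; on a real system, read the text from /etc/os-release.
sample = (
    'PRETTY_NAME="Ubuntu 22.04.3 LTS"\n'
    'NAME="Ubuntu"\n'
    'VERSION_ID="22.04"\n'
    'VERSION="22.04.3 LTS (Jammy Jellyfish)"\n'
)
info = os_name_and_version(sample)
print(f"{info['NAME']} {info['VERSION']}")
```

The joined result has the same shape as the placeholder suggested for the Operating System field of the report.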

@@ -0,0 +1,32 @@
name: Feature Suggestion
description: Suggest an additional functionality, or new way of handling an existing functionality.
title: "[Feature]: "
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to make a suggestion!
- type: textarea
attributes:
label: Suggestion Description
description: Describe your suggestion.
validations:
required: true
- type: input
attributes:
label: Operating System
description: (Optional) If this is for a specific OS, you can mention it here.
placeholder: "e.g. Ubuntu"
- type: input
attributes:
label: GPU
description: (Optional) If this is for a specific GPU or GPU family, you can mention it here.
placeholder: "e.g. MI200"
- type: input
attributes:
label: ROCm Component
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
placeholder: "e.g. rocBLAS"

5
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: ROCm Community Discussions
url: https://github.com/RadeonOpenCompute/ROCm/discussions
about: Please ask and answer questions here for anything ROCm.

@@ -6,7 +6,7 @@ on:
- develop
- main
- 'docs/*'
- 'roc**'
- 'roc**'
pull_request:
branches:
- develop
@@ -14,47 +14,7 @@ on:
- 'docs/*'
- 'roc**'
concurrency:
group: ${{ github.ref }}-${{ github.workflow }}
cancel-in-progress: true
jobs:
lint-rest:
name: "RestructuredText"
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Install rst-lint
run: pip install restructuredtext-lint
- name: Lint ResT files
run: rst-lint ${{ join(github.workspace, '/docs') }}
lint-md:
name: "Markdown"
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Use markdownlint-cli2
uses: DavidAnson/markdownlint-cli2-action@v10.0.1
with:
globs: '**/*.md'
spelling:
name: "Spelling"
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Fetch config
shell: sh
run: |
curl --silent --show-error --fail --location https://raw.github.com/RadeonOpenCompute/rocm-docs-core/develop/.spellcheck.yaml -O
curl --silent --show-error --fail --location https://raw.github.com/RadeonOpenCompute/rocm-docs-core/develop/.wordlist.txt >> .wordlist.txt
- name: Run spellcheck
uses: rojopolis/spellcheck-github-actions@0.30.0
- name: On fail
if: failure()
run: |
echo "Please check for spelling mistakes or add them to '.wordlist.txt' in either the root of this project or in rocm-docs-core."
call-workflow-passing-data:
name: Documentation
uses: RadeonOpenCompute/rocm-docs-core/.github/workflows/linting.yml@develop

5
.gitignore vendored

@@ -16,3 +16,8 @@ _readthedocs/
docs/contributing.md
docs/release.md
docs/CHANGELOG.md
# auto-generated files
docs/deploy/linux/installer/install.md
docs/deploy/linux/os-native/install.md
docs/deploy/linux/quick_start.md

@@ -3,12 +3,19 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
sphinx:
configuration: docs/conf.py
formats: [htmlzip, pdf, epub]
python:
version: "3.8"
install:
- requirements: docs/sphinx/requirements.txt
formats: []

@@ -1,49 +1,686 @@
# file_reorg
FHS
Filesystem
filesystem
incrementing
rocm
# gpu_aware_mpi
DMA
GDR
HCA
MPI
MVAPICH
Mellanox's
NIC
OFED
OSU
OpenFabrics
PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
AAC
ABI
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# openmp
ICV
Multithreaded
# tuning_guides
ACE
ACEs
AccVGPR
AccVGPRs
ALU
AMD
AMDGPU
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
APIC
APIs
APU
ASIC
ASICs
ASan
ASm
ATI
AddressSanitizer
AlexNet
Arb
BLAS
BMC
BitCode
Blit
Bluefield
CCD
CDNA
CIFAR
CLI
CLion
CMake
CMakeLists
CMakePackage
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSV
CSn
CTests
CU
CUDA
CUs
CXX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
Commitizen
CommonMark
Concretized
Conda
ConnectX
DGEMM
DKMS
DL
DMA
DNN
DNNL
DPM
DRI
DW
DWORD
Dask
DataFrame
DataLoader
DataParallel
DeepSpeed
Dependabot
DevCap
Dockerfile
Doxygen
ELMo
ENDPGM
EPYC
ESXi
FFT
FFTs
FFmpeg
FHS
FMA
FP
Filesystem
Flang
Fortran
Fuyu
GALB
GCD
GCDs
GCN
GDB
GDDR
GDR
GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
GMI
GPG
GPR
GPT
GPU
GPU's
GPUs
GRBM
GenAI
GenZ
GitHub
Gitpod
HBM
HCA
HIPCC
HIPExtension
HIPIFY
HPC
HPCG
HPE
HPL
HSA
HWE
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IMDb
IOMMU
IOP
IOPM
# windows
IOV
IRQ
ISA
ISV
ISVs
ImageNet
InfiniBand
Inlines
IntelliSense
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
LAPACK
LCLK
LDS
LLM
LLMs
LLVM
LM
LSAN
LTS
LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIVisionX
MLM
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
MVAPICH
MVFFR
Makefile
Makefiles
Matplotlib
Megatron
Mellanox
Mellanox's
Meta's
MirroredStrategy
Multicore
Multithreaded
MyEnvironment
MyST
NBIO
NBIOs
NIC
NICs
NLI
NLP
NPS
NSP
NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
NousResearch's
NumPy
OAM
OAMs
OCP
OEM
OFED
OMP
OMPI
OMPT
OMPX
ONNX
OSS
OSU
OpenCL
OpenCV
OpenFabrics
OpenGL
OpenMP
OpenSSL
OpenVX
PCI
PCIe
PEFT
PIL
PILImage
PRNG
PRs
PaLM
Pageable
PeerDirect
Perfetto
PipelineParallel
PnP
PowerShell
PyPi
PyTorch
Qcycles
RAII
RCCL
RDC
RDMA
RDNA
RHEL
ROC
ROCProfiler
ROCTracer
ROCclr
ROCdbgapi
ROCgdb
ROCk
ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCr
RST
RW
Radeon
RelWithDebInfo
Req
Rickle
RoCE
Ryzen
SALU
SBIOS
SCA
SDK
SDMA
SDRAM
SENDMSG
SGPR
SGPRs
SHA
SIGQUIT
SIMD
SIMDs
SKU
SKUs
PowerShell
SLES
SMEM
SMI
SMT
SPI
SQs
SRAM
SRAMECC
SVD
SWE
SerDes
Shlens
Skylake
Softmax
Spack
Supermicro
Szegedy
TCA
TCC
TCI
TCIU
TCP
TCR
TF
TFLOPS
TPU
TPUs
TensorBoard
TensorFlow
TensorParallel
ToC
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
TransferBench
TrapStatus
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
UC
UCC
UCX
UIF
USM
UTCL
UTIL
Uncached
Unhandled
VALU
VBIOS
VGPR
VGPRs
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna
Workgroups
Writebacks
XCD
XCDs
XGBoost
XGBoost's
XGMI
XT
XTX
Xeon
Xilinx
Xnack
Xteam
YAML
YML
YModel
ZeRO
ZenDNN
accuracies
activations
addr
alloc
allocator
allocators
amdgpu
api
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
boson
bosons
buildable
bursty
bzip
cacheable
cd
centos
centric
changelog
chiplet
cmake
cmd
coalescable
codename
collater
comgr
completers
composable
concretization
config
conformant
convolutional
convolves
cpp
csn
cuBLAS
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denoise
denoised
denoises
denormalize
deserializers
detections
dev
devicelibs
devsel
dimensionality
disambiguates
distro
el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
ethernet
exascale
executables
ffmpeg
filesystem
fortran
galb
gcc
gdb
gfortran
gfx
githooks
github
gnupg
grayscale
gzip
heterogenous
hipBLAS
hipBLASLt
hipCUB
hipFFT
hipLIB
hipRAND
hipSOLVER
hipSPARSE
hipSPARSELt
hipTensor
hipamd
hipblas
hipcub
hipfft
hipfort
hipify
hipsolver
hipsparse
hpp
hsa
hsakmt
hyperparameter
ib_core
inband
incrementing
inferencing
inflight
init
initializer
inlining
installable
interprocedural
intra
invariants
invocating
ipo
kdb
latencies
libfabric
libjpeg
libs
linearized
linter
linux
llvm
localscratch
logits
lossy
macOS
matchers
microarchitecture
migraphx
miopen
miopengemm
mivisionx
mkdir
mlirmiopen
mtypes
mvffr
namespace
namespaces
numref
ocl
opencl
opencv
openmp
openssl
optimizers
os
pageable
parallelization
parameterization
passthrough
perfcounter
performant
perl
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessed
preprocessing
prequantized
prerequisites
profiler
protobuf
pseudorandom
py
quasirandom
queueing
rccl
rdc
reStructuredText
reformats
repos
representativeness
req
resampling
rescaling
reusability
roadmap
roc
rocAL
rocALUTION
rocBLAS
rocFFT
rocLIB
rocMLIR
rocPRIM
rocRAND
rocSOLVER
rocSPARSE
rocThrust
rocWMMA
rocalution
rocblas
rocclr
rocfft
rocm
rocminfo
rocprim
rocprof
rocprofiler
rocr
rocrand
rocsolver
rocsparse
rocthrust
roctracer
runtime
runtimes
sL
scalability
scalable
sendmsg
serializers
shader
sharding
sigmoid
sm
smi
softmax
spack
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize
tokenized
tokenizer
tokenizes
toolchain
toolchains
toolset
toolsets
torchvision
tqdm
tracebacks
txt
uarch
uncached
uncorrectable
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
upvote
USM
UTCL
UTIL
utils
vL
variational
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
vjxb
walkthrough
walkthroughs
wavefront
wavefronts
whitespaces
workgroup
workgroups
writeback
writebacks
wrreq
wzo
xargs
xz
yaml
ysvmadyb
zypper

File diff suppressed because it is too large

@@ -2,7 +2,7 @@
AMD values and encourages the ROCm community to contribute to our code and
documentation. This repository is focused on ROCm documentation and this
contribution guide describes the recommend method for creating and modifying our
contribution guide describes the recommended method for creating and modifying our
documentation.
While interacting with ROCm Documentation, we encourage you to be polite and
@@ -13,59 +13,47 @@ itself, refer to
[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) on the
GitHub repository.
For additional information on documentation functionalities,
see the user and developer guides for rocm-docs-core
at {doc}`rocm-docs-core documentation <rocm-docs-core:index>`.
## Supported Formats
Our documentation includes both markdown and rst files. Markdown is encouraged
over rst due to the lower barrier to participation. GitHub flavored markdown is preferred
for all submissions as it will render accurately on our GitHub repositories. For existing documentation,
[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html) markdown
is used to implement certain features unsupported in GitHub markdown. This is
Our documentation includes both Markdown and RST files. Markdown is encouraged
over RST due to the lower barrier to participation. GitHub-flavored Markdown is preferred
for all submissions as it renders accurately on our GitHub repositories. For existing documentation,
[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html) Markdown
is used to implement certain features unsupported in GitHub Markdown. This is
not encouraged for new documentation. AMD will transition
to stricter use of GitHub flavored markdown with a few caveats. ROCm documentation
also uses [sphinx-design](https://sphinx-design.readthedocs.io/en/latest/index.html)
in our markdown and rst files. We also will use breathe syntax for doxygen documentation
in our markdown files. Other design elements for effective HTML rendering of the documents
may be added to our markdown files. Please see
to stricter use of GitHub-flavored Markdown with a few caveats. ROCm documentation
also uses [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html)
in our Markdown and RST files. We also use Breathe syntax for Doxygen documentation
in our Markdown files. See
[GitHub](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github)'s
guide on writing and formatting on GitHub as a starting point.
ROCm documentation adds additional requirements to markdown and rst based files
ROCm documentation adds additional requirements to Markdown and RST based files
as follows:
- Level one headers are only used for page titles. There must be only one level
1 header per file for both Markdown and Restructured Text.
- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
our automated github action on a Pull Request (PR).
our automated GitHub action on a Pull Request (PR).
See the {doc}`rocm-docs-core linting user guide <rocm-docs-core:user_guide/linting>` for more details.
## Filenames and folder structure
Please use snake case for file names. Our documentation follows pitchfork for
folder structure. All documentation is in /docs except for special files like
the contributing guide in the / folder. All images used in the documentation are
place in the /docs/data folder.
## How to provide feedback for ROCm documentation
There are three standard ways to provide feedback for this repository.
### Pull Request
All contributions to ROCm documentation should arrive via the
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
targeting the develop branch of the repository. If you are unable to contribute
via the GitHub Flow, feel free to email us. TODO, confirm email address.
### GitHub Issue
Issues on existing or absent docs can be filed as [GitHub issues
](https://github.com/RadeonOpenCompute/ROCm/issues).
### Email Feedback
Please use snake case (all lower case letters and underscores instead of spaces)
for file names. For example, `example_file_name.md`.
Our documentation follows Pitchfork for folder structure.
All documentation is in `/docs` except for special files like
the contributing guide in the `/` folder. All images used in the documentation are
placed in the `/docs/data` folder.
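
The snake case rule above can be checked mechanically before opening a pull request. The following is a minimal sketch, not part of the ROCm tooling: the function name `is_snake_case_filename` is hypothetical, and allowing digits and requiring a single file extension are assumptions beyond what the guideline states.

```python
import re

# Assumed rule: lowercase letters or digits, words joined by single
# underscores, followed by one file extension.
SNAKE_CASE_FILE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.[a-z0-9]+")


def is_snake_case_filename(name: str) -> bool:
    """Return True if the file name follows the assumed snake case rule."""
    return SNAKE_CASE_FILE.fullmatch(name) is not None


for candidate in ("example_file_name.md", "ExampleFile.md", "example file.md"):
    print(candidate, is_snake_case_filename(candidate))
```
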
## Language and Style
Adopting Microsoft CPP-Docs guidelines for [Voice and Tone
](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
Adopt Microsoft CPP-Docs guidelines for
[Voice and Tone](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
ROCm documentation templates to be made public shortly. ROCm templates dictate
the recommended structure and flow of the documentation. Guidelines on how to
@@ -73,174 +61,11 @@ integrate figures, equations, and tables are all based off
[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html).
Font size and selection, page layout, white space control, and other formatting
details are controlled via rocm-docs-core, a Sphinx extension. Please raise issues
in rocm-docs-core for any formatting concerns and changes requested.
details are controlled via [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
Raise issues in `rocm-docs-core` for any formatting concerns and changes requested.
## Building Documentation
## More
While contributing, one may build the documentation locally on the command-line
or rely on Continuous Integration for previewing the resulting HTML pages in a
browser.
### Command line documentation builds
Python versions known to build documentation:
- 3.8
To build the docs locally using Python Virtual Environment (`venv`), execute the
following commands from the project root:
```sh
python3 -mvenv .venv
# Windows
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
# Linux
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
```
Then open up `_build/html/index.html` in your favorite browser.
### Pull Requests documentation builds
When opening a PR to the `develop` branch on GitHub, the page corresponding to
the PR (`https://github.com/RadeonOpenCompute/ROCm/pull/<pr_number>`) will have
a summary at the bottom. This requires the user be logged in to GitHub.
- There, click `Show all checks` and `Details` of the Read the Docs pipeline. It
will take you to `https://readthedocs.com/projects/advanced-micro-devices-rocm/
builds/<some_build_num>/`
- The list of commands shown are the exact ones used by CI to produce a render
of the documentation.
- There, click on the small blue link `View docs` (which is not the same as the
bigger button with the same text). It will take you to the built HTML site with
a URL of the form `https://
advanced-micro-devices-demo--<pr_number>.com.readthedocs.build/projects/alpha/en
/<pr_number>/`.
### Build the docs using VS Code
One can put together a productive environment to author documentation and also
test it locally using VS Code with only a handful of extensions. Even though the
extension landscape of VS Code is ever changing, here is one example setup that
proved useful at the time of writing. In it, one can change/add content, build a
new version of the docs using a single VS Code Task (or hotkey), see all errors/
warnings emitted by Sphinx in the Problems pane and immediately see the
resulting website show up on a locally serving web server.
#### Configuring VS Code
1. Install the following extensions:
- Python (ms-python.python)
- Live Server (ritwickdey.LiveServer)
2. Add the following entries in `.vscode/settings.json`
```json
{
"liveServer.settings.root": "/.vscode/build/html",
"liveServer.settings.wait": 1000,
"python.terminal.activateEnvInCurrentTerminal": true
}
```
The settings in order are set for the following reasons:
- Sets the root of the output website for live previews. Must be changed
alongside the `tasks.json` command.
- Tells live server to wait with the update to give time for Sphinx to
regenerate site contents and not refresh before all is done. (Empirical value)
- Automatic virtual env activation is a nice touch, should you want to build
the site from the integrated terminal.
3. Add the following tasks in `.vscode/tasks.json`
```json
{
"version": "2.0.0",
"tasks": [
{
"label": "Build Docs",
"type": "process",
"windows": {
"command": "${workspaceFolder}/.venv/Scripts/python.exe"
},
"command": "${workspaceFolder}/.venv/bin/python3",
"args": [
"-m",
"sphinx",
"-j",
"auto",
"-T",
"-b",
"html",
"-d",
"${workspaceFolder}/.vscode/build/doctrees",
"-D",
"language=en",
"${workspaceFolder}/docs",
"${workspaceFolder}/.vscode/build/html"
],
"problemMatcher": [
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"line": 2,
"severity": 3,
"message": 4
},
},
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"severity": 2,
"message": 3
}
}
],
"group": {
"kind": "build",
"isDefault": true
}
},
],
}
```
> (Implementation detail: two problem matchers were needed to be defined,
> because VS Code doesn't tolerate some problem information being potentially
> absent. While a single regex could match all types of errors, if a capture
> group remains empty (the line number doesn't show up in all warning/error
> messages) but the `pattern` references said empty capture group, VS Code
> discards the message completely.)
4. Configure Python virtual environment (venv)
- From the Command Palette, run `Python: Create Environment`
- Select `venv` environment and the `docs/sphinx/requirements.txt` file.
_(Simply pressing enter while hovering over the file from the dropdown is
insufficient, one has to select the radio button with the 'Space' key if
using the keyboard.)_
5. Build the docs
- Launch the default build Task using either:
- a hotkey _(default is 'Ctrl+Shift+B')_ or
- by issuing the `Tasks: Run Build Task` from the Command Palette.
6. Open the live preview
- Navigate to the output of the site within VS Code, right-click on
`.vscode/build/html/index.html` and select `Open with Live Server`. The
contents should update on every rebuild without having to refresh the
browser.
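
The reason for the two problem matchers can be seen by applying both patterns outside VS Code. The following is a rough sketch: the two regular expressions are the ones from `tasks.json` above, rewritten from JSON escaping into Python raw strings, while the function name `parse_sphinx_message` and the sample messages are hypothetical and are not captured Sphinx output.

```python
import re

# Pattern 1: file, line number, severity, message.
WITH_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(/[^:]*|[a-zA-Z]:\\[^:]*):(\d+):\s+(WARNING|ERROR):\s+(.*)$"
)
# Pattern 2: same shape, but for messages that carry no line number.
WITHOUT_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(/[^:]*|[a-zA-Z]:\\[^:]*):{1,2}\s+(WARNING|ERROR):\s+(.*)$"
)


def parse_sphinx_message(text):
    """Try the pattern with a line number first, then the one without."""
    match = WITH_LINE.match(text)
    if match:
        return {"file": match.group(1), "line": int(match.group(2)),
                "severity": match.group(3), "message": match.group(4)}
    match = WITHOUT_LINE.match(text)
    if match:
        return {"file": match.group(1), "line": None,
                "severity": match.group(2), "message": match.group(3)}
    return None


# Hypothetical messages in the shape the patterns expect.
print(parse_sphinx_message("/home/user/docs/index.md:12: WARNING: Duplicate label"))
print(parse_sphinx_message("/home/user/docs/old.md: WARNING: not in any toctree"))
```

Each pattern only references capture groups that it always fills, which is the property the note above says VS Code requires.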
<!-- markdownlint-restore -->
For more topics, such as submitting feedback and ways to build documentation,
see the [Contributing Section](https://rocm.docs.amd.com/en/latest/contributing.html)
at [rocm.docs.amd.com](https://rocm.docs.amd.com).

@@ -1,42 +1,38 @@
# AMD ROCm™ Platform
ROCm is an open-source stack for GPU computation. ROCm is primarily Open-Source
Software (OSS) that allows developers the freedom to customize and tailor their
GPU software for their own needs while collaborating with a community of other
developers, and helping each other find solutions in an agile, flexible, rapid
and secure manner.
ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for
graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
ROCm is a collection of drivers, development tools and APIs enabling GPU
programming from the low-level kernel to end-user applications. ROCm is powered
by AMD's Heterogeneous-computing Interface for Portability (HIP), an OSS C++ GPU
programming environment and its corresponding runtime. HIP allows ROCm
developers to create portable applications on different platforms by deploying
code on a range of platforms, from dedicated gaming GPUs to exascale HPC
clusters. ROCm supports programming models such as OpenMP and OpenCL, and
includes all the necessary OSS compilers, debuggers and libraries. ROCm is fully
integrated into ML frameworks such as PyTorch and TensorFlow. ROCm can be
deployed in many ways, including through the use of containers such as Docker,
Spack, and your own build from source.
With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software
ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
artificial intelligence (AI), scientific computing, and computer aided design (CAD).
ROCm's goal is to allow our users to maximize their GPU hardware investment.
ROCm is designed to help develop, test and deploy GPU accelerated HPC, AI,
scientific computing, CAD, and other applications in a free, open-source,
integrated and secure software ecosystem.
ROCm is powered by AMD's
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
an OSS C++ GPU programming environment and its corresponding runtime. HIP allows ROCm
developers to create portable applications on different platforms by deploying code on a range of
platforms, from dedicated gaming GPUs to exascale HPC clusters.
This repository contains the manifest file for ROCm™ releases, changelogs, and
release information. The file default.xml contains information for all
repositories and the associated commit used to build the current ROCm release.
The default.xml file uses the repo Manifest format.
The develop branch of this repository contains content for the next
ROCm release.
ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary OSS
compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks,
such as PyTorch and TensorFlow.
## ROCm Documentation
ROCm Documentation is available online at
[rocm.docs.amd.com](https://rocm.docs.amd.com). Source code for the documenation
is located in the docs folder of most repositories that are part of ROCm.
The ROCm Documentation site is [rocm.docs.amd.com](https://rocm.docs.amd.com).
Source code for the documentation is located in the docs folder of most repositories that are part of
ROCm.
This repository contains the manifest file for ROCm releases, changelogs, and release information.
The file `default.xml` contains information for all repositories and the associated commit used to build
the current ROCm release.
The `default.xml` file uses the repo Manifest Format.
The develop branch of this repository contains content for the next ROCm release.
### How to build documentation via Sphinx
@@ -48,7 +44,7 @@ pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
## Older ROCm Releases
## Older ROCm Releases
For release information for older ROCm releases, refer to
[CHANGELOG](./CHANGELOG.md).
For release information for older ROCm releases, refer to
[`CHANGELOG`](./CHANGELOG.md).

@@ -15,568 +15,63 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.6.0
## ROCm 5.7.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable header-increment -->
#### Release Highlights
ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include:
### What's New in This Release
- New documentation portal at https://rocm.docs.amd.com
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
### ROCm Libraries
#### OS and GPU Support Changes
#### rocBLAS
A new functionality rocblas-gemm-tune and an environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH are added to rocBLAS in the ROCm 5.7.1 release.
- SLES15 SP5 support was added this release. SLES15 SP3 support was dropped.
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will be entering the maintenance mode starting Q3 2023. This will be aligned with ROCm 5.7 GA release date.
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (End of Maintenance [EOM])(will be aligned with the closest ROCm release)
- Bug fixes during the maintenance will be made to the next ROCm point release
- Bug fixes will not be back ported to older ROCm releases for this SKU
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
*rocblas-gemm-tune* is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, profile logging can be used, by setting the environment variable `ROCBLAS_LAYER=4`.
#### AMDSMI CLI 23.0.0.4
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
##### Added
An example input file: Expected output (note selected GEMM idx may differ): Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.
- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
If the output is stored in a file, the results can be used to override default kernel selection with the kernels found, by setting the environment variable `ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH=<path>`, where `<path>` points to the stored file.
- Package: amd-smi-lib
##### Known Issues
For more details, refer to the [rocBLAS Programmer's Guide.](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune)
- not all Error Correction Code (ECC) fields are currently supported
#### HIP 5.7.1 (for ROCm 5.7.1)
- RHEL 8 & SLES 15 have extra install steps
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
#### Kernel Modules (DKMS)
### Fixed defects
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.
##### Fixes
- Stability fix for multi GPU system reproducible via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
#### HIP 5.6 (For ROCm 5.6)
##### Optimizations
- Consolidation of hipamd, rocclr and OpenCL projects in clr
- Optimized lock for graph global capture mode
##### Added
- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
- hipArrayGetInfo for getting information about the specified array
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
- hipArray3DGetDescriptor to get 3D array descriptor
##### Changed
- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
- Removed hipBusBandwidth and hipCommander samples from hip-tests
##### Fixed
- Fixed regression in hipMemCpyParam3D when offset is applied
##### Known Issues
- Limited testing on xnack+ configuration
- Multiple HIP tests failures (gpuvm fault or hangs)
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
##### Upcoming changes in future release
- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
- maxTexture1D
- maxTexture2D
- maxTexture1DLayered
- maxTexture2DLayered
- sharedMemPerMultiprocessor
- deviceOverlap
- asyncEngineCount
- surfaceAlignment
- unifiedAddressing
- computePreemptionSupported
- uuid
- Removal of deprecated code
- hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
#### ROCgdb-13 (For ROCm 5.6.0)
##### Optimized
- Improved performance when handling the end of a process with a large number of threads.
##### Known Issues
- On certain configurations, ROCgdb can show the following warning message:
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
This does not affect ROCgdb's functionality.
#### ROCprofiler (For ROCm 5.6.0)
In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
ROCm 5.5 are split into separate files. The `rocmtools` files that were
deprecated in ROCm 5.5 have been removed.
| ROCm 5.6 | rocprofilerv1 | rocprofilerv2 |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof` | `bin/rocprofv2` |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler.so.1` | `lib/librocprofiler.so.2` |
The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:
```sh
$ rocprof …
```
To write a custom tool based on the `rocprofilerV1` API do the following:
```C
// main.c
#include <rocprofiler/rocprofiler.h> // rocprofilerV1 API header

int main() {
    // Use the rocprofilerV1 API
    return 0;
}
```
This can be built in the following manner:
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```
The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
following command:
```sh
$ rocprofv2 …
```
To write a custom tool based on the `rocprofilerV2` API do the following:
```C
// main.c
#include <rocprofiler/v2/rocprofiler.h> // rocprofilerV2 API header

int main() {
    // Use the rocprofilerV2 API
    return 0;
}
```
This can be built in the following manner:
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```
The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
##### Optimized
- Improved Test Suite
##### Added
- 'end_time' needs to be disabled in roctx_trace.txt
##### Fixed
- rocprof in ROCm/5.4.0 gpu selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.
- rocprof clobbers LD_PRELOAD.
### Library Changes in ROCM 5.6.0
### Library Changes in ROCM 5.7.1
| Library | Version |
|---------|---------|
| hipBLAS | [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
| hipSPARSE | [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |
| hipBLAS | [1.1.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.7.1) |
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.7.1) |
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.7.1) |
| hipSOLVER | 1.8.1 ⇒ [1.8.2](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.7.1) |
| hipSPARSE | [2.3.8](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.7.1) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.7.1) |
| rocALUTION | [2.1.11](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.7.1) |
| rocBLAS | [3.1.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.7.1) |
| rocFFT | [1.0.24](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.7.1) |
| rocm-cmake | [0.10.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.7.1) |
| rocPRIM | [2.13.1](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.7.1) |
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.7.1) |
| rocSOLVER | [3.23.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.7.1) |
| rocSPARSE | [2.5.4](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.7.1) |
| rocThrust | [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.7.1) |
| rocWMMA | [1.2.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.7.1) |
| Tensile | [4.38.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.7.1) |
#### hipBLAS 1.0.0
#### hipSOLVER 1.8.2
hipBLAS 1.0.0 for ROCm 5.6.0
##### Changed
- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing
##### Removed
- removed support for deprecated hipblasInt8Datatype_t enum
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions
##### Deprecated
- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
out-of-place functionality
#### hipCUB 2.13.1
hipCUB 2.13.1 for ROCm 5.6.0
##### Added
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
##### Changed
- CUB backend references CUB and Thrust version 1.17.2.
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
##### Known Issues
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
#### hipFFT 1.0.12
hipFFT 1.0.12 for ROCm 5.6.0
##### Added
- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
##### Changed
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
#### hipSOLVER 1.8.0
hipSOLVER 1.8.0 for ROCm 5.6.0
##### Added
- Added compatibility API with hipsolverRf prefix
#### hipSPARSE 2.3.6
hipSPARSE 2.3.6 for ROCm 5.6.0
##### Added
- Added SpGEMM algorithms
##### Changed
- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE
#### MIOpen 2.19.0
MIOpen 2.19.0 for ROCm 5.6.0
##### Added
- ROCm 5.5 support for gfx1101 (Navi32)
##### Changed
- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag
hipSOLVER 1.8.2 for ROCm 5.7.1
##### Fixed
- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
#### rccl 2.15.5
RCCL 2.15.5 for ROCm 5.6.0
##### Changed
- Compatibility with NCCL 2.15.5
- Unit test executable renamed to rccl-UnitTests
##### Added
- HW-topology aware binary tree implementation
- Experimental support for MSCCL
- New unit tests for hipGraph support
- NPKit integration
##### Fixed
- rocm-smi ID conversion
- Support for HIP_VISIBLE_DEVICES for unit tests
- Support for p2p transfers to non (HIP) visible devices
##### Removed
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
#### rocALUTION 2.1.9
rocALUTION 2.1.9 for ROCm 5.6.0
##### Improved
- Fixed synchronization issues in level 1 routines
#### rocBLAS 3.0.0
rocBLAS 3.0.0 for ROCm 5.6.0
##### Optimizations
- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.
##### Added
- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
##### Deprecated
- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
##### Removed
- is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.
##### Dependencies
- build only dependency on python joblib added as used by Tensile build
- fix for cmake install on some OS when performed by install.sh -d --cmake_install
##### Fixed
- make trsm offset calculations 64 bit safe
##### Changed
- refactor rotg test code
#### rocFFT 1.0.23
rocFFT 1.0.23 for ROCm 5.6.0
##### Added
- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.
##### Changed
- Replaced std::complex with hipComplex data types for data generator.
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
##### Fixed
- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.
#### rocm-cmake 0.9.0
rocm-cmake 0.9.0 for ROCm 5.6.0
##### Added
- Added the option ROCM_HEADER_WRAPPER_WERROR
- Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
- Configure-time CMake option sets the default for the C macro.
#### rocPRIM 2.13.0
rocPRIM 2.13.0 for ROCm 5.6.0
##### Added
- New block level `radix_rank` primitive.
- New block level `radix_rank_match` primitive.
- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
##### Changed
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
- Improved the performance of `device_merge_sort`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).
##### Known Issues
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.
#### rocRAND 2.10.17
rocRAND 2.10.17 for ROCm 5.6.0
##### Added
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
- experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
##### Changed
- Python 2.7 is no longer officially supported.
#### rocSOLVER 3.22.0
rocSOLVER 3.22.0 for ROCm 5.6.0
##### Added
- LU refactorization for sparse matrices
- CSRRF_ANALYSIS
- CSRRF_SUMLU
- CSRRF_SPLITLU
- CSRRF_REFACTLU
- Linear system solver for sparse matrices
- CSRRF_SOLVE
- Added type `rocsolver_rfinfo` for use with sparse matrix routines
##### Optimized
- Improved the performance of BDSQR and GESVD when singular vectors are requested
##### Fixed
- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
#### rocSPARSE 2.5.2
rocSPARSE 2.5.2 for ROCm 5.6.0
##### Improved
- Fixed a memory leak in csritsv
- Fixed a bug in csrsm and bsrsm
#### rocThrust 2.18.0
rocThrust 2.18.0 for ROCm 5.6.0
##### Fixed
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
##### Changed
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
#### rocWMMA 1.1.0
rocWMMA 1.1.0 for ROCm 5.6.0
##### Added
- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
- Added performance gemm samples for half, single and double precision
- Added rocWMMA cmake versioning
- Added vectorized support in coordinate transforms
- Included ROCm smi for runtime clock rate detection
- Added fragment transforms for transpose and change data layout
##### Changed
- Default to GPU rocBLAS validation against rocWMMA
- Re-enabled int8 gemm tests on gfx9
- Upgraded to C++17
- Restructured unit test folder for consistency
- Consolidated rocWMMA samples common code
#### Tensile 4.37.0
Tensile 4.37.0 for ROCm 5.6.0
##### Added
- Added user driven tuning API
- Added decision tree fallback feature
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
- DirectToVgpr support for fp16 and Int8 with TN orientation
- Added new test cases for various functions
- Added SingleBuffer algorithm for ZGEMM/CGEMM
- Added joblib for parallel map calls
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
- Added asmcap check for MIArchVgpr
- Added support for MFMA + LocalSplitU
- Added frequency, power, and temperature data to the output
##### Optimizations
- Improved the performance of GlobalSplitU with SingleBuffer algorithm
- Reduced the running time of the extended and pre_checkin tests
- Optimized the Tailloop section of the assembly kernel
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
- Improved the performance of the second kernel of MultipleBuffer algorithm
##### Changed
- Updated custom kernels with 64-bit offsets
- Adapted 64-bit offset arguments for assembly kernels
- Improved temporary register re-use to reduce max sgpr usage
- Removed some restrictions on VectorWidth and DirectToVgpr
- Updated the dependency requirements for Tensile
- Changed the range of AssertSummationElementMultiple
- Modified the error messages for more clarity
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
- Removed dummy vgpr for vectorStaticRemainder
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
- Removed qReg parameter from vectorStaticRemainder
##### Fixed
- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
- 64-bit offset parameters for post kernels
- Fixed gfx908 CI test failures
- Fixed offset calculation to prevent overflow for large offsets
- Fixed issues when BufferLoad and BufferStore are equal to zero
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
- Fixed the memory access error related to StaggerU + large stride
- Fixed ZGEMM 4x4 MatrixInst mismatch
- Fixed DGEMM 4x4 MatrixInst mismatch
- Fixed ASEM + GSU + NoTailLoop opt mismatch
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
- Fixed ASEM + GSU + TailLoop inner unroll
- Fixed conflicts between the hipsolver-dev and -asan packages by excluding
hipsolver_module.f90 from the latter


@@ -12,43 +12,44 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup"
fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-5.6.0"
<default revision="refs/tags/rocm-5.7.1"
remote="roc-github"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCM-->
<project name="ROCK-Kernel-Driver" remote="roc-github" />
<project name="ROCT-Thunk-Interface" remote="roc-github" />
<project name="ROCR-Runtime" remote="roc-github" />
<project name="rocm_smi_lib" remote="roc-github" />
<project name="rocm-core" remote="roc-github" />
<project name="rocm-cmake" remote="roc-github" />
<project name="rocminfo" remote="roc-github" />
<project name="ROCK-Kernel-Driver" />
<project name="ROCT-Thunk-Interface" />
<project name="ROCR-Runtime" />
<project name="amdsmi" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocm-cmake" />
<project name="rocminfo" />
<project name="rocm_bandwidth_test" />
<project name="rocprofiler" remote="rocm-devtools" />
<project name="roctracer" remote="rocm-devtools" />
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
<project name="clang-ocl" remote="roc-github" />
<project name="clang-ocl" />
<project name="rdc" />
<!--HIP Projects-->
<project name="HIP" remote="rocm-devtools" />
<project name="clr" remote="rocm-devtools" />
<project name="HIP-Examples" remote="rocm-devtools" />
<project name="clr" remote="rocm-devtools" />
<project name="HIPIFY" remote="rocm-devtools" />
<project name="HIPCC" remote="rocm-devtools" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="llvm-project" remote="roc-github" />
<project name="ROCm-Device-Libs" remote="roc-github" />
<project name="ROCm-CompilerSupport" remote="roc-github" />
<project name="rocr_debug_agent" remote="rocm-devtools" />
<project name="rocm_bandwidth_test" remote="roc-github" />
<project name="llvm-project" />
<project name="ROCm-Device-Libs" />
<project name="ROCm-CompilerSupport" />
<project name="half" remote="rocm-swplat" revision="37742ce15b76b44e4b271c1e66d13d2fa7bd003e" />
<project name="RCP" remote="gpuopen-tools" revision="3a49405a1500067c49d181844ec90aea606055bb" />
<!-- gdb projects -->
<project name="ROCgdb" remote="rocm-devtools" />
<project name="ROCdbgapi" remote="rocm-devtools" />
<project name="rocr_debug_agent" remote="rocm-devtools" />
<!-- ROCm Libraries -->
<project name="rdc" remote="roc-github" />
<project groups="mathlibs" name="rocBLAS" remote="rocm-swplat" />
<project groups="mathlibs" name="Tensile" remote="rocm-swplat" />
<project groups="mathlibs" name="hipTensor" remote="rocm-swplat" />
<project groups="mathlibs" name="hipBLAS" remote="rocm-swplat" />
<project groups="mathlibs" name="rocFFT" remote="rocm-swplat" />
<project groups="mathlibs" name="hipFFT" remote="rocm-swplat" />
@@ -58,13 +59,16 @@ fetch="https://github.com/KhronosGroup/" />
<project groups="mathlibs" name="hipSOLVER" remote="rocm-swplat" />
<project groups="mathlibs" name="hipSPARSE" remote="rocm-swplat" />
<project groups="mathlibs" name="rocALUTION" remote="rocm-swplat" />
<project name="MIOpen" remote="rocm-swplat" />
<project groups="mathlibs" name="rccl" remote="rocm-swplat" />
<project name="MIVisionX" remote="gpuopen-libs" />
<project groups="mathlibs" name="rocThrust" remote="rocm-swplat" />
<project groups="mathlibs" name="hipCUB" remote="rocm-swplat" />
<project groups="mathlibs" name="rocPRIM" remote="rocm-swplat" />
<project groups="mathlibs" name="rocWMMA" remote="rocm-swplat" />
<project groups="mathlibs" name="rccl" remote="rocm-swplat" />
<project name="rocMLIR" remote="rocm-swplat" />
<project name="MIOpen" remote="rocm-swplat" />
<project name="composable_kernel" remote="rocm-swplat" />
<project name="MIVisionX" remote="gpuopen-libs" />
<project name="rpp" remote="gpuopen-libs" />
<project name="hipfort" remote="rocm-swplat" />
<project name="AMDMIGraphX" remote="rocm-swplat" />
<project name="ROCmValidationSuite" remote="rocm-devtools" />


@@ -1,6 +0,0 @@
# 404 Page Not Found
Page could not be found.
Return to [home](./index) or please use the links from the sidebar to find what
you are looking for.


@@ -5,70 +5,70 @@ Documentation is built using open source toolchains. Contributions to our
documentation is encouraged and welcome. As a contributor, please familiarize
yourself with our documentation toolchain.
## ReadTheDocs
## `rocm-docs-core`
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is our front end for the
our documentation. By front end, this is the tool that serves our HTML based
documentation to our end users.
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD-maintained
project that applies customization for our documentation. This
project is the tool most ROCm repositories use as part of the documentation
build. It is also available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
## Doxygen
[Doxygen](https://www.doxygen.nl/) is the most common inline code documentation
standard. ROCm projects are use Doxygen for public API documentation (unless the
upstream project is using a different tool).
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation <rocm-docs-core:index>`.
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
originally used for python. It is now widely used in the Open Source community.
Originally, sphinx supported RST based documentation. Markdown support is now
available. ROCm documentation plans to default to markdown for new projects.
Existing projects using RST are under no obligation to convert to markdown. New
projects that believe markdown is not suitable should contact the documentation
originally used for Python. It is now widely used in the Open Source community.
Originally, Sphinx supported reStructuredText (RST) based documentation, but
Markdown support is now available.
ROCm documentation plans to default to Markdown for new projects.
Existing projects using RST are under no obligation to convert to Markdown. New
projects that believe Markdown is not suitable should contact the documentation
team prior to selecting RST.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds
and hosts the HTML documentation generated using Sphinx to our end users.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts
information from inline code.
ROCm projects typically use Doxygen for public API documentation unless the
upstream project uses a different tool.
### Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin to integrate Doxygen
content.
### MyST
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
It is integrated via [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A cheat sheet that showcases how to use the MyST syntax is available over at [the Jupyter
reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
### Sphinx Theme
ROCm is using the
[Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/). This
theme is used by Jupyter books. ROCm documentation applies some customization
include a header and footer on top of the Sphinx Book Theme. A future custom
ROCm theme will be part of our documentation goals.
### Sphinx Design
Sphinx Design is an extension for sphinx based websites that add design
functionality. Please see the documentation
[here](https://sphinx-design.readthedocs.io/en/latest/index.html). ROCm
documentation uses sphinx design for grids, cards, and synchronized tabs.
Other features may be used in the future.
It is integrated into ROCm documentation by the Sphinx extension [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A cheat sheet that showcases how to use the MyST syntax is available over at
the [Jupyter reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
### Sphinx External TOC
ROCm uses the
[sphinx-external-toc](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
for our navigation. This tool allows a YAML file based left navigation menu. This
tool was selected due to its flexibility that allows scripts to operate on the
[Sphinx External Table of Contents (TOC)](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
is a Sphinx extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
based on a YAML file that specifies the table of contents.
It was selected due to its flexibility that allows scripts to operate on the
YAML file. Please transition to this file for the project's navigation. You can
see the `_toc.yml.in` file in this repository in the docs/sphinx folder for an
see the `_toc.yml.in` file in this repository in the `docs/sphinx` folder for an
example.
### Breathe
### Sphinx Book Theme
Sphinx uses [Breathe](https://www.breathe-doc.org/) to integrate Doxygen
content.
[Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme
that defines the base appearance for ROCm documentation.
ROCm documentation applies some customization,
such as a custom header and footer on top of the Sphinx Book Theme.
## `rocm-docs-core` pip package
### Sphinx Design
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD
maintained project that applies customization for our documentation. This
project is the tool most ROCm repositories will use as part of the documentation
build.
[Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design
functionality.
ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.

125
docs/about/licensing.md Normal file

@@ -0,0 +1,125 @@
# ROCm licensing terms
ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately.
The following table is a list of ROCm components with links to their respective license
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, license name, and link to the license terms.
<!-- spellcheck-disable -->
| Component | License |
|:------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------:|
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCmSoftwarePlatform/hipTensor) | [MIT](https://github.com/ROCmSoftwarePlatform/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula) |
Open-source ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com, and other distribution channels.
Proprietary products are only available on https://repo.radeon.com. Currently, only
one component of ROCm, rocm-llvm-alt, is governed by a proprietary license.
Proprietary components are organized in a proprietary subdirectory in the package
repositories to distinguish them from open-source packages.
The additional terms and conditions below apply to your use of ROCm technical
documentation.
©2023 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.
## Package licensing
```{attention}
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary and is
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
AOCC CPU Optimizations.
```
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`
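
The per-component layout described above can be inspected with a short script. The following is a minimal sketch, assuming the default `/opt/rocm/share/doc` root mentioned in this section; the function name `find_license_files` and the list of file-name prefixes are hypothetical choices made for illustration, not part of ROCm.

```python
import os

# File-name prefixes treated as license files. This list is an assumption
# made for illustration; components may name their license files differently.
LICENSE_PREFIXES = ("license", "copying", "eula")


def find_license_files(doc_root="/opt/rocm/share/doc"):
    """Map each component directory under doc_root to its license-like files."""
    found = {}
    if not os.path.isdir(doc_root):
        return found  # nothing installed at this location
    for component in sorted(os.listdir(doc_root)):
        component_dir = os.path.join(doc_root, component)
        if not os.path.isdir(component_dir):
            continue
        matches = [
            name
            for name in sorted(os.listdir(component_dir))
            if name.lower().startswith(LICENSE_PREFIXES)
        ]
        if matches:
            found[component] = matches
    return found


if __name__ == "__main__":
    for component, files in find_license_files().items():
        print(component, files)
```

On a versioned installation, pass the versioned root instead, for example `find_license_files("/opt/rocm-5.4.3/share/doc")`.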

@@ -5,9 +5,27 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import shutil
import jinja2
import os
from rocm_docs import ROCmDocs
# Environment to process Jinja templates.
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
# Jinja templates to render out.
templates = [
    "./deploy/linux/quick_start.md.jinja",
    "./deploy/linux/installer/install.md.jinja",
    "./deploy/linux/os-native/install.md.jinja"
]
# Render templates and output files without the last extension.
# For example: 'install.md.jinja' becomes 'install.md'.
for template in templates:
    rendered = jinja_env.get_template(template).render()
    with open(os.path.splitext(template)[0], 'w') as file:
        file.write(rendered)
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
shutil.copy2('../RELEASE.md','./release.md')
@@ -15,14 +33,20 @@ shutil.copy2('../RELEASE.md','./release.md')
shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
latex_engine = "xelatex"
latex_elements = {
"fontpkg": r"""
\usepackage{tgtermes}
\usepackage{tgheros}
\renewcommand\ttdefault{txtt}
"""
}
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = "5.6.0"
release = "5.6.0"
version = "5.7.1"
release = "5.7.1"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
@@ -86,7 +110,7 @@ article_pages = [
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs("ROCm Documentation Home")
docs_core = ROCmDocs("ROCm 5.7.1 Documentation Home")
docs_core.setup()
external_projects_current_project = "rocm"
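
The template-rendering loop in this `conf.py` change writes each rendered file to the template path with its last extension removed. The sketch below isolates just that path derivation using the standard library; the helper name `output_path_for` is hypothetical and does not appear in `conf.py`.

```python
import os


def output_path_for(template_path):
    """Strip only the final extension: 'install.md.jinja' becomes 'install.md'."""
    return os.path.splitext(template_path)[0]


# Template paths copied from the conf.py change above.
templates = [
    "./deploy/linux/quick_start.md.jinja",
    "./deploy/linux/installer/install.md.jinja",
    "./deploy/linux/os-native/install.md.jinja",
]

if __name__ == "__main__":
    for template in templates:
        print(template, "->", output_path_for(template))
```

Because `os.path.splitext` only splits at the last dot, the `.md` part survives and Sphinx picks the output up as an ordinary Markdown source.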

docs/contribute/building.md Normal file

@@ -0,0 +1,165 @@
# Building Documentation
While contributing, one may build the documentation locally on the command-line
or rely on Continuous Integration for previewing the resulting HTML pages in a
browser.
## Pull Request documentation builds
When opening a PR to the `develop` branch on GitHub, the page corresponding to
the PR (`https://github.com/RadeonOpenCompute/ROCm/pull/<pr_number>`) will have
a summary at the bottom. This requires the user be logged in to GitHub.
- There, click `Show all checks` and `Details` of the Read the Docs pipeline. It
will take you to a URL of the form
`https://readthedocs.com/projects/advanced-micro-devices-rocm/builds/<some_build_num>/`
- The list of commands shown are the exact ones used by CI to produce a render
of the documentation.
- There, click on the small blue link `View docs` (which is not the same as the
bigger button with the same text). It will take you to the built HTML site with
a URL of the form
`https://advanced-micro-devices-demo--<pr_number>.com.readthedocs.build/projects/alpha/en/<pr_number>/`.
## Build documentation from the Command Line
Python versions known to build documentation:
- 3.8
To build the docs locally using Python Virtual Environment (`venv`), execute the
following commands from the project root:
```sh
python3 -m venv .venv
# Windows
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
# Linux
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
```
Then open up `_build/html/index.html` in your favorite browser.
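
The two platform variants above differ only in where `venv` places the interpreter. As a rough illustration, the sketch below assembles the same Sphinx invocation as an argument list; the function name `sphinx_build_command` and its parameters are hypothetical, while the flags are the ones used in the commands above.

```python
import sys


def sphinx_build_command(venv_dir=".venv", windows=None):
    """Return the argument list for the local HTML build shown above."""
    if windows is None:
        windows = sys.platform.startswith("win")
    # venv puts the interpreter under Scripts/ on Windows and bin/ elsewhere.
    subdir = "Scripts" if windows else "bin"
    return [
        f"{venv_dir}/{subdir}/python", "-m", "sphinx",
        "-T",                     # show the full traceback on errors
        "-E",                     # ignore the saved environment, re-read all files
        "-b", "html",             # HTML builder
        "-d", "_build/doctrees",  # doctree cache directory
        "-D", "language=en",      # override a conf.py setting
        "docs", "_build/html",    # source and output directories
    ]


if __name__ == "__main__":
    print(" ".join(sphinx_build_command()))
```

The list can be handed to `subprocess.run(...)` from the project root once the requirements have been installed into the virtual environment.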
## Build documentation using Visual Studio (VS) Code
One can put together a productive environment to author documentation and also
test it locally using VS Code with only a handful of extensions. Even though the
extension landscape of VS Code is ever changing, here is one example setup that
proved useful at the time of writing. In it, one can change/add content, build a
new version of the docs using a single VS Code Task (or hotkey), see all errors/
warnings emitted by Sphinx in the Problems pane and immediately see the
resulting website show up on a locally-served web server.
### Configuring VS Code
1. Install the following extensions:
- Python `(ms-python.python)`
- Live Server `(ritwickdey.LiveServer)`
2. Add the following entries in `.vscode/settings.json`
```json
{
"liveServer.settings.root": "/.vscode/build/html",
"liveServer.settings.wait": 1000,
"python.terminal.activateEnvInCurrentTerminal": true
}
```
The settings above are used for the following reasons:
- `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
alongside the `tasks.json` command.
- `liveServer.settings.wait`: Delays the Live Server reload (value in milliseconds) so that Sphinx can
finish regenerating the site contents before the browser refreshes. (Empirical value)
- `python.terminal.activateEnvInCurrentTerminal`: Automatic virtual environment activation is a nice touch,
should you want to build the site from the integrated terminal.
3. Add the following tasks in `.vscode/tasks.json`
```json
{
"version": "2.0.0",
"tasks": [
{
"label": "Build Docs",
"type": "process",
"windows": {
"command": "${workspaceFolder}/.venv/Scripts/python.exe"
},
"command": "${workspaceFolder}/.venv/bin/python3",
"args": [
"-m",
"sphinx",
"-j",
"auto",
"-T",
"-b",
"html",
"-d",
"${workspaceFolder}/.vscode/build/doctrees",
"-D",
"language=en",
"${workspaceFolder}/docs",
"${workspaceFolder}/.vscode/build/html"
],
"problemMatcher": [
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"line": 2,
"severity": 3,
"message": 4
},
},
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"severity": 2,
"message": 3
}
}
],
"group": {
"kind": "build",
"isDefault": true
}
},
],
}
```
> (Implementation detail: two problem matchers had to be defined,
> because VS Code doesn't tolerate some problem information being potentially
> absent. While a single regex could match all types of errors, if a capture
> group remains empty (the line number doesn't show up in all warning/error
> messages) but the `pattern` references said empty capture group, VS Code
> discards the message completely.)
4. Configure Python virtual environment (`venv`)
- From the Command Palette, run `Python: Create Environment`
- Select `venv` environment and the `docs/sphinx/requirements.txt` file.
_(Simply pressing Enter while hovering over the file in the drop-down is
insufficient; one has to select the radio button with the 'Space' key if
using the keyboard.)_
5. Build the docs
- Launch the default build Task using either:
- a hotkey _(default is `Ctrl+Shift+B`)_ or
- by issuing the `Tasks: Run Build Task` from the Command Palette.
6. Open the live preview
- Navigate to the output of the site within VS Code, right-click on
`.vscode/build/html/index.html` and select `Open with Live Server`. The
contents should update on every rebuild without having to refresh the
browser.
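
To see why the `tasks.json` in step 3 defines two problem matchers, it can help to run both regular expressions outside VS Code. The sketch below uses the same patterns with the JSON escaping removed; the function name `parse_problem` and the sample log lines in the usage note are made up for illustration and are not real Sphinx output.

```python
import re

# Same patterns as in tasks.json, written as Python raw strings.
WITH_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(\/[^:]*|[a-zA-Z]:\\[^:]*):(\d+):\s+(WARNING|ERROR):\s+(.*)$"
)
WITHOUT_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(\/[^:]*|[a-zA-Z]:\\[^:]*):{1,2}\s+(WARNING|ERROR):\s+(.*)$"
)


def parse_problem(line):
    """Return (file, line_number_or_None, severity, message), or None."""
    match = WITH_LINE.match(line)
    if match:
        return match.group(1), int(match.group(2)), match.group(3), match.group(4)
    match = WITHOUT_LINE.match(line)
    if match:
        # No line-number group exists here, so nothing is left empty.
        return match.group(1), None, match.group(2), match.group(3)
    return None


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    print(parse_problem("/work/ROCm/docs/index.md:12: WARNING: Unknown directive type"))
    print(parse_problem("/work/ROCm/docs/orphan.md: WARNING: document isn't included in any toctree"))
```

The first sample is only matched by the pattern with a line-number group and the second only by the pattern without one, which mirrors the split described in the implementation note.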

@@ -0,0 +1,27 @@
# How to provide feedback for ROCm documentation
There are four standard ways to provide feedback for this repository.
## Pull Request
All contributions to ROCm documentation should arrive via the
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
targeting the develop branch of the repository. If you are unable to contribute
via the GitHub Flow, feel free to email us.
## GitHub Discussions
To ask questions or view answers to frequently asked questions, refer to
[GitHub Discussions](https://github.com/RadeonOpenCompute/ROCm/discussions).
On GitHub Discussions, in addition to asking and answering questions,
members can share updates, have open-ended conversations,
and follow along via public announcements.
## GitHub Issue
Issues on existing or absent docs can be filed as
[GitHub Issues](https://github.com/RadeonOpenCompute/ROCm/issues).
## Email
Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com).


@@ -27,7 +27,7 @@ option, i.e. to allow access to all GPUs expose `/dev/kfd` and all
`/dev/dri/renderD` devices:
```shell
docker run --device /dev/kfd --device /dev/renderD128 --device /dev/renderD129 ...
docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 ...
```
More conveniently, instead of listing all devices, the entire `/dev/dri` folder

View File

@@ -3,6 +3,11 @@
Users installing ROCm must choose between various installation options. A new
user should follow the [Quick Start guide](./quick_start).
```{note}
See {doc}`Radeon Software for Linux installation instructions <radeon:docs/install/install-radeon>`
for those using select RDNA™ 3 GPUs with graphical applications and ROCm.
```
## Package Manager versus AMDGPU Installer?
ROCm supports two methods for installation:

View File

@@ -1,114 +1,64 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Installation with install script
Prior to beginning, please ensure you have the [prerequisites](../prerequisites)
installed.
```{warning}
ROCm currently doesn't support integrated graphics. Should your system have an
AMD IGP installed, disable it in the BIOS prior to using ROCm. If the driver can
enumerate the IGP, the ROCm runtime may crash the system, even if told to omit
it via {ref}`hip_visible_devices`.
```
## Download the Installer Script
To download and install the `amdgpu-install` script on the system, use the
following commands based on your distribution.
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{% call(family) linux.for_family_in(linux.supported_family) %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- if os.tag == "ubuntu" %}
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/focal/amdgpu-install_5.6.50600-1_all.deb
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
wget https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/ubuntu/{{ version.release }}/amdgpu-install_{{ family.amdgpu_install_version }}_all.deb
sudo apt install ./amdgpu-install_{{ family.amdgpu_install_version }}_all.deb
```
{%- endcall -%}
{%- elif os.tag == "rhel" %}
{%- call(version) linux.for_version_in(os) %}
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/jammy/amdgpu-install_5.6.50600-1_all.deb
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
sudo yum install https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/rhel/{{ version.number }}/amdgpu-install-{{ family.amdgpu_install_version }}.{{ version.release | trim("rh") }}.noarch.rpm
```
{%- endcall -%}
{%- elif os.tag == "sle" %}
{%- call(version) linux.for_version_in(os) %}
:::
::::
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.6/amdgpu-install-5.6.50600-1.el8.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/sle/{{ version.number }}/amdgpu-install-{{ family.amdgpu_install_version }}.noarch.rpm
```
{%- endcall -%}
{%- endif %}
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.7/amdgpu-install-5.6.50600-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-install-5.5.50501-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/9.1/amdgpu-install-5.6.50600-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-install-5.5.50501-1.el8.noarch.rpm
```
:::
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.4/amdgpu-install-5.6.50600-1.noarch.rpm
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.5/amdgpu-install-5.6.50600-1.noarch.rpm
```
:::
::::
:::::
::::::
{%- endcall -%}
{%- endcall %}
## Use cases
@@ -171,6 +121,18 @@ To install use cases specific to your requirements, use the installer
sudo amdgpu-install --usecase=hiplibsdk,rocm
```
- For graphical workloads using the open-source driver add `graphics`. For example:
```shell
sudo amdgpu-install --usecase=graphics,rocm
```
- For workstation workloads using the proprietary driver add `workstation`. For example:
```shell
sudo amdgpu-install --usecase=workstation,rocm
```
## Single-version ROCm Installation
By default (without the `--rocmrelease` option)
@@ -181,9 +143,9 @@ the installer script will install packages in the single-version layout.
For the multi-version ROCm installation you must use the installer script from
the latest release of ROCm that you wish to install.
**Example:** If you want to install ROCm releases 5.3.3 and 5.4.3
**Example:** If you want to install ROCm releases 5.5.3, 5.6.1 and {{ linux.supported_family[0].rocm_version }}
simultaneously, you are required to download the installer from the latest ROCm
release v5.4.3.
release {{ linux.supported_family[0].rocm_version }}.
### Add Required Repositories
@@ -193,50 +155,37 @@ automatically adds the required repositories for the latest release.
Run the following commands based on your distribution to add the repositories:
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{% call(family) linux.for_family_in(linux.supported_family) %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- if os.tag == "ubuntu" %}
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
for ver in 5.3.3 5.4.3; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver {{ version.release }} main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
{%- endcall -%}
{%- elif os.tag == "rhel" %}
{%- call(version) linux.for_version_in(os) %}
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
for ver in 5.3.3 5.4.3; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
:::
::::
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8
:sync: RHEL-8
```shell
for ver in 5.3.3 5.4.3; do
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel8/$ver/main
baseurl=https://repo.radeon.com/rocm/{{ version.release }}/$ver/main
enabled=1
priority=50
gpgcheck=1
@@ -245,34 +194,10 @@ EOF
done
sudo yum clean all
```
:::
:::{tab-item} RHEL 9
:sync: RHEL-9
{%- endcall -%}
{%- elif os.tag == "sle" %}
```shell
for ver in 5.3.3 5.4.3; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel9/$ver/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
done
sudo yum clean all
```
:::
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```shell
for ver in 5.3.3 5.4.3; do
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
@@ -283,27 +208,29 @@ EOF
done
sudo zypper ref
```
{%- endif %}
:::::
::::::
{%- endcall -%}
{%- endcall %}
### Install packages
Use the installer script as given below:
```none
```shell
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-1>
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-2>
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
```
Following are examples of ROCm multi-version installation. The kernel-mode
driver, associated with the ROCm release v5.4.3, will be installed as its latest
driver, associated with the ROCm release {{ linux.supported_family[0].rocm_version }}, will be installed as its latest
release in the list.
```none
sudo amdgpu-install --usecase=rocm --rocmrelease=5.3.3
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3
```shell
sudo amdgpu-install --usecase=rocm --rocmrelease={{ linux.supported_family[0].rocm_version }}
sudo amdgpu-install --usecase=rocm --rocmrelease=5.6.1
sudo amdgpu-install --usecase=rocm --rocmrelease=5.5.3
```
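
When several releases are installed side by side, the installer is invoked once per release, as in the example above. The following sketch generates those invocations from a list of release numbers; the function name `multi_version_install_commands` is hypothetical, and the release numbers in the usage note are only examples.

```python
def multi_version_install_commands(releases, usecase="rocm"):
    """Build one amdgpu-install invocation per requested ROCm release."""
    return [
        f"sudo amdgpu-install --usecase={usecase} --rocmrelease={release}"
        for release in releases
    ]


if __name__ == "__main__":
    # Example release numbers; substitute the releases you need.
    for command in multi_version_install_commands(["5.7.1", "5.6.1", "5.5.3"]):
        print(command)
```

The sketch only prints the commands; review them before running any of them with elevated privileges.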
## Additional options

@@ -0,0 +1,116 @@
{%- set supported_family = ([
{
"tag": "instinct",
"name": "Select OS",
"amdgpu_version": "5.7.1",
"amdgpu_install_version": "5.7.50701-1",
"rocm_version": "5.7.1",
"rocm_install_version": "5.7.50701-1",
}
]) -%}
{%- set supported_os = ([
{
"tag": "ubuntu",
"name": "Ubuntu",
"shortname" : "Ubuntu",
"version": [
{
"number": "22.04",
"release": "jammy"
},
{
"number": "20.04",
"release": "focal"
}
]
},
{
"tag": "rhel",
"name": "Red Hat Enterprise Linux",
"shortname" : "RHEL",
"version": [
{
"number": "9.2",
"release": "rhel9"
},
{
"number": "9.1",
"release": "rhel9"
},
{
"number": "8.8",
"release": "rhel8"
},
{
"number": "8.7",
"release": "rhel8"
},
]
},
{
"tag": "sle",
"name": "SUSE Linux Enterprise Server",
"shortname" : "SLES",
"version": [
{
"number": "15.5"
},
{
"number": "15.4"
},
]
}
]) -%}
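
The installer template in this change set builds its download URLs from the `supported_family` and `supported_os` values defined above. As a rough sketch of how those values expand, the Python below reproduces the Ubuntu and RHEL URL patterns; the function names are hypothetical, and the resulting URLs are only what the template text would produce, not links that have been checked against the repository server.

```python
BASE = "https://repo.radeon.com/amdgpu-install"

# Values copied from the supported_family entry above.
family = {"amdgpu_version": "5.7.1", "amdgpu_install_version": "5.7.50701-1"}


def ubuntu_installer_url(family, version):
    """Expand the Ubuntu .deb URL pattern used by the installer template."""
    return (
        f"{BASE}/{family['amdgpu_version']}/ubuntu/{version['release']}/"
        f"amdgpu-install_{family['amdgpu_install_version']}_all.deb"
    )


def rhel_installer_url(family, version):
    """Expand the RHEL .rpm URL pattern used by the installer template."""
    # Jinja's trim("rh") strips the characters 'r' and 'h' from both ends,
    # like str.strip("rh") in Python: "rhel9" becomes "el9".
    dist = version["release"].strip("rh")
    return (
        f"{BASE}/{family['amdgpu_version']}/rhel/{version['number']}/"
        f"amdgpu-install-{family['amdgpu_install_version']}.{dist}.noarch.rpm"
    )


if __name__ == "__main__":
    print(ubuntu_installer_url(family, {"number": "22.04", "release": "jammy"}))
    print(rhel_installer_url(family, {"number": "9.2", "release": "rhel9"}))
```

Keeping the release data in one place means a version bump only touches `supported_family`, rather than every hard-coded URL as in the previous revision of the page.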
{%- macro for_family_in(supported_family) %}
::::::::{tab-set}
{%- for family in supported_family %}
:::::::{tab-item} {{ family.name }}
:sync: {{ family.tag }}
{{ caller(family) }}
:::::::
{%- endfor %}
::::::::
{%- endmacro -%}
{%- macro for_os_in(supported_os) %}
::::::{tab-set}
{%- for os in supported_os %}
:::::{tab-item} {{ os.name }}
:sync: {{ os.tag }}
{{ caller(os) }}
:::::
{%- endfor %}
::::::
{%- endmacro -%}
{%- macro for_version_in(os) %}
::::{tab-set}
{%- for version in os.version %}
:::{tab-item} {{ os.shortname }} {{ version.number }}
:sync: {{ os.tag }}-{{ version.number }}
{{ caller(version) }}
:::
{%- endfor %}
::::
{%- endmacro -%}
{%- macro install(os, argument) %}
```shell
{%- if os.tag == "ubuntu" %}
sudo apt install {{ argument }}
{%- elif os.tag == "rhel" %}
sudo yum install {{ argument }}
{%- elif os.tag == "sle" %}
sudo zypper install {{ argument }}
{%- endif %}
```
{%- endmacro -%}
{%- macro header_anchor(family, os) -%}
({{ caller() | lower | replace('#', '') | trim | replace(' ', '-')}}-{{ family.tag }}-{{ os.tag }})= {{ caller() }}
{%- endmacro -%}
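
The `header_anchor` macro above derives a label from the heading text through a chain of filters. The sketch below mirrors that chain in plain Python to show what label a given heading would receive; the function name `header_anchor_label` and the sample heading are hypothetical.

```python
def header_anchor_label(heading, family_tag, os_tag):
    """Mirror the filter chain: lower, drop '#', trim, spaces to hyphens."""
    slug = heading.lower().replace("#", "").strip().replace(" ", "-")
    return f"{slug}-{family_tag}-{os_tag}"


if __name__ == "__main__":
    # Sample heading for illustration only.
    print(header_anchor_label("## Install packages", "instinct", "ubuntu"))
```

Appending the family and OS tags keeps labels unique when the same heading is repeated in several tabs.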

@@ -1,5 +1,14 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Installation (Linux)
```{warning}
ROCm currently doesn't support integrated graphics. Should your system have an
AMD IGP installed, disable it in the BIOS prior to using ROCm. If the driver can
enumerate the IGP, the ROCm runtime may crash the system, even if told to omit
it via {ref}`hip_visible_devices`.
```
## Understanding the Release-specific AMDGPU and ROCm Repositories on Linux Distributions
The release-specific repositories consist of packages from a specific release of
@@ -12,10 +21,9 @@ installed version by using the multi-version ROCm packages.
## Step by Step Instructions
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
::::{rubric} 1. Download and convert the package signing key
::::
@@ -46,31 +54,22 @@ section.
```
To add the AMDGPU repository, follow these steps:
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver={{ linux.supported_family[0].amdgpu_version }}
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu {{ version.release }} main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# amdgpu repository for jammy
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
::::
{%- endcall %}
Install the kernel mode driver and reboot the system using the following
commands:
@@ -85,38 +84,23 @@ sudo reboot
To add the ROCm repository, use the following steps:
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# ROCm repositories for focal
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
# ROCm repositories for {{ version.release }}
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver {{ version.release }} main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# ROCm repositories for jammy
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
:::
::::
{%- endcall %}
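Because `tee --append` adds the same lines again each time the loop is run, it can help to preview the entries before writing them. The following is a minimal sketch, not part of the official instructions: `rocm_apt_entries` is a hypothetical helper, and the codename and version numbers passed to it are example values only.

```shell
# Hypothetical helper (not part of ROCm): print one apt source line per
# ROCm version for the given Ubuntu release codename.
# Usage: rocm_apt_entries <codename> <version>...
rocm_apt_entries() {
    release="$1"
    shift
    for ver in "$@"; do
        echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver $release main"
    done
}

# Preview the entries for jammy; nothing is written to disk here.
rocm_apt_entries jammy 5.6.1 5.7.1
```

Once the output looks right, piping it to `sudo tee /etc/apt/sources.list.d/rocm.list` (without `--append`) replaces the file in one step instead of growing it on every run.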
::::{rubric} 4. Install packages
::::
@@ -136,13 +120,9 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo apt install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo apt install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
```
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
{%- elif os.tag == "rhel" %}
::::{rubric} 1. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
::::
@@ -150,17 +130,20 @@ For a comprehensive list of meta-packages, refer to
If you have a version of the kernel-mode driver installed, you may skip this
section.
```
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver={{ linux.supported_family[0].amdgpu_version }}
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/{{ version.number }}/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -168,86 +151,7 @@ gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
:sync: RHEL-9
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
::::
{%- endcall %}
Install the kernel-mode driver and reboot the system using the following
commands:
@@ -266,7 +170,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
:sync: RHEL-8
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -277,6 +181,7 @@ gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
done
sudo yum clean all
```
@@ -285,7 +190,7 @@ sudo yum clean all
:sync: RHEL-9
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -320,13 +225,9 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo yum install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo yum install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1
```
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
{%- elif os.tag == "sle" %}
::::{rubric} 1. Add the AMDGPU Repository and Install the Kernel-mode Driver
::::
@@ -334,41 +235,27 @@ For a comprehensive list of meta-packages, refer to
If you have a version of the kernel-mode driver installed, you may skip this
section.
```
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver={{ linux.supported_family[0].amdgpu_version }}
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
::::
{%- endcall %}
Install the kernel-mode driver and reboot the system using the following
commands:
@@ -384,7 +271,7 @@ sudo reboot
To add the ROCm repository, use the following steps:
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -416,14 +303,12 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1
```
:::::
::::::
{%- endif %}
{%- endcall %}
(post-install-actions-linux)=
## Post-install Actions and Verification Process
The post-install actions listed here are optional and depend on your use case,
@@ -453,7 +338,8 @@ but are generally useful. Verification of the install is advised.
2. Add binary paths to the `PATH` environment variable.
```shell
export PATH=$PATH:/opt/rocm/bin:/opt/rocm-5.6/opencl/bin
export PATH=$PATH:/opt/rocm-{{ linux.supported_family[0].rocm_version }}/bin:/opt/rocm-{{ linux.supported_family[0].rocm_version }}/opencl/bin
```
```{attention}
@@ -496,31 +382,18 @@ by both commands, the installation is considered successful:
To ensure the packages are installed successfully, use the following commands:
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
```shell
sudo apt list --installed
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
{%- elif os.tag == "rhel" %}
```shell
sudo yum list installed
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
{%- elif os.tag == "sle" %}
```shell
sudo zypper search --installed-only
```
:::
::::
{%- endif %}
{%- endcall %}
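The full list of installed packages is long. As a minimal sketch, the output can be narrowed with `grep`. The helper name and the `rocm|hip|amdgpu` pattern are assumptions for illustration, not an exhaustive or exact list of ROCm package names.

```shell
# Hypothetical filter: keep only lines that look ROCm-related.
# The pattern is an assumption; it may miss or over-match packages.
only_rocm_packages() {
    grep -i -E 'rocm|hip|amdgpu'
}

# Example for Ubuntu; substitute the list command for your distribution.
if command -v apt >/dev/null 2>&1; then
    apt list --installed 2>/dev/null | only_rocm_packages \
        || echo "no matching packages found"
fi
```

On RHEL or SLES, the same filter could be fed from `yum list installed` or `zypper search --installed-only` instead.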


@@ -12,8 +12,6 @@ following AMD ROCm programming models:
A meta-package is a grouping of related packages and dependencies used to
support a specific use case.
**Example:** Running HIP applications
All meta-packages exist in both versioned and non-versioned forms.
- Non-versioned packages: for a single-version installation of the ROCm stack


@@ -25,8 +25,12 @@ repository to the new release.
:sync: ubuntu-20.04
```shell
# version
version=5.7
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu focal main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -36,8 +40,12 @@ sudo apt update
:sync: ubuntu-22.04
```shell
# version
version=5.7
# amdgpu repository for jammy
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu jammy main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -49,51 +57,19 @@ sudo apt update
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -108,10 +84,14 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.1/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -121,15 +101,41 @@ sudo yum clean all
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -145,14 +151,18 @@ sudo yum clean all
:sync: SLES
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
# version
version=5.7
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -161,14 +171,18 @@ sudo zypper ref
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
# version
version=5.7
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -230,7 +244,11 @@ repository to the new release.
:sync: ubuntu-20.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 focal main" \
# version
version=5.7
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version focal main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -242,7 +260,11 @@ sudo apt update
:sync: ubuntu-22.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 jammy main" \
# version
version=5.7
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version jammy main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -260,10 +282,14 @@ sudo apt update
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
baseurl=https://repo.radeon.com/rocm/rhel8/5.6/main
[ROCm-$version]
name=ROCm$version
baseurl=https://repo.radeon.com/rocm/rhel8/$version/main
enabled=1
priority=50
gpgcheck=1
@@ -277,10 +303,14 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
baseurl=https://repo.radeon.com/rocm/rhel9/5.6/main
[ROCm-$version]
name=ROCm$version
baseurl=https://repo.radeon.com/rocm/rhel9/$version/main
enabled=1
priority=50
gpgcheck=1
@@ -296,11 +326,14 @@ sudo yum clean all
:sync: SLES
```shell
# version
version=5.7
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
[ROCm-$version]
name=ROCm$version
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/5.6/main
baseurl=https://repo.radeon.com/rocm/zyp/$version/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key


@@ -116,12 +116,16 @@ sudo crb enable
Add the perl languages repository.
```{note}
Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
```
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
```
:::


@@ -1,369 +0,0 @@
# Quick Start (Linux)
## Add Repositories
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
::::{rubric} 1. Download and convert the package signing key
::::
```shell
# Make the directory if it doesn't exist yet.
# This location is recommended by the distribution maintainers.
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
# Download the key, convert the signing-key to a full
# keyring required by apt and store in the keyring directory
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
```
::::{rubric} 2. Add the repositories
::::
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```shell
# Kernel driver repository for focal
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu focal main
EOF
# ROCm repository for focal
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian focal main
EOF
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# Kernel driver repository for jammy
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu jammy main
EOF
# ROCm repository for jammy
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
```
:::
::::
::::{rubric} 3. Update the list of packages
::::
```shell
sudo apt update
```
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{rubric} 1. Add the repositories
::::
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
```shell
# Add the amdgpu module repository for RHEL 8.6
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.6/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
```shell
# Add the amdgpu module repository for RHEL 8.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.7/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
```shell
# Add the amdgpu module repository for RHEL 8.8
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.8/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
```shell
# Add the amdgpu module repository for RHEL 9.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.1/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 9
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
```shell
# Add the amdgpu module repository for RHEL 9.2
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.2/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 9
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
::::
::::{rubric} 2. Clean cached files from enabled repositories
::::
```shell
sudo yum clean all
```
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
::::{rubric} 1. Add the repositories
::::
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
# Add the amdgpu module repository for SLES 15.4
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
# Add the amdgpu module repository for SLES 15.5
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
::::
::::{rubric} 2. Update the new repository
::::
```shell
sudo zypper ref
```
:::::
::::::
## Install Drivers
Install the `amdgpu-dkms` kernel module, aka driver, on your system.
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
```shell
sudo apt install amdgpu-dkms
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
```shell
sudo yum install amdgpu-dkms
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```shell
sudo zypper install amdgpu-dkms
```
:::
::::
## Install ROCm Runtimes
Install the `rocm-hip-libraries` meta-package. This contains dependencies for most
common ROCm applications.
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
```console shell
sudo apt install rocm-hip-libraries
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
```console shell
sudo yum install rocm-hip-libraries
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```console shell
sudo zypper install rocm-hip-libraries
```
:::
::::
## Reboot the system
Loading the new driver requires a reboot of the system.
```shell
sudo reboot
```


@@ -0,0 +1,162 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Quick Start (Linux)
:::{note}
See {doc}`Radeon Software for Linux installation instructions <radeon:docs/install/install-radeon>`
if you use select RDNA™ 3 GPUs with graphical applications and ROCm.
:::
## Add Repositories
{% set family = linux.supported_family[0] %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
::::{rubric} 1. Download and convert the package signing key
::::
```shell
# Make the directory if it doesn't exist yet.
# This location is recommended by the distribution maintainers.
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
# Download the key, convert the signing-key to a full
# keyring required by apt and store in the keyring directory
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
```
::::{rubric} 2. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Kernel driver repository for {{ version.release }}
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/ubuntu {{ version.release }} main
EOF
# ROCm repository for {{ version.release }}
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/{{ family.amdgpu_version }} {{ version.release }} main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
```
{%- endcall %}
::::{rubric} 3. Update the list of packages
::::
```shell
sudo apt update
```
{%- elif os.tag == "rhel" %}
::::{rubric} 1. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Add the amdgpu module repository for RHEL {{ version.number }}
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/rhel/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for {{ version.release | upper }}
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/{{ version.release }}/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
{%- endcall %}
::::{rubric} 2. Clean cached files from enabled repositories
::::
```shell
sudo yum clean all
```
{%- elif os.tag == "sle" %}
::::{rubric} 1. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Add the amdgpu module repository for SLES {{ version.number }}
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/sle/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
{%- endcall %}
::::{rubric} 2. Update the new repository
::::
```shell
sudo zypper ref
```
{%- endif %}
{%- endcall -%}
{%- call(os) linux.for_os_in(linux.supported_os) %}
## Install drivers
Install the `amdgpu-dkms` kernel module, aka driver, on your system.
{{ linux.install(os, "amdgpu-dkms")}}
{%- endcall %}
## Install ROCm runtimes
Install the `rocm-hip-libraries` meta-package. This contains dependencies for most
common ROCm applications.
{%- call(os) linux.for_os_in(linux.supported_os) %}
{{ linux.install(os, "rocm-hip-libraries")}}
{%- endcall %}
## Reboot the system
Loading the new driver requires a reboot of the system.
```shell
sudo reboot
```
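After the system comes back up, you may want to confirm that the driver was actually loaded. The following is a minimal sketch, assuming the kernel module is named `amdgpu` and that `/proc/modules` is readable. `amdgpu_loaded` is a hypothetical helper, not a ROCm tool.

```shell
# Hypothetical helper: succeed if the amdgpu module appears in a
# /proc/modules-style listing (defaults to the live /proc/modules).
amdgpu_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^amdgpu ' "$modules_file" 2>/dev/null
}

if amdgpu_loaded; then
    echo "amdgpu kernel module is loaded"
else
    echo "amdgpu kernel module is not loaded"
fi
```

The optional file argument exists only so the check can be exercised against a saved listing. On a live system, call it without arguments.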


@@ -1,4 +1,4 @@
# Deploy ROCm on Windows
# Install ROCm (HIP SDK) on Windows
Start with {doc}`/deploy/windows/quick_start` or follow the detailed
instructions below.
@@ -39,6 +39,27 @@ Use the command line front-end of the installer.
::::
## Post Installation
::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} ROCm-Examples
:link: https://github.com/amd/rocm-examples
:link-type: url
Learn how to use ROCm with descriptive examples for novice to intermediate users.
:::
:::{grid-item-card} Windows App Deployment Guidelines
:link: ../../understand/windows-app-deployment-guidelines
:link-type: doc
Discusses strategies for bundling HIP libraries with an end-user application.
:::
::::
## See Also
- {doc}`/release/gpu_os_support`


@@ -6,16 +6,16 @@ system meets all the requirements to proceed with the installation.
## Confirm the System Is Supported
The ROCm installation is supported only on specific host architectures, Windows
SKUs and update versions.
Editions and update versions.
### Check the Windows SKU and Update Version on Your System
### Check the Windows Edition and Update Version on Your System
This section discusses obtaining information about the host architecture,
Windows SKU and update version.
Windows Edition and update version.
#### Command Line Check
Verify the Windows SKU using the following steps:
Verify the Windows Edition using the following steps:
1. To obtain the Windows Edition information, type the following command on
your system from a PowerShell Command Line Interface (CLI):


@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
MIGraphX provides easy-to-use APIs in C++ and Python to import machine learning models in ONNX or TensorFlow format. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation in which each operator in the model is mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
- Number of arguments
@@ -187,7 +187,7 @@ Follow these steps:
}
```
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
```
Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
Alternatively, you can use the MIGraphX C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
```{figure} ../../data/understand/deep_learning/image.018.png
:name: image018


@@ -3,6 +3,14 @@
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} ROCm using Radeon
:link: {doc}`ROCm using Radeon <radeon:index>`
:link-type: url
ROCm and PyTorch installation processes to pair with the Radeon RX 7900 XTX GPU or the Radeon PRO W7900 GPU,
and get started on a fully functional environment for AI and ML development.
:::
:::{grid-item-card} Tuning Guides
:link: tuning_guides/index
:link-type: doc


@@ -66,11 +66,8 @@ cd ucx
./autogen.sh
mkdir build
cd build
../contrib/configure-release -prefix=$UCX_DIR \
--with-rocm=/opt/rocm \
--without-cuda -enable-optimizations -disable-logging \
--disable-debug -disable-assertions \
--disable-params-check -without-java
../configure --prefix=$UCX_DIR \
--with-rocm=/opt/rocm
make -j $(nproc)
make -j $(nproc) install
```
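Once UCX is installed, it can be worth checking that the build picked up ROCm. The following is a minimal sketch, assuming that `ucx_info` was installed under `$UCX_DIR/bin`, is on your `PATH`, and that `ucx_info -v` prints the configure line used for the build. `configured_with_rocm` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: succeed if the text read from stdin mentions
# the --with-rocm configure option.
configured_with_rocm() {
    grep -qi -- '--with-rocm'
}

if command -v ucx_info >/dev/null 2>&1; then
    if ucx_info -v | configured_with_rocm; then
        echo "UCX reports a ROCm-enabled configuration"
    else
        echo "UCX configure line does not mention --with-rocm"
    fi
else
    echo "ucx_info not found; add \$UCX_DIR/bin to PATH first"
fi
```

This only inspects how the library was configured. It does not prove that GPU transports work at run time.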
@@ -93,9 +90,7 @@ cd ompi
mkdir build
cd build
../configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR \
--with-rocm=/opt/rocm \
--enable-mca-no-build=btl-uct --enable-mpi1-compatibility \
CC=clang CXX=clang++ FC=flang
--with-rocm=/opt/rocm
make -j $(nproc)
make -j $(nproc) install
```
@@ -165,7 +160,12 @@ Inter-GPU bandwidth with various payload sizes.
Collective Operations on GPU buffers are best handled through the
Unified Collective Communication Library (UCC) component in Open MPI.
For this, the UCC library has to be configured and compiled with ROCm
support. An example for configuring UCC and Open MPI with ROCm support
support.
See the compatibility [table](../release/3rd_party_support_matrix.md#communication-libraries)
for the UCC versions supported with each ROCm version.
An example for configuring UCC and Open MPI with ROCm support
is shown below:
```shell


@@ -59,16 +59,7 @@ Follow these steps:
PyTorch supports the ROCm platform by providing tested wheels packages. To
access this feature, refer to
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
and choose the "ROCm" compute platform. {numref}`Installation-Matrix-from-Pytorch` is a matrix from <https://pytorch.org/> that illustrates the installation compatibility between ROCm and the PyTorch build.
```{figure} ../../data/how_to/magma_install/image.006.png
:name: Installation-Matrix-from-Pytorch
---
align: center
---
Installation Matrix from Pytorch
```
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/). For the correct wheels command, you must select 'Linux', 'Python', 'pip', and 'ROCm' in the matrix.
To install PyTorch using the wheels package, follow these installation steps:
@@ -299,7 +290,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
### Test the PyTorch Installation
You can use PyTorch unit tests to validate a PyTorch installation. If using a
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
wheels package, these tests are already run on those configurations.
Alternatively, you can manually run the unit tests to validate the PyTorch
installation fully.


@@ -64,5 +64,4 @@ Debug messages when developing/debugging base ROCm driver. You could enable the
## PCIe-Debug
Refer to ROCm PCIe Debug, <a href="https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug" target="_blank">https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug</a>.
For information on how to debug and profile HIP applications, see {doc}`hip:how_to_guides/debugging`.

View File

@@ -9,6 +9,19 @@ training and inference in neural networks. It is one of the most popular and
in-demand frameworks and is very active in open source contribution and
development.
:::{warning}
ROCm 5.6 and 5.7 deviate from the standard practice of supporting the last three
TensorFlow versions. This is due to incompatibilities between earlier TensorFlow
versions and changes introduced in the ROCm 5.6 compiler. Refer to the following
version support matrix:
| ROCm | TensorFlow |
|:-----:|:----------:|
| 5.6.x | 2.12 |
| 5.7.0 | 2.12, 2.13 |
| Post-5.7.0 | Last three versions at ROCm release. |
:::
### Installing TensorFlow
The following sections contain options for installing TensorFlow.

View File

@@ -5,11 +5,10 @@
::::{grid-item}
:::{dropdown} [What is ROCm?](rocm)
ROCm is an open-source stack for GPU computation. ROCm is primarily
Open-Source Software (OSS) that allows developers the freedom to customize and
tailor their GPU software for their own needs while collaborating with a
community of other developers, and helping each other find solutions in an
agile, flexible, rapid and secure manner. [more...](rocm)
ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for
graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
[more...](rocm)
::::
@@ -18,6 +17,7 @@ agile, flexible, rapid and secure manner. [more...](rocm)
- {doc}`/deploy/linux/index`
- {doc}`/deploy/docker`
- {doc}`Deploy ROCm using Radeon <radeon:index>`
:::
::::

6
docs/license.md Normal file
View File

@@ -0,0 +1,6 @@
# License
> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm), which primarily contains documentation. For other licensing information, see the [Licensing Terms page](./release/licensing).
```{include} ../LICENSE
```

View File

@@ -29,6 +29,7 @@ ROCm template libraries for C++ primitives and algorithms are as follows:
- {doc}`rocPRIM <rocprim:index>`
- {doc}`rocThrust <rocthrust:index>`
- {doc}`hipCUB <hipcub:index>`
- {doc}`hipTensor <hiptensor:index>`
:::

View File

@@ -1 +0,0 @@
# Docker

View File

@@ -40,4 +40,14 @@ interface. It's back-end is rocPRIM.
:::
:::{grid-item-card} {doc}`hipTensor <hiptensor:index>`
hipTensor is AMD's C++ library for accelerating tensor primitives
based on the composable kernel library,
through general-purpose kernel languages, like HIP C++.
- {doc}`Documentation <hiptensor:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
:::
:::::

View File

@@ -97,4 +97,13 @@ supporting both `rocSPARSE` and `cuSPARSE` as backends.
:::
:::{grid-item-card} {doc}`hipSPARSELt <hipsparselt:index>`
`hipSPARSELt` is a marshalling library to provide sparse BLAS functionality,
supporting both `rocSPARSELt` and `cuSPARSELt` as backends.
- {doc}`Documentation <hipsparselt:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
:::
:::::

View File

@@ -1,6 +1,6 @@
# Math Libraries
AMD provides various math domain and support libraries as part of the ROCm.
AMD provides various math domain and support libraries as part of ROCm.
## rocLIB vs. hipLIB
@@ -26,6 +26,7 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
- {doc}`hipSOLVER <hipsolver:index>`
- {doc}`rocSPARSE <rocsparse:index>`
- {doc}`hipSPARSE <hipsparse:index>`
- {doc}`hipSPARSELt <hipsparselt:index>`
:::

View File

@@ -11,13 +11,18 @@ OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
with OpenMP applications. The GPUs supported are the same as those supported by
this ROCm release. See the list of supported GPUs in {doc}`/release/gpu_os_support`.
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
{numref}`openmp-toolchain` illustrates the internal steps taken to translate a user's application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code, and Pass 2 links the CPU code to the AMDGPU device code.
![OpenMP Toolchain](../../data/reference/openmp/openmp_toolchain.svg "OpenMP toolchain" =800x600)
### Installation
The OpenMP toolchain is automatically installed as part of the standard ROCm
installation and is available under `/opt/rocm-{version}/llvm`. The
sub-directories are:
bin: Compilers (`flang` and `clang`) and other binaries.
- bin: Compilers (`flang` and `clang`) and other binaries.
- examples: The usage section below shows how to compile and run these programs.
@@ -107,8 +112,7 @@ code compiled with AOMP:
options --list-basic and --list-derived. `rocprof` accepts either a text or
an XML file as an input.
For more details on `rocprof`, refer to the ROCm Profiling Tools document on
{doc}`rocprofiler:rocprof`.
For more details on `rocprof`, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Using Tracing Options
@@ -134,20 +138,21 @@ Google Chrome at chrome://tracing/ or [Perfetto](https://perfetto.dev/).
Navigate to Chrome or Perfetto and load the JSON file to see the timeline of the
HSA calls.
For more details on tracing, refer to the ROCm Profiling Tools document on
{doc}`rocprofiler:rocprof`.
For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Environment Variables
:::{table}
:widths: auto
| Environment Variable | Description |
| Environment Variable | Purpose |
| --------------------------- | ---------------------------- |
| `OMP_NUM_TEAMS` | The implementation chooses the number of teams for kernel launch. The user can change this number for performance tuning using this environment variable, subject to implementation limits. |
| `LIBOMPTARGET_KERNEL_TRACE` | This environment variable is used to print useful statistics for device operations. Setting it to 1 and running the program emits the name of every kernel launched, the number of teams and threads used, and the corresponding register usage. Setting it to 2 additionally emits timing information for kernel launches and data transfer operations between the host and the device. |
| `LIBOMPTARGET_INFO` | This environment variable is used to print informational messages from the device runtime as the program executes. Users can request fine-grain information by setting it to the value of 1 or higher and can set the value of -1 for complete information. |
| `LIBOMPTARGET_DEBUG` | If a debug version of the device library is present, setting this environment variable to 1 and using that library emits further detailed debugging information about data transfer operations and kernel launch. |
| `GPU_MAX_HW_QUEUES` | This environment variable is used to set the number of HSA queues in the OpenMP runtime. |
| `OMP_NUM_TEAMS` | To set the number of teams for kernel launch, which is otherwise chosen by the implementation by default. You can set this number (subject to implementation limits) for performance tuning. |
| `LIBOMPTARGET_KERNEL_TRACE` | To print useful statistics for device operations. Setting it to 1 and running the program emits the name of every kernel launched, the number of teams and threads used, and the corresponding register usage. Setting it to 2 additionally emits timing information for kernel launches and data transfer operations between the host and the device. |
| `LIBOMPTARGET_INFO` | To print informational messages from the device runtime as the program executes. Setting it to a value of 1 or higher prints fine-grain information, and setting it to -1 prints complete information. |
| `LIBOMPTARGET_DEBUG` | To get detailed debugging information about data transfer operations and kernel launch when using a debug version of the device library. Set this environment variable to 1 to get the detailed information from the library. |
| `GPU_MAX_HW_QUEUES` | To set the number of HSA queues in the OpenMP runtime. The HSA queues are created on demand up to the maximum value as supplied here. The queue creation starts with a single initialized queue to avoid unnecessary allocation of resources. The provided value is capped if it exceeds the recommended, device-specific value. |
| `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` | To set the threshold size up to which data transfers are initiated asynchronously. The default threshold size is `1*1024*1024` bytes (1 MB). |
| `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, i.e., wait for an operation to complete immediately. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
:::
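As a minimal sketch of how the variables in the table might be combined for a debugging session, the following shell snippet enables kernel tracing and complete runtime information messages. The application name `./omp_offload_app` is a hypothetical placeholder, not part of ROCm, so the run line is left commented out.

```shell
# Emit the name, team and thread counts, and register usage of every launched
# kernel; the value 2 additionally emits timing for launches and transfers.
export LIBOMPTARGET_KERNEL_TRACE=2

# Request complete informational messages from the device runtime.
export LIBOMPTARGET_INFO=-1

# Show the settings that the application will inherit.
env | grep '^LIBOMPTARGET_' | sort

# Hypothetical binary name; replace it with your own OpenMP offloading program.
# ./omp_offload_app
```

Exporting the variables in the shell keeps the settings out of the build, so the same binary can be rerun with tracing disabled by unsetting them.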
## OpenMP: Features
@@ -159,10 +164,17 @@ implemented in the past releases.
### Asynchronous Behavior in OpenMP Target Regions
- Multithreaded offloading on the same device
- Controlling Asynchronous Behavior
The OpenMP offloading runtime executes in an asynchronous fashion by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred becomes larger than the default threshold of 1MB, the runtime falls back to a synchronous data transfer. The buffers that have been locked already are always executed asynchronously.
You can overrule this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
- Multithreaded Offloading on the Same Device
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
- Parallel memory copy invocations
- Parallel Memory Copy Invocations
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
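The two controls named under "Controlling Asynchronous Behavior" can be sketched in shell as follows. The 4 MB threshold is an arbitrary illustration rather than a recommended setting, and the value `1` for `OMPX_FORCE_SYNC_REGIONS` is an assumption, because this document does not state which values enable it.

```shell
# Raise the asynchronous-copy threshold from the default of 1 MB to 4 MB.
# The variable is expressed in bytes; 4 MB is only an illustrative value.
export LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES=$((4 * 1024 * 1024))
echo "async copy threshold: ${LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES} bytes"

# For debugging, force the runtime to wait for each data transfer and kernel.
# Assumption: a value of 1 enables this behavior.
export OMPX_FORCE_SYNC_REGIONS=1
```

Computing the threshold with shell arithmetic keeps the byte count readable and avoids typing a long literal by hand.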
### Unified Shared Memory
@@ -317,8 +329,10 @@ double a = 0.0;
a = a + 1.0;
```
NOTE `AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
:::{note}
`AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
`AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
:::
To disable the generation of fast floating-point atomic instructions at the file
level, build using the option `-msafe-fp-atomics` or use a hint clause on a

View File

@@ -1109,7 +1109,7 @@ The following table lists the other Clang options and their support status.
|-ftime-trace|Supported|Turns on time profiler. Generates JSON file based on output filename|
|-ftrap-function= \<value\>|Unsupported|Issues call to specified function rather than a trap instruction|
|-ftrapv-handler= \<function name\>|Unsupported|Specifies the function to be called on overflow|
|-ftrapv|Unsupported|Traps on integer overflow|
|-ftrapv|Supported|Traps on integer overflow|
|-ftrigraphs|Supported|Processes trigraph sequences|
|-ftrivial-auto-var-init-stop-after= \<value\>|Supported|Stops initializing trivial automatic stack variables after the specified number of instances|
|-ftrivial-auto-var-init= \<value\>|Supported|Initializes trivial automatic stack variables. Values: uninitialized (default) / pattern|

View File

@@ -9,17 +9,18 @@ work, but aren't tested.
## Deep Learning
ROCm releases support the most recent and two prior releases of PyTorch and
TensorFlow
TensorFlow.
| ROCm | [PyTorch](https://github.com/pytorch/pytorch/releases/) | [TensorFlow](https://github.com/tensorflow/tensorflow/releases/) | [MAGMA](https://icl.utk.edu/magma/index.html) |
|:------|:--------------------------:|:--------------------:|:-----:|
| 5.0.2 | 1.8, 1.9, 1.10 | 2.6, 2.7, 2.8 | |
| 5.1.3 | 1.9, 1.10, 1.11 | 2.7, 2.8, 2.9 | |
| 5.2.x | 1.10, 1.11, 1.12 | 2.8, 2.9, 2.9 | |
| 5.3.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10 | |
| 5.4.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10, 2.11 | 2.5.4 |
| 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11 | 2.5.4 |
| 5.6 | 1.11, 1.12.1, 1.13.1 | 2.12 | 2.5.4 |
| ROCm | [PyTorch](https://github.com/pytorch/pytorch/releases/) | [TensorFlow](https://github.com/tensorflow/tensorflow/releases/) |
|:------|:--------------------------:|:--------------------:|
| 5.0.2 | 1.8, 1.9, 1.10 | 2.6, 2.7, 2.8 |
| 5.1.3 | 1.9, 1.10, 1.11 | 2.7, 2.8, 2.9 |
| 5.2.x | 1.10, 1.11, 1.12 | 2.8, 2.9, 2.9 |
| 5.3.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10 |
| 5.4.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10, 2.11 |
| 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11, 2.13 |
| 5.6.x | 1.12.1, 1.13, 2.0 | 2.12, 2.13 |
| 5.7.x | 1.12.1, 1.13, 2.0 | 2.12, 2.13 |
## Communication libraries
@@ -32,6 +33,14 @@ UCX version | ROCm 5.4 and older | ROCm 5.5 and newer |
| -1.14.0 | COMPATIBLE | INCOMPATIBLE |
| 1.14.1+ | COMPATIBLE | COMPATIBLE |
The Unified Collective Communication Library [UCC](https://github.com/openucx/ucc)
also has support for ROCm devices.
UCC version | ROCm 5.5 and older | ROCm 5.6 and newer |
|:----------|:------------------:|:------------------:|
| -1.1.0 | COMPATIBLE | INCOMPATIBLE |
| 1.2.0+ | COMPATIBLE | COMPATIBLE |
## Algorithm libraries
ROCm releases provide algorithm libraries with interfaces compatible with
@@ -48,7 +57,8 @@ contemporary CUDA / NVIDIA HPC SDK alternatives.
| 5.3.x | 1.16 | 22.7 |
| 5.4.x | 1.16 | 22.9 |
| 5.5.x | 1.17 | 22.9 |
| 5.6 | 1.17.2 | 22.9 |
| 5.6.x | 1.17.2 | 22.9 |
| 5.7.x | 1.17.2 | 22.9 |
For the latest documentation of these libraries, refer to the
[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md).

View File

@@ -1,88 +0,0 @@
# Docker Image Support Matrix
The software support matrices for ROCm container releases is listed.
## ROCm 5.6
### PyTorch
#### `Ubuntu+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.15.1](https://github.com/pytorch/vision/tree/v0.15.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
#### `CentOS7+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.15.1](https://github.com/pytorch/vision/tree/v0.15.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
#### `1.13 +bfeb431`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.13.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.14.0](https://github.com/pytorch/vision/tree/v0.14.0)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
#### `1.12 +05d5d04`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.12.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.13.1](https://github.com/pytorch/vision/tree/v0.13.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
### TensorFlow
#### `tensorflow_develop-upstream-QA-rocm56 +c88a9f4`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* `tensorflow-rocm` 2.13.0
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
#### `r2.11-rocm-enhanced +5be4141`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.11.2](https://github.com/tensorflow/tensorboard/tree/2.11.2)
#### `r2.10-rocm-enhanced +72789a3`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.10.1](https://github.com/tensorflow/tensorboard/tree/2.10.1)

View File

@@ -0,0 +1,131 @@
******************************************************************
Docker image support matrix
******************************************************************
AMD validates and publishes `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_ and `TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_
containers on Docker Hub. The following tags, and associated inventories, are validated with ROCm 5.7.
.. tab-set::
.. tab-item:: PyTorch
.. tab-set::
.. tab-item:: Ubuntu 22.04
Tag: `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1/images/sha256-21df283b1712f3d73884b9bc4733919374344ceacb694e8fbc2c50bdd3e767ee>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.10 <https://www.python.org/downloads/release/python-31013/>`_
* `Torch 2.0.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/2.0>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.15.0 <https://github.com/pytorch/vision/tree/release/0.15>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
.. tab-item:: Ubuntu 20.04
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1/images/sha256-e67db9373c045a7b6defd43cc3d067e7d49fd5d380f3f8582d2fb219c1756e1f>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 1.12.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.13.1 <https://github.com/pytorch/vision/tree/v0.13.1>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1/images/sha256-ed99d159026093d2aaf5c48c1e4b0911508773430377051372733f75c340a4c1>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 1.13.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.14.0 <https://github.com/pytorch/vision/tree/v0.14.0>`_
* `Tensorboard 2.12.0 <https://github.com/tensorflow/tensorboard/tree/2.12.0>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.0.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/2.0>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.15.2 <https://github.com/pytorch/vision/tree/release/0.15>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
.. tab-item:: CentOS 7
Tag: `rocm/pytorch:rocm5.7_centos7_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_centos7_py3.9_pytorch_staging/images/sha256-92240cdf0b4aa7afa76fc78be995caa19ee9c54b5c9f1683bdcac28cedb58d2b>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/yum/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
.. tab-item:: TensorFlow
.. tab-set::
.. tab-item:: Ubuntu 20.04
Tag: `rocm5.7-tf2.12-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.12-dev/images/sha256-e0ac4d49122702e5167175acaeb98a79b9500f585d5e74df18facf6b52ce3e59>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `tensorflow-rocm 2.12.1 <https://pypi.org/project/tensorflow-rocm/2.12.1.570/>`_
* `Tensorboard 2.12.3 <https://github.com/tensorflow/tensorboard/tree/2.12>`_
Tag: `rocm5.7-tf2.13-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.13-dev/images/sha256-6f995539eebc062aac2b53db40e2b545192d8b032d0deada8c24c6651a7ac332>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `tensorflow-rocm 2.13.0 <https://pypi.org/project/tensorflow-rocm/2.13.0.570/>`_
* `Tensorboard 2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_

View File

@@ -1,4 +1,4 @@
# GPU Support and OS Compatibility (Linux)
# GPU and OS Support (Linux)
(supported_distributions)=
@@ -12,19 +12,22 @@ AMD ROCm™ Platform supports the following Linux distributions.
| Distribution | Processor Architectures | Validated Kernel | Support |
| :----------- | :---------------------: | :--------------: | ------: |
| RHEL 9.2 | x86-64 | 5.14 (5.14.0-284.11.1.el9_2.x86_64) | ✅ |
| RHEL 9.1 | x86-64 | 5.14.0-284.11.1.el9_2.x86_64 | ✅ |
| RHEL 8.8 | x86-64 | 4.18.0-477.el8.x86_64 | ✅ |
| RHEL 8.7 | x86-64 | 4.18.0-425.10.1.el8_7.x86_64 | ✅ |
| SLES 15 SP5 | x86-64 | 5.14.21-150500.53-default | ✅ |
| SLES 15 SP4 | x86-64 | 5.14.21-150400.24.63-default | ✅ |
| Ubuntu 22.04.2 | x86-64 | 5.19.0-45-generic | ✅ |
| Ubuntu 20.04.5 | x86-64 | 5.15.0-75-generic | ✅ |
| CentOS 7.9 | x86-64 | 3.10 | ✅ |
| RHEL 7.9 | x86-64 | 3.10 | ✅ |
| RHEL 8.7 | x86-64 | 4.18 | ✅ |
| RHEL 8.8 | x86-64 | 4.18 | ✅ |
| RHEL 9.1 | x86-64 | 5.14 | ✅ |
| RHEL 9.2 | x86-64 | 5.14 | ✅ |
| SLES 15 SP4 | x86-64 | 5.14.21 | ✅ |
| SLES 15 SP5 | x86-64 | 5.14.21 | ✅ |
| Ubuntu 20.04.5 | x86-64 | 5.15 | ✅ |
| Ubuntu 20.04.6 | x86-64 | 5.15 | ✅ |
| Ubuntu 22.04.2 | x86-64 | 5.19 | ✅ |
| Ubuntu 22.04.3 | x86-64 | 6.2 | ✅ |
:::{versionadded} 5.6
:::{versionadded} 5.7.0
- RHEL 8.8 and 9.2 support is added.
- SLES 15 SP5 support is added.
- Ubuntu 22.04.3 support is added.
:::
@@ -58,19 +61,17 @@ ROCm supports virtualization for select GPUs only as shown below.
| VMWare | ESXi 8 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
| VMWare | ESXi 7 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
(supported_gpus)=
## Linux Supported GPUs
## Supported GPUs
The table below shows supported GPUs for Instinct™, Radeon Pro™ and Radeon™
GPUs. Please click the tabs below to switch between GPU product lines. If a GPU
is not listed on this table, the GPU is not officially supported by AMD.
The following table shows the list of GPUs supported on Linux distributions:
:::::{tab-set}
::::{tab-set}
:::{tab-item} AMD Instinct™
::::{tab-item} AMD Instinct™
:sync: instinct
Use Driver Shipped with ROCm
| Product Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) |Support |
|:------------:|:------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Instinct™ MI250X | CDNA2 | gfx90a | ✅ |
@@ -80,34 +81,42 @@ Use Driver Shipped with ROCm
| AMD Instinct™ MI50 | GCN5.1 | gfx906 | ✅ |
| AMD Instinct™ MI25 | GCN5.0 | gfx900 | ❌ |
:::
::::
:::{tab-item} Radeon Pro™
::::{tab-item} Radeon Pro™
:sync: radeonpro
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
:::{note}
See {doc}`Radeon Software for Linux compatibility matrix <radeon:docs/compatibility>`
for those using select RDNA™ 3 GPUs with graphical applications and ROCm.
:::
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ Pro W7900 | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ Pro W6800 | RDNA2 | gfx1030 | ✅ |
| AMD Radeon™ Pro V620 | RDNA2 | gfx1030 | ✅ |
| AMD Radeon™ Pro VII | GCN5.1 | gfx906 | ✅ |
::::
:::
:::{tab-item} Radeon™
::::{tab-item} Radeon™
:sync: radeonpro
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
:::{note}
See {doc}`Radeon Software for Linux compatibility <radeon:docs/compatibility>`
for those using select RDNA™ 3 GPUs with graphical applications and ROCm.
:::
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:---------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
::::
:::::
### Support Status
- ✅: **Supported** - AMD enables these GPUs in our software distributions for

View File

@@ -5,64 +5,65 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of each license, and a link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |
| Component | License |
|:------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------:|
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCmSoftwarePlatform/hipTensor) | [MIT](https://github.com/ROCmSoftwarePlatform/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula) |
Open sourced ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com and other distribution channels.

@@ -14,5 +14,11 @@ the compatibility combinations that are currently supported.
| 5.3.0 | 5.1.3, 5.2.3 |
| 5.3.3 | 5.4.0, 5.5.0 |
| 5.4.0 | 5.2.3, 5.3.3 |
| 5.4.3 | 5.5.0, 5.6.0 |
| 5.4.4 | 5.5.0 |
| 5.5.0 | 5.3.3, 5.4.3 |
| 5.5.1 | 5.6.0, 5.7.0 |
| 5.6.0 | 5.4.3, 5.5.1 |
| 5.6.1 | 5.7.0 |
| 5.7.0 | 5.5.0, 5.6.1 |
| 5.7.1 | 5.5.0, 5.6.1 |

docs/release/versions.md Normal file

@@ -0,0 +1,26 @@
# ROCm Release History
| Version | Release Date |
| ------- | ------------ |
| [5.7.1](https://rocm.docs.amd.com/en/docs-5.7.1/) | Oct 13, 2023 |
| [5.7.0](https://rocm.docs.amd.com/en/docs-5.7.0/) | Sep 15, 2023 |
| [5.6.1](https://rocm.docs.amd.com/en/docs-5.6.1/) | Aug 29, 2023 |
| [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/) | Jun 28, 2023 |
| [5.5.1](https://rocm.docs.amd.com/en/docs-5.5.1/) | May 24, 2023 |
| [5.5.0](https://rocm.docs.amd.com/en/docs-5.5.0/) | May 1, 2023 |
| [5.4.3](https://rocm.docs.amd.com/en/docs-5.4.3/) | Feb 7, 2023 |
| [5.4.2](https://rocm.docs.amd.com/en/docs-5.4.2/) | Jan 13, 2023 |
| [5.4.1](https://rocm.docs.amd.com/en/docs-5.4.1/) | Dec 15, 2022 |
| [5.4.0](https://rocm.docs.amd.com/en/docs-5.4.0/) | Nov 30, 2022 |
| [5.3.3](https://rocm.docs.amd.com/en/docs-5.3.3/) | Nov 17, 2022 |
| [5.3.2](https://rocm.docs.amd.com/en/docs-5.3.2/) | Nov 9, 2022 |
| [5.3.0](https://rocm.docs.amd.com/en/docs-5.3.0/) | Oct 4, 2022 |
| [5.2.3](https://rocm.docs.amd.com/en/docs-5.2.3/) | Aug 18, 2022 |
| [5.2.1](https://rocm.docs.amd.com/en/docs-5.2.1/) | Jul 21, 2022 |
| [5.2.0](https://rocm.docs.amd.com/en/docs-5.2.0/) | Jun 28, 2022 |
| [5.1.3](https://rocm.docs.amd.com/en/docs-5.1.3/) | May 20, 2022 |
| [5.1.1](https://rocm.docs.amd.com/en/docs-5.1.1/) | Apr 8, 2022 |
| [5.1.0](https://rocm.docs.amd.com/en/docs-5.1.0/) | Mar 30, 2022 |
| [5.0.2](https://rocm.docs.amd.com/en/docs-5.0.2/) | Mar 4, 2022 |
| [5.0.1](https://rocm.docs.amd.com/en/docs-5.0.1/) | Feb 16, 2022 |
| [5.0.0](https://rocm.docs.amd.com/en/docs-5.0.0/) | Feb 9, 2022 |

@@ -4,7 +4,7 @@
## Supported SKUs
AMD ROCm™ Platform supports the following Windows SKU.
AMD HIP SDK supports the following Windows variants.
| Distribution |Processor Architectures| Validated update |
|---------------------|-----------------------|--------------------|
@@ -12,7 +12,11 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Windows 11 | x86-64 | 22H2 (GA) |
| Windows Server 2022 | x86-64 | |
## GPU Support Table
## Windows Supported GPUs
The table below lists the supported Radeon Pro™ and Radeon™ GPUs. Select a tab
to switch between GPU product lines. A GPU that is not listed in this table is
not officially supported by AMD.
::::{tab-set}
@@ -21,12 +25,12 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon Pro W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon Pro W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon Pro W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
| AMD Radeon Pro VII | GCN5.1 | gfx906 | ❌ | ❌ |
| AMD Radeon Pro W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon Pro W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon Pro W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
| AMD Radeon Pro VII | GCN5.1 | gfx906 | ❌ | ❌ |
:::
@@ -36,13 +40,18 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | | |
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | | |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | | |
| AMD Radeon™ RX 7600 | RDNA3 | gfx1102 | | |
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6900 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6750 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6700 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6700 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | | ❌ |
| AMD Radeon™ RX 6650 XT | RDNA2 | gfx1032 | | ❌ |
| AMD Radeon™ RX 6600 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | ✅ | ❌ |
:::

@@ -1,27 +1,31 @@
# What is ROCm?
ROCm is an open-source stack for GPU computation. ROCm is primarily Open-Source
Software (OSS) that allows developers the freedom to customize and tailor their
GPU software for their own needs while collaborating with a community of other
developers, and helping each other find solutions in an agile, flexible, rapid
and secure manner.
ROCm is an open-source stack, composed primarily of open-source software (OSS), designed for
graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
ROCm is a collection of drivers, development tools and APIs enabling GPU
programming from the low-level kernel to end-user applications. ROCm is powered
by AMD's Heterogeneous-computing Interface for Portability (HIP), an OSS C++ GPU
programming environment and its corresponding runtime. HIP allows ROCm
developers to create portable applications on different platforms by deploying
code on a range of platforms, from dedicated gaming GPUs to exascale HPC
clusters. ROCm supports programming models such as OpenMP and OpenCL, and
includes all the necessary OSS compilers, debuggers and libraries. ROCm is fully
integrated into ML frameworks such as PyTorch and TensorFlow. ROCm can be
deployed in many ways, including through the use of containers such as Docker,
Spack, and your own build from source.
With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software
ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
artificial intelligence (AI), scientific computing, and computer-aided design (CAD).
ROCm's goal is to allow our users to maximize their GPU hardware investment.
ROCm is designed to help develop, test and deploy GPU accelerated HPC, AI,
scientific computing, CAD, and other applications in a free, open-source,
integrated and secure software ecosystem.
ROCm is powered by AMD's
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
an OSS C++ GPU programming environment and its corresponding runtime. HIP allows ROCm
developers to create portable applications on different platforms by deploying code on a range of
platforms, from dedicated gaming GPUs to exascale HPC clusters.
ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary OSS
compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks,
such as PyTorch and TensorFlow.
## Radeon Software for Linux with ROCm
Starting with Radeon Software for Linux® 23.20.00.48 with ROCm 5.7, researchers and developers working with Machine Learning (ML) models and algorithms can tap into the parallel computing power of the AMD desktop GPUs based on the RDNA™ 3 architecture.
A client solution built on powerful high-end AMD GPUs provides a local, private and often cost-effective workflow to develop ROCm and train ML (PyTorch) for the users who previously relied solely on cloud-based solutions.
For information about how to install ROCm on AMD desktop GPUs based on the RDNA™ 3 architecture, see {doc}`Use ROCm on Radeon GPUs<radeon:index>`. For more information about supported AMD Radeon™ desktop GPUs, see {doc}`Radeon Compatibility Matrices <radeon:docs/compatibility>`.
## ROCm on Windows

@@ -74,13 +74,14 @@ subtrees:
title: Changelog
- file: release/gpu_os_support
- file: release/windows_support
- file: release/versions
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
title: Known Issues
- file: release/compatibility
subtrees:
- entries:
- file: release/user_kernel_space_compat_matrix
- file: release/docker_image_support_matrix
- file: release/docker_image_support_matrix.rst
- file: release/3rd_party_support_matrix
- file: release/licensing
@@ -101,7 +102,7 @@ subtrees:
- entries:
- file: reference/gpu_libraries/linear_algebra
subtrees:
- entries:
- entries:
- title: rocBLAS
url: ${project:rocblas}
- title: hipBLAS
@@ -120,6 +121,8 @@ subtrees:
url: ${project:rocsparse}
- title: hipSPARSE
url: ${project:hipsparse}
- title: hipSPARSELt
url: ${project:hipsparselt}
- file: reference/gpu_libraries/fft
subtrees:
- entries:
@@ -140,12 +143,15 @@ subtrees:
- entries:
- title: rocPRIM
url: ${project:rocprim}
- entries:
- title: rocThrust
url: ${project:rocthrust}
- entries:
- title: hipCUB
url: ${project:hipcub}
- entries:
- title: rocThrust
url: ${project:rocthrust}
- title: hipTensor
url: ${project:hiptensor}
- file: reference/gpu_libraries/communication
title: Communication Libraries
subtrees:
@@ -181,9 +187,9 @@ subtrees:
- url: ${project:rocgdb}
title: ROCgdb
- url: ${project:rocprofiler}
title: rocprofiler
title: ROCProfiler
- url: ${project:roctracer}
title: roctracer
title: ROCTracer
- url: ${project:rocdbgapi}
title: ROCdbgapi
- file: reference/management_tools
@@ -217,8 +223,12 @@ subtrees:
- entries:
- file: understand/gpu_arch/mi250
title: MI250
- file: understand/gpu_arch/mi200_performance_counters
title: MI200 Performance Counters and Metrics
- file: understand/gpu_arch/mi100
title: MI100
- file: understand/using_gpu_sanitizer
title: Using GPU Sanitizer
- file: understand/More-about-how-ROCm-uses-PCIe-Atomics
- caption: How to Guides
entries:
@@ -258,3 +268,8 @@ subtrees:
entries:
- file: about
- file: contributing
subtrees:
- entries:
- file: contribute/building.md
- file: contribute/feedback.md
- file: license.md

@@ -1 +1,2 @@
rocm-docs-core==0.18.3
rocm-docs-core==1.8.0
sphinx-reredirects

@@ -1,118 +1,106 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --resolver=backtracking requirements.in
# pip-compile requirements.in
#
accessible-pygments==0.0.3
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==0.7.13
alabaster==1.0.0
# via sphinx
babel==2.11.0
babel==2.16.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.11.2
beautifulsoup4==4.12.3
# via pydata-sphinx-theme
breathe==4.34.0
breathe==4.35.0
# via rocm-docs-core
certifi==2022.12.7
certifi==2024.8.30
# via requests
cffi==1.15.1
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==2.1.1
charset-normalizer==3.3.2
# via requests
click==8.1.3
click==8.1.7
# via sphinx-external-toc
colorama==0.4.6
# via
# click
# sphinx
cryptography==40.0.2
cryptography==43.0.1
# via pyjwt
deprecated==1.2.13
deprecated==1.2.14
# via pygithub
docutils==0.19
docutils==0.21.2
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.16.3
fastjsonschema==2.20.0
# via rocm-docs-core
gitdb==4.0.10
gitdb==4.0.11
# via gitpython
gitpython==3.1.30
gitpython==3.1.43
# via rocm-docs-core
idna==3.4
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
importlib-metadata==6.7.0
# via sphinx
importlib-resources==5.12.0
# via rocm-docs-core
jinja2==3.1.2
jinja2==3.1.4
# via
# myst-parser
# sphinx
linkify-it-py==1.0.3
# via myst-parser
markdown-it-py==2.2.0
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.2
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.3.4
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser[linkify]==1.0.0
myst-parser==4.0.0
# via rocm-docs-core
packaging==23.0
packaging==24.1
# via
# pydata-sphinx-theme
# sphinx
pycparser==2.21
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.13.3
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==1.58.1
pygithub==2.4.0
# via rocm-docs-core
pygments==2.14.0
pygments==2.18.0
# via
# accessible-pygments
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.6.0
pyjwt[crypto]==2.9.0
# via pygithub
pynacl==1.5.0
# via pygithub
pytz==2022.7.1
# via babel
pyyaml==6.0
pyyaml==6.0.2
# via
# myst-parser
# rocm-docs-core
# sphinx-external-toc
requests==2.28.1
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core==0.18.3
rocm-docs-core==1.8.0
# via -r requirements.in
smmap==5.0.0
smmap==5.0.1
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.4
soupsieve==2.6
# via beautifulsoup4
sphinx==5.3.0
sphinx==8.0.2
# via
# breathe
# myst-parser
@@ -123,37 +111,40 @@ sphinx==5.3.0
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
sphinx-book-theme==1.0.1
# sphinx-reredirects
sphinx-book-theme==1.1.3
# via rocm-docs-core
sphinx-copybutton==0.5.1
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.4.1
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==0.3.1
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==0.8.3
sphinx-notfound-page==1.0.4
# via rocm-docs-core
sphinxcontrib-applehelp==1.0.4
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
typing-extensions==4.5.0
# via pydata-sphinx-theme
uc-micro-py==1.0.1
# via linkify-it-py
urllib3==1.26.13
# via requests
wrapt==1.14.1
# via deprecated
zipp==3.15.0
tomli==2.0.1
# via sphinx
typing-extensions==4.12.2
# via
# importlib-metadata
# importlib-resources
# pydata-sphinx-theme
# pygithub
urllib3==2.2.3
# via
# pygithub
# requests
wrapt==1.16.0
# via deprecated

@@ -13,7 +13,7 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions, which extend inter-processor synchronization mechanisms to I/O to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
I/O device, which supports 32-bit, 64-bit, and 128-bit operands. The target address must be naturally aligned to the operation size.
ROCm uses platform atomics in the following ways:
@@ -22,17 +22,17 @@ For ROCm the Platform atomics are used in ROCm in the following ways:
* Update HSA queues write_dispatch_id: 64-bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
* Update HSA signals: 64-bit atomic operations are used for CPU and GPU synchronization.
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through, and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled, or the atomic operations will fail even though the PCIe endpoint and PCIe I/O devices are capable of atomic operations.
To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
To do atomic operations routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
If your system has a PCIe Express Switch it needs to support AtomicsOp routing. Again AtomicOp requests are permitted only if a components ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
If your system has a PCIe Express Switch it needs to support atomic operations routing. Atomic operations requests are permitted only if a components ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion and/or routing to a component which does. Atomic operations routing support=1, routing is supported; Atomic operations routing support=0, routing is not supported.
Atomic Operation is a Non-Posted transaction supporting 32-bit and 64-bit address formats; there must be a response for Completion containing the result of the operation. Errors associated with the operation (uncorrectable error accessing the target location or carrying out the Atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor to Completer Abort (CA) or Unsupported Request (UR).
To understand more about how PCIe Atomic operations work `PCIe Atomics <https://pcisig.com/sites/default/files/specification_documents/ECN_Atomic_Ops_080417.pdf>`_
To understand more about how PCIe Atomic operations work `PCIe Atomics <https://pcisig.com/specifications/pciexpress/specifications/ECN_Atomic_Ops_080417.pdf>`_
`Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/patch/7261731/>`_
`Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/project/linux-pci/patch/1443110390-4080-1-git-send-email-jay@jcornwall.me/>`_
There are also a number of papers which talk about these new capabilities:
@@ -50,7 +50,7 @@ Other I/O devices with PCIe Atomics support
Future bus technology with richer I/O Atomics Operation Support
* `GenZ <http://genzconsortium.org/faq/gen-z-technology/#33/>`_
* GenZ
New PCIe Endpoints with support beyond AMD Ryzen and EPYC CPU; Intel Haswell or newer CPUs with PCIe Generation 3.0 support.
@@ -65,13 +65,11 @@ In ROCm, we also take advantage of PCIe ID based ordering technology for P2P whe
They are routed off to different ends of the computer but we want to make sure the write to system memory that indicates transfer completion occurs AFTER the P2P write to the GPU has completed.
`Good Paper on Understanding PCIe Generation 3 Throughput <https://www.altera.com/en_US/pdfs/literature/an/an690.pdf>`_
BAR Memory Overview
*******************
On a Xeon E5 based system, above 4GB PCIe addressing can be turned on in the BIOS; if so, you need to set the MMIO Base address (MMIOH Base) and Range (MMIO High Size) in the BIOS.
In SuperMicro system in the system bios you need to see the following
In Supermicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
@@ -79,7 +77,7 @@ In SuperMicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
When we support Large Bar Capability there is a Large Bar VBIOS which also disable the IO bar.
GFX9 and Vega10 have physical addresses of up to 44 bits and virtual addresses of 48 bits.
@@ -118,30 +116,5 @@ Legend:
5 : Expansion ROM This is required for the AMD Driver SW to access the GPUs video-bios. This is currently fixed at 128KB.
Excepts form Overview of Changes to PCI Express 3.0
===================================================
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
********************************************************
Atomic Operations Goal:
*************************
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
* Fetch and Add uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
* Unconditional Swap uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
* Compare and Swap uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
* AtomicOpCompletion new completion to give the result so far atomic request and indicate that the atomicity of the transaction has been maintained.
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
ID-based Ordering Goal:
*************************
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
For more information, you can review
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.

@@ -4,7 +4,7 @@ Using CMake
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of CMake's HIP language support will require CMake 3.21 or higher.
to make use of the CMake HIP language support will require CMake 3.21 or higher.
Finding Dependencies
====================
@@ -16,7 +16,7 @@ Finding Dependencies
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake's related facilities.
to get an overview of CMake related facilities.
In short, CMake supports finding dependencies in two ways:
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
regards needed to consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on Nvidia runtimes. As such, when
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
Using HIP in CMake
==================
ROCm componenents providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
ROCm components providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with for eg.
This project can then be configured with the following CMake commands:
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, this is set to some subset of the currently supported architectures of
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
ROCm CMake Packages
-------------------
@@ -251,9 +251,9 @@ options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purpose in
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
all nice, but configurations aren't portable, nor can they be reused in
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
Continuous Integration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command-line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_

@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |

@@ -21,7 +21,7 @@ fabric.
<img src="../../data/reference/gpu_arch/image.004.png" alt="Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.">
Structure of a single GCD in the AMD Instinct MI250 accelerator.
Structure of a single GCD in the AMD Instinct MI100 accelerator.
:::
In a typical node configuration, each processor can host up to four AMD

@@ -0,0 +1,457 @@
# MI200 Performance Counters and Metrics
<!-- markdownlint-disable no-duplicate-header -->
This document lists and describes the hardware performance counters and the derived metrics available on the AMD Instinct™ MI200 GPU. All hardware performance counters and derived performance metrics are accessible via the AMD ROCm™ Profiler tool.
## MI200 Performance Counters List
:::{note}
Preliminary validation of all MI200 performance counters is in progress. Those with “[*]” appended to the names require further evaluation.
:::
### Graphics Register Bus Management (GRBM)
#### GRBM Counters
| Hardware Counter | Unit | Definition |
|--------------------|--------| --------------------------------------------------------------------------|
| `grbm_count` | Cycles | Free-running GPU clock |
| `grbm_gui_active` | Cycles | GPU active cycles |
| `grbm_cp_busy` | Cycles | Any of the CP (CPC/CPF) blocks are busy. |
| `grbm_spi_busy` | Cycles | Any of the Shader Processor Input (SPI) units is busy in the shader engine(s). |
| `grbm_ta_busy` | Cycles | Any of the Texture Addressing Unit (TA) blocks is busy in the shader engine(s). |
| `grbm_tc_busy` | Cycles | Any of the Texture Cache Blocks (TCP/TCI/TCA/TCC) are busy. |
| `grbm_cpc_busy` | Cycles | The Command Processor - Compute (CPC) is busy. |
| `grbm_cpf_busy` | Cycles | The Command Processor - Fetcher (CPF) is busy. |
| `grbm_utcl2_busy` | Cycles | The Unified Translation Cache - Level 2 (UTCL2) block is busy. |
| `grbm_ea_busy` | Cycles | The Efficiency Arbiter (EA) block is busy. |
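
As an illustration of how a derived metric can be built from the GRBM counters above, the following minimal sketch computes the share of free-running GPU clock cycles during which the GPU was active. The function name `gpu_busy_percent` and the sample values are hypothetical; the formula is an assumption based on the counter definitions in the table, not a statement of how the profiler computes its own metrics.

```python
def gpu_busy_percent(grbm_gui_active, grbm_count):
    """Percentage of free-running GPU clock cycles in which the GPU was active.

    Both arguments are raw counter values collected over the same interval.
    """
    if grbm_count <= 0:
        raise ValueError("grbm_count must be positive")
    return 100.0 * grbm_gui_active / grbm_count


# Hypothetical sample values, not measured data.
print(gpu_busy_percent(grbm_gui_active=750_000, grbm_count=1_000_000))
```

The same ratio pattern could be applied to the other `*_busy` counters in the table, under the assumption that they are sampled over the same interval as `grbm_count`.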
### Command Processor (CP)
The command processor counters are further classified into fetcher and compute.
#### Command Processor - Fetcher (CPF)
##### CPF Counters
| Hardware Counter | Unit | Definition |
|--------------------------------------|--------|--------------------------------------------------------------|
| `cpf_cmp_utcl1_stall_on_translation` | Cycles | One of the Compute UTCL1s is stalled waiting on translation. |
| `cpf_cpf_stat_idle[]` | Cycles | CPF idle |
| `cpf_cpf_stat_stall` | Cycles | CPF stall |
| `cpf_cpf_tciu_busy` | Cycles | CPF TCIU interface busy |
| `cpf_cpf_tciu_idle` | Cycles | CPF TCIU interface idle |
| `cpf_cpf_tciu_stall[]` | Cycles | CPF TCIU interface is stalled waiting on free tags. |
#### Command Processor - Compute (CPC)
##### CPC Counters
| Hardware Counter | Unit | Definition |
| ---------------------------------| -------| --------------------------------------------------- |
| `cpc_me1_busy_for_packet_decode` | Cycles | CPC ME1 busy decoding packets |
| `cpc_utcl1_stall_on_translation` | Cycles | One of the UTCL1s is stalled waiting on translation |
| `cpc_cpc_stat_busy` | Cycles | CPC busy |
| `cpc_cpc_stat_idle` | Cycles | CPC idle |
| `cpc_cpc_stat_stall` | Cycles | CPC stalled |
| `cpc_cpc_tciu_busy` | Cycles | CPC TCIU interface busy |
| `cpc_cpc_tciu_idle` | Cycles | CPC TCIU interface idle |
| `cpc_cpc_utcl2iu_busy` | Cycles | CPC UTCL2 interface busy |
| `cpc_cpc_utcl2iu_idle` | Cycles | CPC UTCL2 interface idle |
| `cpc_cpc_utcl2iu_stall[]` | Cycles | CPC UTCL2 interface stalled waiting |
| `cpc_me1_dci0_spi_busy` | Cycles | CPC ME1 Processor busy |
### Shader Processor Input (SPI)
#### SPI Counters
| Hardware Counter | Unit | Definition |
| :----------------------------| :-----------| -----------------------------------------------------------: |
| `spi_csn_busy` | Cycles | Number of clocks with outstanding waves |
| `spi_csn_window_valid` | Cycles | Clock count enabled by perfcounter_start event |
| `spi_csn_num_threadgroups` | Workgroups | Total number of dispatched workgroups |
| `spi_csn_wave` | Wavefronts | Total number of dispatched wavefronts |
| `spi_ra_req_no_alloc` | Cycles | Arb cycles with requests but no allocation (need to multiply this value by 4) |
| `spi_ra_req_no_alloc_csn` | Cycles | Arb cycles with CSn req and no CSn alloc (need to multiply this value by 4) |
| `spi_ra_res_stall_csn` | Cycles | Arb cycles with CSn req and no CSn fits (need to multiply this value by 4) |
| `spi_ra_tmp_stall_csn[]` | Cycles | Cycles where CSn wants to req but does not fit in temp space |
| `spi_ra_wave_simd_full_csn` | SIMD-cycles | Sum of SIMDs where WAVE cannot take a CSn wave when it does not fit |
| `spi_ra_vgpr_simd_full_csn[]` | SIMD-cycles | Sum of SIMDs where VGPR cannot take a CSn wave when it does not fit |
| `spi_ra_sgpr_simd_full_csn[]` | SIMD-cycles | Sum of SIMDs where SGPR cannot take a CSn wave when it does not fit |
| `spi_ra_lds_cu_full_csn` | CUs | Sum of CUs where LDS cannot take a CSn wave when it does not fit |
| `spi_ra_bar_cu_full_csn[]` | CUs | Sum of CUs where BARRIER cannot take a CSn wave when it does not fit |
| `spi_ra_bulky_cu_full_csn[]` | CUs | Sum of CUs where BULKY cannot take a CSn wave when it does not fit |
| `spi_ra_tglim_cu_full_csn[]` | Cycles | Cycles where csn wants to req but all CUs are at `tg_limit` |
| `spi_ra_wvlim_cu_full_csn[]` | Cycles | Number of clocks csn is stalled due to WAVE LIMIT |
| `spi_vwc_csc_wr` | Cycles | Number of clocks to write CSC waves to VGPRs (need to multiply this value by 4) |
| `spi_swc_csc_wr` | Cycles | Number of clocks to write CSC waves to SGPRs (need to multiply this value by 4) |
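
Several SPI counters in the table above carry the remark "need to multiply this value by 4". A minimal sketch of applying that scaling to a dictionary of raw readings is shown below. The helper name `scale_spi_counters`, the dictionary layout, and the sample values are assumptions made for illustration; only the set of counter names and the factor of 4 come from the table.

```python
# Counters whose definitions in the table say the value must be multiplied by 4.
SPI_SCALE_BY_4 = {
    "spi_ra_req_no_alloc",
    "spi_ra_req_no_alloc_csn",
    "spi_ra_res_stall_csn",
    "spi_vwc_csc_wr",
    "spi_swc_csc_wr",
}


def scale_spi_counters(raw):
    """Return a copy of `raw` with the flagged SPI counters multiplied by 4.

    `raw` maps counter names to integer readings (a hypothetical layout).
    Counters that are not flagged are passed through unchanged.
    """
    return {
        name: value * 4 if name in SPI_SCALE_BY_4 else value
        for name, value in raw.items()
    }


# Hypothetical readings, not measured data.
readings = {"spi_csn_busy": 10, "spi_ra_req_no_alloc": 3}
print(scale_spi_counters(readings))
```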
### Compute Unit
The compute unit counters are further classified into instruction mix, MFMA operation counters, level counters, wavefront counters, wavefront cycle counters, local data share counters, and others.
#### Instruction Mix
| Hardware Counter | Unit | Definition |
| :-----------------------| :-----:| -----------------------------------------------------------------------: |
| `sq_insts` | Instr | Number of instructions issued |
| `sq_insts_valu` | Instr | Number of VALU instructions issued, including MFMA |
| `sq_insts_valu_add_f16` | Instr | Number of VALU F16 Add instructions issued |
| `sq_insts_valu_mul_f16` | Instr | Number of VALU F16 Multiply instructions issued |
| `sq_insts_valu_fma_f16` | Instr | Number of VALU F16 FMA instructions issued |
| `sq_insts_valu_trans_f16` | Instr | Number of VALU F16 Transcendental instructions issued |
| `sq_insts_valu_add_f32` | Instr | Number of VALU F32 Add instructions issued |
| `sq_insts_valu_mul_f32` | Instr | Number of VALU F32 Multiply instructions issued |
| `sq_insts_valu_fma_f32` | Instr | Number of VALU F32 FMA instructions issued |
| `sq_insts_valu_trans_f32` | Instr | Number of VALU F32 Transcendental instructions issued |
| `sq_insts_valu_add_f64` | Instr | Number of VALU F64 Add instructions issued |
| `sq_insts_valu_mul_f64` | Instr | Number of VALU F64 Multiply instructions issued |
| `sq_insts_valu_fma_f64` | Instr | Number of VALU F64 FMA instructions issued |
| `sq_insts_valu_trans_f64` | Instr | Number of VALU F64 Transcendental instructions issued |
| `sq_insts_valu_int32` | Instr | Number of VALU 32-bit integer instructions issued (signed or unsigned) |
| `sq_insts_valu_int64` | Instr | Number of VALU 64-bit integer instructions issued (signed or unsigned) |
| `sq_insts_valu_cvt` | Instr | Number of VALU Conversion instructions issued |
| `sq_insts_valu_mfma_i8` | Instr | Number of 8-bit Integer MFMA instructions issued |
| `sq_insts_valu_mfma_f16` | Instr | Number of F16 MFMA instructions issued |
| `sq_insts_valu_mfma_bf16` | Instr | Number of BF16 MFMA instructions issued |
| `sq_insts_valu_mfma_f32` | Instr | Number of F32 MFMA instructions issued |
| `sq_insts_valu_mfma_f64` | Instr | Number of F64 MFMA instructions issued |
| `sq_insts_mfma` | Instr | Number of MFMA instructions issued |
| `sq_insts_vmem_wr` | Instr | Number of VMEM Write instructions issued |
| `sq_insts_vmem_rd` | Instr | Number of VMEM Read instructions issued |
| `sq_insts_vmem` | Instr | Number of VMEM instructions issued, including both FLAT and Buffer instructions |
| `sq_insts_salu` | Instr | Number of SALU instructions issued |
| `sq_insts_smem` | Instr | Number of SMEM instructions issued |
| `sq_insts_smem_norm` | Instr | Number of SMEM instructions issued to normalize to match `smem_level`. Used in measuring SMEM latency |
| `sq_insts_flat` | Instr | Number of FLAT instructions issued |
| `sq_insts_flat_lds_only` | Instr | Number of FLAT instructions issued that read/write only from/to LDS |
| `sq_insts_lds` | Instr | Number of LDS instructions issued |
| `sq_insts_gds` | Instr | Number of GDS instructions issued |
| `sq_insts_exp_gds` | Instr | Number of EXP and GDS instructions excluding skipped export instructions issued |
| `sq_insts_branch` | Instr | Number of Branch instructions issued |
| `sq_insts_sendmsg` | Instr | Number of SENDMSG instructions including s_endpgm issued |
| `sq_insts_vskipped[]` | Instr | Number of VSkipped instructions issued |
#### MFMA Operation Counters
| Hardware Counter | Unit | Definition |
| :----------------------------| :-----| ----------------------------------------------: |
| `sq_insts_valu_mfma_mops_I8` | IOP | Number of 8-bit integer MFMA ops in unit of 512 |
| `sq_insts_valu_mfma_mops_F16` | FLOP | Number of F16 floating MFMA ops in unit of 512 |
| `sq_insts_valu_mfma_mops_BF16` | FLOP | Number of BF16 floating MFMA ops in unit of 512 |
| `sq_insts_valu_mfma_mops_F32` | FLOP | Number of F32 floating MFMA ops in unit of 512 |
| `sq_insts_valu_mfma_mops_F64` | FLOP | Number of F64 floating MFMA ops in unit of 512 |
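
Because the MFMA operation counters are reported in units of 512 operations, a raw reading has to be scaled before it can be compared with other operation counts. The sketch below shows that conversion and an optional rate calculation. The function names and the `elapsed_seconds` parameter are hypothetical, and the elapsed time is assumed to be measured separately by the caller.

```python
MFMA_OPS_PER_COUNT = 512  # each counter increment represents 512 operations


def mfma_total_ops(counter_value):
    """Convert a raw `sq_insts_valu_mfma_mops_*` reading into operations."""
    return counter_value * MFMA_OPS_PER_COUNT


def mfma_ops_per_second(counter_value, elapsed_seconds):
    """Operations per second, given a separately measured elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return mfma_total_ops(counter_value) / elapsed_seconds


# Hypothetical reading and kernel duration, not measured data.
print(mfma_total_ops(2000))
print(mfma_ops_per_second(2000, 0.5))
```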
#### Level Counters
| Hardware Counter | Unit | Definition |
| :-------------------| :-----| -------------------------------------: |
| `sq_accum_prev` | Count | Accumulated counter sample value where accumulation takes place once every four cycles |
| `sq_accum_prev_hires` | Count | Accumulated counter sample value where accumulation takes place once every cycle |
| `sq_level_waves` | Waves | Number of inflight waves |
| `sq_insts_level_vmem` | Instr | Number of inflight VMEM instructions |
| `sq_insts_level_smem` | Instr | Number of inflight SMEM instructions |
| `sq_insts_level_lds` | Instr | Number of inflight LDS instructions |
| `sq_ifetch_level` | Instr | Number of inflight instruction fetches |
#### Wavefront Counters
| Hardware Counter | Unit | Definition |
| :--------------------| :-----| ----------------------------------------------------------------: |
| `sq_waves` | Waves | Number of wavefronts dispatched to SQs, including both new and restored wavefronts |
| `sq_waves_saved[]` | Waves | Number of context-saved wavefronts |
| `sq_waves_restored[]` | Waves | Number of context-restored wavefronts |
| `sq_waves_eq_64` | Waves | Number of wavefronts with exactly 64 active threads sent to SQs |
| `sq_waves_lt_64` | Waves | Number of wavefronts with less than 64 active threads sent to SQs |
| `sq_waves_lt_48` | Waves | Number of wavefronts with less than 48 active threads sent to SQs |
| `sq_waves_lt_32` | Waves | Number of wavefronts with less than 32 active threads sent to SQs |
| `sq_waves_lt_16` | Waves | Number of wavefronts with less than 16 active threads sent to SQs |
#### Wavefront Cycle Counters
| Hardware Counter | Unit | Definition |
| :------------------------| :-------| --------------------------------------------------------------------: |
| `sq_cycles` | Cycles | Free-running SQ clocks |
| `sq_busy_cycles` | Cycles | Number of cycles while SQ reports it to be busy |
| `sq_busy_cu_cycles` | Qcycles | Number of quad-cycles each CU is busy |
| `sq_valu_mfma_busy_cycles` | Cycles | Number of cycles the MFMA ALU is busy |
| `sq_wave_cycles` | Qcycles | Number of quad-cycles spent by waves in the CUs |
| `sq_wait_any` | Qcycles | Number of quad-cycles spent waiting for anything |
| `sq_wait_inst_any` | Qcycles | Number of quad-cycles spent waiting for an issued instruction |
| `sq_active_inst_any` | Qcycles | Number of quad-cycles spent by each wave to work on an instruction |
| `sq_active_inst_vmem` | Qcycles | Number of quad-cycles spent by each wave to work on a non-FLAT VMEM instruction |
| `sq_active_inst_lds` | Qcycles | Number of quad-cycles spent by each wave to work on an LDS instruction |
| `sq_active_inst_valu` | Qcycles | Number of quad-cycles spent by each wave to work on a VALU instruction |
| `sq_active_inst_sca` | Qcycles | Number of quad-cycles spent by each wave to work on an SCA instruction |
| `sq_active_inst_exp_gds` | Qcycles | Number of quad-cycles spent by each wave to work on EXP or GDS instruction |
| `sq_active_inst_misc` | Qcycles | Number of quad-cycles spent by each wave to work on a MISC instruction, including branch and sendmsg |
| `sq_active_inst_flat` | Qcycles | Number of quad-cycles spent by each wave to work on a FLAT instruction |
| `sq_inst_cycles_vmem_wr` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Write instructions, including both FLAT and Buffer |
| `sq_inst_cycles_vmem_rd` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Read instructions, including both FLAT and Buffer |
| `sq_inst_cycles_smem` | Qcycles | Number of quad-cycles spent to execute scalar memory reads |
| `sq_inst_cycles_salu` | Cycles | Number of cycles spent to execute non-memory read scalar operations |
| `sq_thread_cycles_valu` | Cycles | Number of thread-cycles spent to execute VALU operations |
#### Local Data Share
| Hardware Counter | Unit | Definition |
| :--------------------------| :------| --------------------------------------------------------: |
| `sq_lds_atomic_return` | Cycles | Number of atomic return cycles in LDS |
| `sq_lds_bank_conflict` | Cycles | Number of cycles LDS is stalled by bank conflicts |
| `sq_lds_addr_conflict[]` | Cycles | Number of cycles LDS is stalled by address conflicts |
| `sq_lds_unaligned_stalls[]` | Cycles | Number of cycles LDS is stalled processing flat unaligned load/store ops |
| `sq_lds_mem_violations[]` | Count | Number of threads that have a memory violation in the LDS |
#### Miscellaneous
##### Miscellaneous Counters
| Hardware Counter | Unit | Definition |
| :----------------| :-------| --------------------------------------------------------: |
| `sq_ifetch` | Count | Number of fetch requests from L1I cache, in 32-byte width |
| `sq_items` | Threads | Number of valid threads |
### L1I and sL1D Caches
#### L1I and sL1D Caches
| Hardware Counter | Unit | Definition |
| :----------------------------| :------| ----------------------------------------------------------------: |
| `sqc_icache_req` | Req | Number of L1I cache requests |
| `sqc_icache_hits` | Count | Number of L1I cache lookup-hits |
| `sqc_icache_misses` | Count | Number of L1I cache non-duplicate lookup-misses |
| `sqc_icache_misses_duplicate` | Count | Number of L1I cache duplicate lookup-misses, whose previous lookup-miss on the same cache line is not yet fulfilled |
| `sqc_dcache_req` | Req | Number of sL1D cache requests |
| `sqc_dcache_input_valid_readb` | Cycles | Number of cycles while SQ input is valid but sL1D cache is not ready |
| `sqc_dcache_hits` | Count | Number of sL1D cache lookup-hits |
| `sqc_dcache_misses` | Count | Number of sL1D non-duplicate lookup-misses |
| `sqc_dcache_misses_duplicate` | Count | Number of sL1D duplicate lookup-misses |
| `sqc_dcache_req_read_1` | Req | Number of Read requests in a single 32-bit Data Word, DWORD (DW) |
| `sqc_dcache_req_read_2` | Req | Number of Read requests in 2 DW |
| `sqc_dcache_req_read_4` | Req | Number of Read requests in 4 DW |
| `sqc_dcache_req_read_8` | Req | Number of Read requests in 8 DW |
| `sqc_dcache_req_read_16` | Req | Number of Read requests in 16 DW |
| `sqc_dcache_atomic[]` | Req | Number of Atomic requests |
| `sqc_tc_req` | Req | Number of L2 cache requests that were issued by instruction and constant caches |
| `sqc_tc_inst_req` | Req | Number of instruction cache line requests to L2 cache |
| `sqc_tc_data_read_req` | Req | Number of data Read requests to the L2 cache |
| `sqc_tc_data_write_req[]` | Req | Number of data Write requests to the L2 cache |
| `sqc_tc_data_atomic_req[]` | Req | Number of data Atomic requests to the L2 cache |
| `sqc_tc_stall[]` | Cycles | Number of cycles while the valid requests to L2 Cache are stalled |
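
The hit and miss counters above can be combined into a cache hit rate. The sketch below assumes that every lookup is counted as exactly one of a hit, a non-duplicate miss, or a duplicate miss, so that their sum is the total number of lookups; this is an assumption for illustration and is not stated by the table. The function name `cache_hit_rate_percent` and the sample values are hypothetical.

```python
def cache_hit_rate_percent(hits, misses, misses_duplicate):
    """Hit rate in percent, assuming lookups = hits + misses + duplicate misses.

    For the L1I cache, pass the `sqc_icache_*` readings; for the sL1D cache,
    pass the `sqc_dcache_*` readings.
    """
    lookups = hits + misses + misses_duplicate
    if lookups == 0:
        return 0.0  # no lookups recorded, so report 0 rather than divide by zero
    return 100.0 * hits / lookups


# Hypothetical readings, not measured data.
print(cache_hit_rate_percent(hits=90, misses=8, misses_duplicate=2))
```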
### Vector L1 Cache Subsystem
The vector L1 cache subsystem counters are further classified into texture addressing unit, texture data unit, vector L1D cache, and texture cache arbiter.
#### Texture Addressing Unit
##### Texture Addressing Unit Counters
| Hardware Counter | Unit | Definition |
| :--------------------------------| :------| ------------------------------------------------: |
| `ta_ta_busy` | Cycles | TA busy cycles |
| `ta_total_wavefronts` | Instr | Number of wavefront instructions |
| `ta_buffer_wavefronts` | Instr | Number of Buffer wavefront instructions |
| `ta_buffer_read_wavefronts` | Instr | Number of Buffer Read wavefront instructions |
| `ta_buffer_write_wavefronts` | Instr | Number of Buffer Write wavefront instructions |
| `ta_buffer_atomic_wavefronts[]` | Instr | Number of Buffer Atomic wavefront instructions |
| `ta_buffer_total_cycles` | Cycles | Number of Buffer cycles, including Read and Write |
| `ta_buffer_coalesced_read_cycles` | Cycles | Number of coalesced Buffer read cycles |
| `ta_buffer_coalesced_write_cycles` | Cycles | Number of coalesced Buffer write cycles |
| `ta_addr_stalled_by_tc` | Cycles | Number of cycles TA address is stalled by TCP |
| `ta_data_stalled_by_tc` | Cycles | Number of cycles TA data is stalled by TCP |
| `ta_addr_stalled_by_td_cycles[]` | Cycles | Number of cycles TA address is stalled by TD |
| `ta_flat_wavefronts` | Instr | Number of Flat wavefront instructions |
| `ta_flat_read_wavefronts` | Instr | Number of Flat Read wavefront instructions |
| `ta_flat_write_wavefronts` | Instr | Number of Flat Write wavefront instructions |
| `ta_flat_atomic_wavefronts` | Instr | Number of Flat Atomic wavefront instructions |
#### Texture Data Unit
##### Texture Data Unit Counters
| Hardware Counter | Unit | Definition |
| :------------------------| :-----| ---------------------------------------------------: |
| `td_td_busy` | Cycles | TD busy cycles |
| `td_tc_stall` | Cycles | Number of cycles TD is stalled by TCP |
| `td_spi_stall[]` | Cycles | Number of cycles TD is stalled by SPI |
| `td_load_wavefront` | Instr | Number of wavefront instructions (Read/Write/Atomic) |
| `td_store_wavefront` | Instr | Number of Write wavefront instructions |
| `td_atomic_wavefront` | Instr | Number of Atomic wavefront instructions |
| `td_coalescable_wavefront` | Instr | Number of coalescable instructions |
#### Vector L1D Cache
| Hardware Counter | Unit | Definition |
| :-----------------------------------| :------| ----------------------------------------------------------: |
| `tcp_gate_en1` | Cycles | Number of cycles vL1D interface clocks are turned on |
| `tcp_gate_en2` | Cycles | Number of cycles vL1D core clocks are turned on |
| `tcp_td_tcp_stall_cycles` | Cycles | Number of cycles TD stalls vL1D |
| `tcp_tcr_tcp_stall_cycles` | Cycles | Number of cycles TCR stalls vL1D |
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Read |
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Write |
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on an Atomic |
| `tcp_pending_stall_cycles` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache |
| `tcp_ta_tcp_state_read` | Req | Number of wavefront instruction requests to vL1D |
| `tcp_volatile[]` | Req | Number of L1 volatile pixels/buffers from TA |
| `tcp_total_accesses` | Req | Number of vL1D accesses |
| `tcp_total_read` | Req | Number of vL1D Read accesses |
| `tcp_total_write` | Req | Number of vL1D Write accesses |
| `tcp_total_atomic_with_ret` | Req | Number of vL1D Atomic with return |
| `tcp_total_atomic_without_ret` | Req | Number of vL1D Atomic without return |
| `tcp_total_writeback_invalidates` | Count | Number of vL1D Writebacks and Invalidates |
| `tcp_utcl1_request` | Req | Number of address translation requests to UTCL1 |
| `tcp_utcl1_translation_hit` | Req | Number of UTCL1 translation hits |
| `tcp_utcl1_translation_miss` | Req | Number of UTCL1 translation misses |
| `tcp_utcl1_permission_miss` | Req | Number of UTCL1 permission misses |
| `tcp_total_cache_accesses` | Req | Number of vL1D cache accesses |
| `tcp_tcp_latency` | Cycles | Accumulated wave access latency to vL1D over all wavefronts |
| `tcp_tcc_read_req_latency` | Cycles | Accumulated vL1D-L2 request latency over all wavefronts for Reads and Atomics with return |
| `tcp_tcc_write_req_latency` | Cycles | Accumulated vL1D-L2 request latency over all wavefronts for Writes and Atomics without return |
| `tcp_tcc_read_req` | Req | Number of Read requests to L2 Cache |
| `tcp_tcc_write_req` | Req | Number of Write requests to L2 Cache |
| `tcp_tcc_atomic_with_ret_req` | Req | Number of Atomic requests to L2 Cache with return |
| `tcp_tcc_atomic_without_ret_req` | Req | Number of Atomic requests to L2 Cache without return |
| `tcp_tcc_nc_read_req` | Req | Number of NC Read requests to L2 Cache |
| `tcp_tcc_uc_read_req` | Req | Number of UC Read requests to L2 Cache |
| `tcp_tcc_cc_read_req` | Req | Number of CC Read requests to L2 Cache |
| `tcp_tcc_rw_read_req` | Req | Number of RW Read requests to L2 Cache |
| `tcp_tcc_nc_write_req` | Req | Number of NC Write requests to L2 Cache |
| `tcp_tcc_uc_write_req` | Req | Number of UC Write requests to L2 Cache |
| `tcp_tcc_cc_write_req` | Req | Number of CC Write requests to L2 Cache |
| `tcp_tcc_rw_write_req` | Req | Number of RW Write requests to L2 Cache |
| `tcp_tcc_nc_atomic_req` | Req | Number of NC Atomic requests to L2 Cache |
| `tcp_tcc_uc_atomic_req` | Req | Number of UC Atomic requests to L2 Cache |
| `tcp_tcc_cc_atomic_req` | Req | Number of CC Atomic requests to L2 Cache |
| `tcp_tcc_rw_atomic_req` | Req | Number of RW Atomic requests to L2 Cache |
#### Texture Cache Arbiter (TCA)
| Hardware Counter | Unit | Definition |
| :----------------| :------| ------------------------------------------: |
| `tca_cycle` | Cycles | TCA cycles |
| `tca_busy` | Cycles | Number of cycles TCA has a pending request |
### L2 Cache Access
#### L2 Cache Access Counters
| Hardware Counter | Unit | Definition |
| :--------------------------------| :------| -------------------------------------------------------------: |
| `tcc_cycle` |Cycle | L2 Cache free-running clocks |
| `tcc_busy` |Cycle | L2 Cache busy cycles |
| `tcc_req` |Req | Number of L2 Cache requests |
| `tcc_streaming_req[]` |Req | Number of L2 Cache Streaming requests |
| `tcc_NC_req` |Req | Number of NC requests |
| `tcc_UC_req` |Req | Number of UC requests |
| `tcc_CC_req` |Req | Number of CC requests |
| `tcc_RW_req` |Req | Number of RW requests |
| `tcc_probe` |Req | Number of L2 Cache probe requests |
| `tcc_probe_all[]` |Req | Number of external probe requests with `EA_TCC_preq_all == 1` |
| `tcc_read_req` |Req | Number of L2 Cache Read requests |
| `tcc_write_req` |Req | Number of L2 Cache Write requests |
| `tcc_atomic_req` |Req | Number of L2 Cache Atomic requests |
| `tcc_hit` |Req | Number of L2 Cache lookup-hits |
| `tcc_miss` |Req | Number of L2 cache lookup-misses |
| `tcc_writeback` |Req | Number of lines written back to main memory, including writebacks of dirty lines and uncached Write/Atomic requests |
| `tcc_ea_wrreq` |Req | Total number of 32-byte and 64-byte Write requests to EA |
| `tcc_ea_wrreq_64B` |Req | Total number of 64-byte Write requests to EA |
| `tcc_ea_wr_uncached_32B` |Req | Number of 32-byte Write/Atomic going over the TC_EA_wrreq interface due to uncached traffic. Note that CC mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2. |
| `tcc_ea_wrreq_stall` | Cycles | Number of cycles a Write request was stalled |
| `tcc_ea_wrreq_io_credit_stall[]` | Cycles | Number of cycles an EA Write request runs out of IO credits |
| `tcc_ea_wrreq_gmi_credit_stall[]` | Cycles | Number of cycles an EA Write request runs out of GMI credits |
| `tcc_ea_wrreq_dram_credit_stall` | Cycles | Number of cycles an EA Write request runs out of DRAM credits |
| `tcc_too_many_ea_wrreqs_stall[]` | Cycles | Number of cycles the L2 Cache reaches maximum number of pending EA Write requests |
| `tcc_ea_wrreq_level` | Req | Accumulated number of L2 Cache-EA Write requests in flight |
| `tcc_ea_atomic` | Req | Number of 32-byte and 64-byte Atomic requests to EA |
| `tcc_ea_atomic_level` | Req | Accumulated number of L2 Cache-EA Atomic requests in flight |
| `tcc_ea_rdreq` | Req | Total number of 32-byte and 64-byte Read requests to EA |
| `tcc_ea_rdreq_32B` | Req | Total number of 32-byte Read requests to EA |
| `tcc_ea_rd_uncached_32B` | Req | Number of 32-byte L2 Cache-EA Read due to uncached traffic. A 64-byte request is counted as 2. |
| `tcc_ea_rdreq_io_credit_stall[]` | Cycles | Number of cycles Read request interface runs out of IO credits |
| `tcc_ea_rdreq_gmi_credit_stall[]` | Cycles | Number of cycles Read request interface runs out of GMI credits |
| `tcc_ea_rdreq_dram_credit_stall` | Cycles | Number of cycles Read request interface runs out of DRAM credits |
| `tcc_ea_rdreq_level` | Req | Accumulated number of L2 Cache-EA Read requests in flight |
| `tcc_ea_rdreq_dram` | Req | Number of 32-byte and 64-byte Read requests to HBM |
| `tcc_ea_wrreq_dram` | Req | Number of 32-byte and 64-byte Write requests to HBM |
| `tcc_tag_stall` | Cycles | Number of cycles the normal request pipeline in the tag was stalled for any reason |
| `tcc_normal_writeback` | Req | Number of L2 Cache normal writebacks |
| `tcc_all_tc_op_wb_writeback[]` | Req | Number of instruction-triggered writeback requests |
| `tcc_normal_evict` | Req | Number of L2 Cache normal evictions |
| `tcc_all_tc_op_inv_evict[]` | Req | Number of instruction-triggered eviction requests |
## MI200 Derived Metrics List
### Derived Metrics on MI200 GPUs
| Derived Metric | Description |
| :----------------| -------------------------------------------------------------------------------------: |
| `VFetchInsts` | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory |
| `VWriteInsts` | The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory |
| `FlatVMemInsts` | The average number of FLAT instructions that read from or write to the video memory executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch |
| `LDSInsts` | The average number of LDS read/write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS |
| `FlatLDSInsts` | The average number of FLAT instructions that read or write to LDS executed per work item (affected by flow control) |
| `VALUUtilization` | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad) to 100% (ideal, no thread divergence) |
| `VALUBusy` | The percentage of GPU time vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal) |
| `SALUBusy` | The percentage of GPU time scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal) |
| `MemWrites32B` | The total number of effective 32B write transactions to the memory |
| `L2CacheHit` | The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal) |
| `MemUnitStalled` | The percentage of GPU time the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad) |
| `WriteUnitStalled` | The percentage of GPU time the write unit is stalled. Value range: 0% to 100% (bad) |
| `LDSBankConflict` | The percentage of GPU time LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad) |
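
As an illustration of how a derived metric relates to the raw counters, the sketch below computes an L2 hit percentage from `tcc_hit` and `tcc_miss` totals. The formula `100 * hit / (hit + miss)` and the helper name `l2_cache_hit_pct` are assumptions made for this example; the table above does not spell out how `L2CacheHit` is calculated.

```bash
# Hypothetical helper: percentage of L2 lookups that hit, given raw
# tcc_hit and tcc_miss totals. Assumed formula: 100 * hit / (hit + miss).
# Prints 0.0 when there were no lookups at all.
l2_cache_hit_pct() {
  awk -v hit="$1" -v miss="$2" 'BEGIN {
    total = hit + miss
    printf "%.1f\n", (total > 0) ? 100 * hit / total : 0
  }'
}
```

For example, `l2_cache_hit_pct 750 250` reports the hit percentage for 750 hits and 250 misses.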
## Abbreviations
### MI200 Abbreviations
| Abbreviation | Meaning |
| :------------| --------------------------------------------------------------------------------: |
| `ALU` | Arithmetic Logic Unit |
| `Arb` | Arbiter |
| `BF16` | Brain Floating Point 16 bits |
| `CC` | Coherently Cached |
| `CP` | Command Processor |
| `CPC` | Command Processor Compute |
| `CPF` | Command Processor Fetcher |
| `CS` | Compute Shader |
| `CSC` | Compute Shader Controller |
| `CSn` | Compute Shader, the n-th pipe |
| `CU` | Compute Unit |
| `DW` | 32-bit Data Word, DWORD |
| `EA` | Efficiency Arbiter |
| `F16` | Half Precision Floating Point |
| `FLAT` | FLAT instructions allow read/write/atomic access to a generic memory address pointer, which can resolve to any of the following physical memories:<br>• Global Memory<br>• Scratch (“private”)<br>• LDS (“shared”)<br>• Invalid MEM_VIOL TrapStatus |
| `FMA` | Fused Multiply Add |
| `GDS` | Global Data Share |
| `GRBM` | Graphics Register Bus Manager |
| `HBM` | High Bandwidth Memory |
| `Instr` | Instructions |
| `IOP` | Integer Operation |
| `L2` | Level-2 Cache |
| `LDS` | Local Data Share |
| `ME1` | Micro Engine, running packet processing firmware on CPC |
| `MFMA` | Matrix Fused Multiply Add |
| `NC` | Noncoherently Cached |
| `RW` | Coherently Cached with Write |
| `SALU` | Scalar ALU |
| `SGPR` | Scalar GPR |
| `SIMD` | Single Instruction Multiple Data |
| `sL1D` | Scalar Level-1 Data Cache |
| `SMEM` | Scalar Memory |
| `SPI` | Shader Processor Input |
| `SQ` | Sequencer |
| `TA` | Texture Addressing Unit |
| `TC` | Texture Cache |
| `TCA` | Texture Cache Arbiter |
| `TCC` | Texture Cache per Channel, known as L2 Cache |
| `TCIU` | Texture Cache Interface Unit, the Command Processor (CP)'s interface to the memory system |
| `TCP` | Texture Cache per Pipe, known as vector L1 Cache |
| `TCR` | Texture Cache Router |
| `TD` | Texture Data Unit |
| `UC` | Uncached |
| `UTCL1` | Unified Translation Cache Level 1 |
| `UTCL2` | Unified Translation Cache Level 2 |
| `VALU` | Vector ALU |
| `VGPR` | Vector GPR |
| `vL1D` | Vector Level-1 Data Cache |
| `VMEM` | Vector Memory |

@@ -43,6 +43,8 @@ Runtime
export GPU_DEVICE_ORDINAL="0,2"
```
(hip_visible_devices)=
### `HIP_VISIBLE_DEVICES`
Device indices exposed to HIP applications.

@@ -0,0 +1,241 @@
### Using the LLVM Address Sanitizer (ASAN) with the GPU (Beta Release)
The beta release LLVM Address Sanitizer provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM Address Sanitizer process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications like pure CPU applications. However, this simplicity has not been achieved yet.
This document describes how to use the ROCm Address Sanitizer.
For information about LLVM Address Sanitizer, see [the LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
**Note**: The beta release of LLVM Address Sanitizer for ROCm is currently tested and validated on Ubuntu 20.04.
### Compiling for Address Sanitizer
The address sanitizer process begins by compiling the application of interest with the address sanitizer instrumentation.
Recommendations for doing this are:
+ Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
+ Add the following options to the existing compiler and linker options:
  + `-fsanitize=address` - enables instrumentation
  + `-shared-libsan` - use shared version of runtime
  + `-g` - add debug info for improved reporting
+ Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
  Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
It is not an error to compile some files without address sanitizer instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the Address Sanitizer runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
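The recommended options can be gathered into one command line. The following is a minimal sketch, not a definitive build recipe: the helper name `asan_compile_cmd` and the file names are hypothetical, and the offload architecture is the example value from the list above. The helper prints the command instead of running it so that it can be reviewed first.

```bash
# Options named in the recommendations above. The offload architecture is
# the example value from the text and should match the target GPU.
ASAN_FLAGS="-fsanitize=address -shared-libsan -g --offload-arch=gfx90a:xnack+"

# Hypothetical helper: print (rather than run) the compile-and-link command
# for one source file. The compiler defaults to amdclang++.
asan_compile_cmd() {
  local src="$1"
  local out="$2"
  local compiler="${3:-amdclang++}"
  printf '%s %s -o %s %s\n' "$compiler" "$ASAN_FLAGS" "$out" "$src"
}

# Typical use (file names are placeholders):
#   asan_compile_cmd my_app.hip my_app    # review, then run the printed line
#   ldd ./my_app | grep libclang_rt.asan  # confirm the runtime dependency
```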
#### About Compilation Time
When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.
There are a few options if the compile time becomes unacceptable:
+ Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the address sanitizer process.
+ Add the option `-fsanitize-recover=address` when compiling the files with the worst compile times. This option simplifies the added instrumentation, resulting in faster compilation. See below for more information.
+ Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
### Installing ROCm GPU Address Sanitizer Packages
For a complete ROCm GPU Sanitizer installation, including packages, instrumented HSA and HIP runtimes, tools, and math libraries, use the following command:
```bash
sudo apt-get install rocm-ml-sdk-asan
```
### Using AMD Supplied Address Sanitizer Instrumented Libraries
ROCm releases have optional packages containing additional address sanitizer instrumented builds of the ROCm libraries usually found in `/opt/rocm-<version>/lib`. The instrumented libraries have the same names as the regular uninstrumented libraries and are located in `/opt/rocm-<version>/lib/asan`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used, but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
These additional libraries avoid additional developer effort to locate repositories, identify the correct branch, check out the correct tags, and other efforts needed to build the libraries from the source. And they extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.
### Running Address Sanitizer Instrumented Applications
#### Preparing to Run an Instrumented Application
Here are a few recommendations to consider before running an address sanitizer instrumented heterogeneous application.
+ Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
+ Ensure XNACK is enabled:
  + For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
  + For `gfx906` (MI-50) or `gfx908` (MI-100), set the environment variable `HSA_XNACK=1` and also ensure the amdgpu kernel module is loaded with the module argument `noretry=0`.
    This requirement exists because the XNACK setting for these GPUs is system-wide.
+ Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
+ Ensure that the application depends on the address sanitizer runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that the shared library `libclang_rt.asan-x86_64.so` appears in the output.
If it does not appear, when executed the application will quickly output an address sanitizer error that looks like:
```bash
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
+ Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
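Some of the checks above can be scripted. The sketch below is illustrative only: the helper names `kernel_at_least_5_6` and `needs_asan_runtime` are hypothetical, the version comparison assumes GNU `sort -V` is available, and the kernel version is only a proxy for HMM support, as the text itself notes.

```bash
# Hypothetical helper: succeeds when the given (or running) kernel version
# is at least 5.6, the version the text suggests for HMM support.
kernel_at_least_5_6() {
  local ver="${1:-$(uname -r)}"
  [ "$(printf '%s\n' 5.6 "$ver" | sort -V | head -n1)" = "5.6" ]
}

# Hypothetical helper: reads `readelf -d <application>` output on stdin and
# succeeds when the address sanitizer runtime is a NEEDED shared library.
needs_asan_runtime() {
  grep -q 'NEEDED.*libclang_rt\.asan'
}

# Typical use (application name and ROCm path are placeholders):
#   kernel_at_least_5_6 || echo "kernel may lack HMM support"
#   export HSA_XNACK=1
#   export LD_LIBRARY_PATH="/opt/rocm-<version>/lib/asan:$LD_LIBRARY_PATH"
#   readelf -d ./my_app | needs_asan_runtime || echo "ASan runtime not linked"
```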
There is an environment variable, `ASAN_OPTIONS`, which can be used to adjust the runtime behavior of the ASAN runtime itself. There are more than a hundred "flags" that can be adjusted (see an old list at [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)) but the default settings are correct and should be used in most cases. Note that these options only affect the host ASAN runtime. The device runtime currently supports only the default settings for the few relevant options.
There are two `ASAN_OPTIONS` flags of particular note.
+ `halt_on_error=0/1 default 1`.
This tells the ASAN runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional address sanitizer instrumented libraries are not compiled with this option, so if an error is detected within one of them while `halt_on_error` is set to 0, more undefined behavior will occur.
+ `detect_leaks=0/1 default 1`.
This option directs the address sanitizer runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits, due to allocations made by the language runtime which are not considered to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
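Multiple flags are passed in a single `ASAN_OPTIONS` value separated by colons. The following sketch shows one way to assemble such a value; the helper name `join_sanitizer_options` is hypothetical and the suppression file path is the placeholder from the text.

```bash
# Hypothetical helper: join individual flags into the colon-separated form
# accepted by ASAN_OPTIONS and LSAN_OPTIONS.
join_sanitizer_options() {
  local IFS=:
  printf '%s\n' "$*"
}

# Keep the default halt-on-error behavior but silence leak reports.
export ASAN_OPTIONS="$(join_sanitizer_options halt_on_error=1 detect_leaks=0)"

# Alternative: keep leak detection and use a suppression file instead.
#   export LSAN_OPTIONS="$(join_sanitizer_options \
#     suppressions=/path/to/suppression/file print_suppressions=0)"
```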
### Runtime Overhead
Running an address sanitizer instrumented application incurs
overheads which may result in unacceptably long runtimes
or failure to run at all.
#### Higher Execution Time
Address sanitizer detection works by checking each address at runtime
before the address is actually accessed by a load, store, or atomic
instruction.
This checking involves an additional load to "shadow" memory which
records whether the address is "poisoned" or not, and additional logic
that decides whether to produce a detection report or not.
This extra runtime work can cause the application to slow down by
a factor of three or more, depending on how many memory accesses are
executed.
For heterogeneous applications, the shadow memory must be accessible by all devices
and this can mean that shadow accesses from some devices may be more costly
than non-shadow accesses.
#### Higher Memory Use
The address checking described above relies on the compiler to surround
each program variable with a red zone and on address sanitizer
runtime to surround each runtime memory allocation with a red zone and
fill the shadow corresponding to each red zone with poison.
The added memory for the red zones is additional overhead on top
of the 13% overhead for the shadow memory itself.
Applications which consume most of one or more available memory pools when
run normally are likely to encounter allocation failures when run with
instrumentation.
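The shadow overhead figure can be checked with simple arithmetic. The sketch below assumes the usual Address Sanitizer mapping of one shadow byte per eight application bytes (12.5%, which the text above rounds to 13%); the helper name `shadow_bytes` is hypothetical, and red zone overhead is not included.

```bash
# Hypothetical helper: shadow bytes needed to cover a given number of
# application bytes, assuming an 8:1 mapping and rounding up.
shadow_bytes() {
  echo $(( ($1 + 7) / 8 ))
}

# Typical use: shadow needed to cover a 1 GiB memory pool.
#   shadow_bytes $(( 1024 * 1024 * 1024 ))
```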
### Runtime Reporting
It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the address sanitizer runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.
An invalid address detection report for the CPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on address <memory address> at pc <pc> bp <bp> sp <sp> <access> of size <N> at <memory address> thread T0
```
and continues with a stack trace for the access, a stack trace for the allocation and deallocation, if relevant, and a dump of the shadow near the `<memory address>`.
In contrast, an invalid address detection report for the GPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on amdgpu device <device> at pc <pc> <access> of size <n> in workgroup id (<X>,<Y>,<Z>)
```
Above, `<device>` is the integer device ID, and `(<X>, <Y>, <Z>)` is the ID of the workgroup or block where the invalid address was detected.
While the CPU report includes a call stack for the thread attempting the invalid access, the GPU report is currently limited to a call stack of size one, i.e. the (symbolized) location of the invalid access, e.g.
```bash
#0 <pc> in <function signature> at /path/to/file.hip:<line>:<column>
```
This short call stack is followed by a GPU unique section that looks like
```bash
Thread ids and accessed addresses:
<lid0> <maddr 0> : <lid1> <maddr1> : ...
```
where each `<lid j> <maddr j>` indicates the lane ID and the invalid memory address held by lane `j` of the wavefront attempting the invalid access.
Additionally, reports for invalid GPU accesses to memory allocated by GPU code via `malloc` or `new`, starting with, for example,
```bash
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
```
or
```bash
==5678==ERROR: AddressSanitizer: heap-use-after-free on amdgpu device 3 at pc 0x7f4c10062d74
```
currently may include one or two surprising CPU-side tracebacks mentioning "`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code, and these call stacks can be ignored.
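Because GPU reports have a fixed header, they can be picked out of a longer log. The sketch below is one possible filter, written against the header format shown above; the helper name `gpu_asan_reports` is hypothetical.

```bash
# Hypothetical helper: reads sanitizer output on stdin and prints
# "<problem type> <device>" for each GPU report header line. CPU report
# headers ("... on address ...") are skipped.
gpu_asan_reports() {
  sed -n 's/^==[0-9]*==ERROR: AddressSanitizer: \([^ ]*\) on amdgpu device \([0-9][0-9]*\) at pc .*/\1 \2/p'
}

# Typical use (application name is a placeholder):
#   ./my_app 2>&1 | gpu_asan_reports
```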
### Running with `rocgdb`
`rocgdb` can be used to further investigate address sanitizer detected errors, with some preparation.
Currently, the address sanitizer runtime complains when starting `rocgdb` without preparation.
```bash
$ rocgdb my_app
==1122==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
This is solved by setting environment variable `LD_PRELOAD` to the path to the address sanitizer runtime, whose path can be obtained using the command
```bash
amdclang++ -print-file-name=libclang_rt.asan-x86_64.so
```
It is also recommended to set the environment variable `HIP_ENABLE_DEFERRED_LOADING=0` before debugging HIP applications.
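Put together, the preparation could look like the sketch below. The helper name `asan_runtime_path` is hypothetical, `my_app` is a placeholder, and passing the variables on the `rocgdb` command line is just one way of setting them.

```bash
# Hypothetical helper: ask the compiler where the address sanitizer runtime
# lives. The compiler defaults to amdclang++.
asan_runtime_path() {
  local compiler="${1:-amdclang++}"
  "$compiler" -print-file-name=libclang_rt.asan-x86_64.so
}

# Typical use:
#   LD_PRELOAD="$(asan_runtime_path)" HIP_ENABLE_DEFERRED_LOADING=0 rocgdb my_app
```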
After starting `rocgdb`, breakpoints can be set on the address sanitizer runtime error reporting entry points of interest. For example, if an address sanitizer error report includes
```bash
WRITE of size 4 in workgroup id (10,0,0)
```
the `rocgdb` command needed to stop the program before the report is printed is
```bash
(gdb) break __asan_report_store4
```
Similarly, the appropriate command for a report including
```bash
READ of size <N> in workgroup id (1,2,3)
```
is
```bash
(gdb) break __asan_report_load<N>
```
It is possible to set breakpoints on all address sanitizer report functions using these commands:
```bash
$ rocgdb <path to application>
(gdb) start <command line arguments>
(gdb) rbreak ^__asan_report
(gdb) c
```
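The mapping from a report line to a breakpoint name follows the pattern shown above: `WRITE` maps to `__asan_report_store<N>` and `READ` to `__asan_report_load<N>`. A small sketch of that mapping is given below; the helper name `asan_breakpoint_for` is hypothetical, and it only covers the two access kinds shown in this section.

```bash
# Hypothetical helper: map a report line such as
# "WRITE of size 4 in workgroup id (10,0,0)" to the report function name.
asan_breakpoint_for() {
  local line="$1"
  local kind size
  case "$line" in
    "WRITE of size "*) kind=store ;;
    "READ of size "*)  kind=load ;;
    *) return 1 ;;
  esac
  size="${line#* of size }"   # drop everything up to the size
  size="${size%% *}"          # keep only the number
  printf '__asan_report_%s%s\n' "$kind" "$size"
}

# Typical use inside rocgdb, with the printed name:
#   (gdb) break __asan_report_store4
```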
### Using Address Sanitizer with a Short HIP Application
Refer to the following example to use address sanitizer with a short HIP application:
https://github.com/Rmalavally/rocm-examples/blob/Rmalavally-patch-1/LLVM_ASAN/Using-Address-Sanitizer-with-a-Short-HIP-Application.md
### Known Issues with Using GPU Sanitizer
+ Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
+ Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
+ Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data share" or "LDS") are also not instrumented.
+ It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.

@@ -0,0 +1,71 @@
# Application Deployment Guidelines for Windows
ISVs deploying applications using the HIP SDK depend on the AMD GPU Drivers, HIP
Runtime Library and HIP SDK Libraries. A compatibility matrix table provides
details on AMD's support model. AMD GPU Drivers are distributed with a HIP
Runtime included. Each HIP Runtime is associated with a HIP compiler version.
Applications built with a particular HIP compiler should document the associated
HIP Runtime version and AMD GPU Driver as minimum version requirements for their
end users. Applications do not distribute the HIP Runtime. Instead, end users
will use the HIP Runtime provided by an AMD GPU Driver. AMD provides backward
compatibility for applications dynamically linked to the HIP Runtime based on
our Driver and HIP support policy. ISV applications using the HIP SDK Libraries,
for example hipBLAS, should distribute the HIP SDK Library as part of their
installer package. It is recommended not to require end users to install the
HIP SDK. AMD provides backward compatibility for AMD Driver and HIP Runtime for
the HIP SDK Libraries based on our support policy. AMD's support policy for Visual
Studio and other third-party compilers is documented here.
## Usage Scenario
This guide is intended for Independent Software Vendors (ISVs) and other
software developers intending to build applications with the HIP SDK for
Windows. The HIP SDK is intended for developer distribution in contrast to the
AMD GPU driver which is intended for all end users. The guide discusses how to
use and distribute components from the HIP SDK. The HIP SDK is the collection of
the AMD GPU Driver, HIP Runtime and the HIP Libraries. These three parts are
distributed in the HIP SDK installer. The compatibility and versioning relation
between these three parts is documented here. AMD's support policies for the
developer tools give ISVs the stability to plan the usage of a tool chain.
## Recommended Library Distribution Model
The HIP SDK is distributed via a Windows installer. This distribution system is
only intended for software developers and testers. AMD recommends that end users
of the program built against HIP SDK components do not have a requirement to
install the HIP SDK. There are two types of ISV applications that use the HIP
SDK as follows.
The first group of ISV applications have a dependency on the HIP Runtime and
select HIP Header Only Libraries (rocPRIM, hipCUB and rocThrust). This group of
ISV applications must require their end users to install an AMD GPU Driver. Each
AMD GPU driver has a HIP runtime library bundled with it. The ISV application
should ensure that the HIP runtime library has a minimum version associated with
it. As the HIP runtime library does not have semantic versioning, the ISV
application cannot check for compatibility. However, AMD is committed to not
breaking API/ABI compatibility unless the major version number of the HIP
runtime is incremented. ISV applications may run without user warning if the HIP
major version available in the driver is the same as the HIP major version
associated with the compiler it was built with. The ISV at its discretion may
throw a warning if the HIP major version is higher than the associated HIP major
version of the compiler it was built with.
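
The major-version rule above can be summarized as a small decision function. The sketch below is written in shell only for brevity; a real application would perform the check in its own language. The helper name `hip_version_policy` and the version strings are hypothetical, and treating a lower runtime major version as unsupported is an assumption that the text does not state.

```bash
# Hypothetical helper: compare the HIP major version the application was
# built with against the major version provided by the installed driver.
#   same major   -> run          (no user warning needed)
#   higher major -> warn         (at the ISV's discretion)
#   lower major  -> unsupported  (assumption, not stated in the text)
hip_version_policy() {
  local built_major="${1%%.*}"
  local runtime_major="${2%%.*}"
  if [ "$runtime_major" -eq "$built_major" ]; then
    echo run
  elif [ "$runtime_major" -gt "$built_major" ]; then
    echo warn
  else
    echo unsupported
  fi
}
```

Here the first argument is the version associated with the compiler and the second is the version reported by the driver's HIP runtime.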
The second group of ISV applications has a dependency on the HIP Runtime and one
or more Dynamically Linked HIP Libraries including the HIP RT library. ISV
applications with this dependency need to ensure the end user installs an AMD
GPU Driver and are recommended to distribute the dynamically linked HIP library
in the installer package of the application. This allows end users to avoid
installing the HIP SDK. One benefit of this model is that less disk space is
required, as only the required binaries are distributed by the ISV application.
It also spares the end user from having to agree to licensing agreements for the
entire HIP SDK.
The version checks recommended for the ISV application including dynamically
linked HIP Libraries follow the same requirements as the ISV applications that
only have the HIP Runtime and header only library. In addition, each dynamically
linked HIP library also has a minimum HIP runtime requirement. Checks for the
minimum HIP version for each dynamically linked HIP library may be added at the
ISV's discretion. Usually, the minimum HIP version check for the HIP runtime is
sufficient if dynamically linked HIP libraries come from the same SDK package as
the HIP compiler.
Please note AMD does not support static linking to any components distributed in
the HIP SDK.

@@ -1,21 +1,47 @@
# Autotag
## How to use
## Pre-requisites
The tag script can simply be invoked by passing it as a python script:
- Create a GitHub Personal Access Token.
- Tested with all the read-only permissions, but public_repo, read:project, read:user, and repo:status should be enough.
- Copy the token somewhere safe.
- Configure SSO for this token by authorizing it for the following organizations:
- ROCm-Developer-Tools
- RadeonOpenCompute
- ROCmSoftwarePlatform
## Updating the changelog
- Add or update the release-specific notes in `tools/autotag/templates/rocm_changes`
- Ensure all the repositories have their release-specific branch with the updated changelogs.
- Run this for 5.6.0 (change for whatever version you require)
- `GITHUB_ACCESS_TOKEN=my_token_here`
To generate the changelog from 5.0.0 up to and including 5.7.1:
```sh
python3 tag_script.py --help
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --do-previous --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.7 5.7.1
```
To generate the changelog from 5.0.0 up to and including 5.4.3:
To generate the changelog only for 5.7.1:
```sh
python3 tag_script.py -t <GITHUB_TOKEN> --no-release --no-pulls --do-previous --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.4 5.4.3
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.7 5.7.1
```
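When the two invocations above need to be scripted, the argument list can be assembled programmatically. The helper below is a hypothetical sketch, not part of the autotag tool: it only reuses the flags shown in the commands above, reads the token from the `GITHUB_ACCESS_TOKEN` environment variable, and omits `-t` when no token is set.

```python
import os


def build_autotag_command(version, branch, do_previous=True,
                          changelog="../../CHANGELOG.md", env=None):
    """Build the argv list for tag_script.py (hypothetical helper).

    do_previous=True generates the changelog for all earlier releases up to
    and including `version`; False generates it for `version` only.
    """
    env = os.environ if env is None else env
    token = env.get("GITHUB_ACCESS_TOKEN", "")

    cmd = ["python3", "tag_script.py"]
    if token:
        # Running without a token is possible but subject to stricter rate limits.
        cmd += ["-t", token]
    cmd += ["--no-release", "--no-pulls"]
    if do_previous:
        cmd.append("--do-previous")
    cmd += ["--compile_file", changelog, "--branch", branch, version]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `tag_script.py`.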
> **Note**
>
### Notes
> If branch cannot be found, edit default.xml at root.
> Sometimes the script doesn't know whether to include or exclude an entry for a specific release. Continue this part by accepting (Y) or rejecting (N) entries.
> The end result should be a newly generated changelog in the project root.
> Compiling the changelog without the `--do-previous` flag treats every library as new, because no previous version of any library has been parsed.
> Trying to run without a token is possible but GitHub enforces stricter rate limits and is therefore not advised.
Trying to run without a token is possible but GitHub enforces stricter rate limits and is therefore not advised.
- Copy over the first part of the changelog and replace the old release notes in RELEASE.md.
## Adding new libraries/repositories
- Add the name or group of the repository (found in default.xml in the ROCm project root) to `included_names` or `included_groups` in auto_tag.py.
- At the moment of writing, this is only in the 5.6 branch and not the develop branch.
- Re-run the command specified in the steps above.
- Some libraries do not have the changelog for every point release. The tool will give out warnings, but it is okay to ignore them.


@@ -0,0 +1,15 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
#### HIP 5.6.1 (for ROCm 5.6.1)
##### Fixed Defects
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
- Fixed a memory leak that occurred when code object files were loaded/unloaded via the hipModuleLoad/hipModuleUnload APIs
- Using *hipGraphAddMemFreeNode* no longer results in a crash


@@ -0,0 +1,152 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### Release Highlights for ROCm 5.7
ROCm 5.7.0 includes many new features. These include a new library (hipTensor) and optimizations for rocRAND and MIVisionX. Address sanitizer for host and device code (GPU) is now available as a beta. Note that ROCm 5.7.0 is EOS for MI50. The 5.7 versions of ROCm are the last major release in the ROCm 5 series. This release is Linux-only.
Important: The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series. Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
#### AMD Instinct™ MI50 End of Support Notice
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) will enter maintenance mode starting Q3 2023.
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 will be the final release for gfx906 GPUs to be in a fully supported state.
- ROCm 6.0 release will show MI50s as "under maintenance" mode for [Linux](./about/release/linux_support) and [Windows](./about/release/windows_support)
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond this major release (ROCm 5.7).
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (EOM (End of Maintenance) will be aligned with the closest ROCm release).
- Bug fixes during the maintenance will be made to the next ROCm point release.
- Bug fixes will not be backported to older ROCm releases for gfx906.
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
#### Feature Updates
##### Non-hostcall HIP Printf
**Current behavior**
The current version of HIP printf relies on hostcalls, which, in turn, rely on PCIe atomics. However, PCIe atomics are unavailable in some environments, and, as a result, HIP printf does not work in those environments. Users may see the following error from the runtime (with AMD_LOG_LEVEL 1 and above):
```
Pcie atomics not enabled, hostcall not supported
```
**Workaround**
The ROCm 5.7 release introduces an alternative to the current hostcall-based implementation that leverages an older OpenCL-based printf scheme, which does not rely on hostcalls/PCIe atomics.
Note: This option is less robust than hostcall-based implementation and is intended to be a workaround when hostcalls do not work.
The printf variant is now controlled via a new compiler option, `-mprintf-kind=<value>`. This is supported only for HIP programs and takes the following values:
- `hostcall`: This currently available implementation relies on hostcalls, which require the system to support PCIe atomics. It is the default scheme.
- `buffered`: This implementation leverages the older printf scheme used by OpenCL; it relies on a memory buffer where printf arguments are stored during the kernel execution, and then the runtime handles the actual printing once the kernel finishes execution.
**NOTE**: With the new workaround:
- The printf buffer is fixed size and non-circular. After the buffer is filled, calls to printf will not result in additional output.
- The printf call returns either 0 (on success) or -1 (on failure, due to full buffer), unlike the hostcall scheme that returns the number of characters printed.
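The buffered semantics listed above can be illustrated with a small host-side model. This is a toy sketch, not the runtime's implementation: the class name and capacity are hypothetical, the model stores already formatted text rather than raw printf arguments, and it assumes that once one call fails the buffer counts as full for all later calls.

```python
class BufferedPrintfModel:
    """Toy model of a fixed-size, non-circular printf buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # total bytes available (hypothetical value)
        self.used = 0             # bytes consumed so far
        self.full = False         # set once a record no longer fits
        self.records = []

    def printf(self, fmt: str, *args) -> int:
        """Store one record; return 0 on success or -1 when the buffer is full.

        Unlike the hostcall scheme, the return value is not the number of
        characters printed.
        """
        text = fmt % args
        size = len(text.encode())
        if self.full or self.used + size > self.capacity:
            self.full = True  # non-circular: nothing is overwritten
            return -1
        self.records.append(text)
        self.used += size
        return 0

    def flush(self) -> str:
        """Emit everything stored, as the runtime would after the kernel ends."""
        return "".join(self.records)
```

The practical consequence is that code which relies on the printf return value as a character count, or which prints heavily from many threads, may behave differently under `-mprintf-kind=buffered`.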
##### Beta Release of LLVM Address Sanitizer (ASAN) with the GPU
The ROCm 5.7 release introduces the beta release of LLVM Address Sanitizer (ASAN) with the GPU. The LLVM Address Sanitizer provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM Address Sanitizer process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications like pure CPU applications. However, this simplicity has not been achieved yet.
Refer to the documentation on LLVM Address Sanitizer with the GPU at [LLVM Address Sanitizer User Guide](./docs/understand/using_gpu_sanitizer.md).
**Note**: The beta release of LLVM Address Sanitizer for ROCm is currently tested and validated on Ubuntu 20.04.
#### Fixed Defects
The following defects are fixed in ROCm v5.7:
- Test hangs observed in HMM RCCL
- NoGpuTst test of Catch2 fails with Docker
- Failures observed with non-HMM HIP directed catch2 tests with XNACK+
- Multiple test failures and test hangs observed in hip-directed catch2 tests with xnack+
#### HIP 5.7.0
##### Optimizations
##### Added
- Added `meta_group_size`/`rank` for getting the number of tiles and rank of a tile in the partition
- Added new APIs supporting Windows only, under development on Linux
- `hipMallocMipmappedArray` for allocating a mipmapped array on the device
- `hipFreeMipmappedArray` for freeing a mipmapped array on the device
- `hipGetMipmappedArrayLevel` for getting a mipmap level of a HIP mipmapped array
- `hipMipmappedArrayCreate` for creating a mipmapped array
- `hipMipmappedArrayDestroy` for destroying a mipmapped array
- `hipMipmappedArrayGetLevel` for getting a mipmapped array on a mipmapped level
##### Changed
##### Fixed
##### Known Issues
- HIP memory type enum values currently have no equivalent to `cudaMemoryTypeUnregistered`, due to HIP functionality backward compatibility.
- HIP API `hipPointerGetAttributes` could return invalid value in case the input memory pointer was not allocated through any HIP API on device or host.
##### Upcoming changes for HIP in ROCm 6.0 release
- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
- maxTexture1D
- maxTexture2D
- maxTexture1DLayered
- maxTexture2DLayered
- sharedMemPerMultiprocessor
- deviceOverlap
- asyncEngineCount
- surfaceAlignment
- unifiedAddressing
- computePreemptionSupported
- hostRegisterSupported
- uuid
- Removal of deprecated hip-hcc code from the HIP code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside hipMemcpy3D()
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
- Correct hipGetLastError to return the last error instead of last API call's return code
- Update hipExternalSemaphoreHandleDesc to add "unsigned int reserved[16]"
- Correct handling of flag values in hipIpcOpenMemHandle for hipIpcMemLazyEnablePeerAccess
- Remove hiparray* and make it opaque with hipArray_t


@@ -0,0 +1,36 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
#### Installing all GPU Address sanitizer packages with a single command
ROCm 5.7.1 simplifies the installation steps for the optional Address Sanitizer (ASan) packages. This release provides the meta package *rocm-ml-sdk-asan* for ease of ASan installation. The following command installs all ASan packages at once, rather than installing each package separately:
`sudo apt-get install rocm-ml-sdk-asan`
For more detailed information about using the GPU AddressSanitizer, refer to the [user guide](https://rocm.docs.amd.com/en/docs-5.7.1/understand/using_gpu_sanitizer.html).
### ROCm Libraries
#### rocBLAS
A new utility, rocblas-gemm-tune, and a new environment variable, ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, are added to rocBLAS in the ROCm 5.7.1 release.
*rocblas-gemm-tune* is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, profile logging can be used by setting the environment variable ROCBLAS_LAYER=4.
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
An example input file: Expected output (note selected GEMM idx may differ): Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.
If the output is stored in a file, the results can be used to override the default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH to the path of the stored file.
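As a rough illustration of applying the override to a single application run, the environment for a child process can be prepared as below. The helper name, the file name, and the application path in the usage note are hypothetical; only the variable name comes from the text above.

```python
import os


def tuned_gemm_env(override_file, base_env=None):
    """Return a copy of the environment with the rocBLAS GEMM override set.

    `override_file` is the file holding the stored rocblas-gemm-tune output.
    The calling process's own environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH"] = str(override_file)
    return env
```

For example, `subprocess.run(["./my_gemm_app"], env=tuned_gemm_env("tuned_gemms.txt"))` would launch a hypothetical application with the tuned kernel selection, leaving other processes on the defaults.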
For more details, refer to the [rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
#### HIP 5.7.1 (for ROCm 5.7.1)
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
### Fixed defects
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.