Compare commits

787 Commits

Author SHA1 Message Date
Sam Wu
43cd74913b Merge branch 'develop' into roc-6.0.x 2024-01-31 16:04:42 -07:00
Sam Wu
83766203ff Update changelog announcement (#2857)
* Update changelog announcement

* Update phrasing
2024-01-31 16:04:14 -07:00
Sam Wu
e467b13c68 Merge branch 'develop' into roc-6.0.x 2024-01-31 15:04:00 -07:00
Sam Wu
336f88c7c2 Fix typo in changelog (#2856) 2024-01-31 15:03:31 -07:00
Sam Wu
b18eacbdac Merge branch 'develop' into roc-6.0.x 2024-01-31 14:34:08 -07:00
zhang2amd
78bd182403 Update default.xml to version 6.0.2 (#2855) 2024-01-31 14:33:45 -07:00
Lisa
ba9cc4f185 changelog updates (#2792)
* changelog updates

* updates

* changelog updates

* Update CHANGELOG.md

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

* Update RELEASE.md

* 6.0.1 -> 6.0.2

* 6.0.1 -> 6.0.2

* Update CONTRIBUTING.md (#2791)

* Update CONTRIBUTING.md

* Fixed link to licensing document

Also, changed to use relative links for internal files.

* Create issue_retrieval.yml

I am tasked with adding a GitHub action to process incoming GitHub issues. The AMD GitHub admin team asked me to try out one of their runners and to do so, I need to load in a workflow file.

* changed group to ROCM-Ubuntu

* Added a field to specify project number

This action receives an org name and project number and adds issues to it using this information

* Update issue_retrieval.yml

* Update issue_retrieval.yml

* Revert "Update CONTRIBUTING.md" (#2795)

* Text change to direct PRs into default branch, since not all repos have develop branch

* add keywords (#2799)

* Update issue_retrieval.yml

* ci(default.xml): Add hipBLASLt to manifest (#2796)

* Deleting issue_report.yml in favor of a global issue template placed in ROCm/.github (#2803)

* Delete .github/ISSUE_TEMPLATE/issue_report.yml

* Delete .github/ISSUE_TEMPLATE/config.yml

* Delete .github/ISSUE_TEMPLATE directory (#2805)

* docs(conf.py): Update article info for release page (#2806)

* docs(conf.py): Update article info for release page

* Update conf.py

* Fix typo (#2809)

* Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2807)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* corrections for Issue #2753 (#2819)

* docs(versions.md): Add 5.6.1 to versions list (#2816)

* Add codeowners for documentation (#2834)

Co-authored-by: samjwu <samjwu@users.noreply.github.com>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2835)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump gitpython from 3.1.30 to 3.1.41 in /docs/sphinx (#2836)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.30 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.30...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* changelog updates

* sync release file with changelog

* remove 6.0.0 duplicates

* update intro

* Update CHANGELOG.md

* Update RELEASE.md

* clean up duplicates

* caps

* minor update

* language update

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2024-01-31 13:26:27 -07:00
Lisa
df70d90d49 radeon updates (#2818)
* radeon updates

* update link

* update intro

* verbiage

* Update docs/index.md

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Update docs/what-is-rocm.md

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Use intersphinx link for radeon

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-01-30 13:20:28 -07:00
dependabot[bot]
95fa47e31a Bump rocm-docs-core from 0.31.0 to 0.33.0 in /docs/sphinx (#2851)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-29 17:20:35 -07:00
Spencer Hance
5afa1539ed Fix link to building.md in README (#2843)
Fix broken link to building.md in README.  It was missing `/docs/` in the path.
2024-01-29 17:04:10 -07:00
BrenAMD
0b5cfca1e4 Updated New ROCm meta package section (#2839) 2024-01-25 12:19:34 -07:00
dependabot[bot]
14979045a8 Bump gitpython from 3.1.30 to 3.1.41 in /docs/sphinx (#2836)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.30 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.30...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-23 09:44:58 -07:00
dependabot[bot]
65b5a383ec Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2835)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-23 09:44:43 -07:00
Sam Wu
c679235a90 Add codeowners for documentation (#2834)
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2024-01-23 09:29:14 -07:00
Sam Wu
4833ecfa6a docs(versions.md): Add 5.6.1 to versions list (#2816) 2024-01-22 15:16:58 -07:00
randyh62
c9425c6d19 corrections for Issue #2753 (#2819) 2024-01-18 09:31:45 -07:00
dependabot[bot]
c4383d217a Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2807)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-16 11:53:20 -07:00
Sam Wu
0ef9f2d53c Merge branch 'develop' into roc-6.0.x 2024-01-16 11:46:26 -07:00
Sam Wu
44b5d516e8 Merge branch 'docs/6.0.0' into roc-6.0.x 2024-01-16 10:56:03 -07:00
Sam Wu
ad66256e52 Merge develop into roc-6.0.x (#2810)
* Create issue_retrieval.yml

I am tasked with adding a GitHub action to process incoming GitHub issues. The AMD GitHub admin team asked me to try out one of their runners and to do so, I need to load in a workflow file.

* changed group to ROCM-Ubuntu

* Added a field to specify project number

This action receives an org name and project number and adds issues to it using this information

* Update issue_retrieval.yml

* Update issue_retrieval.yml

* Generate release notes for 6.0.1 from autotag script (#2790)

* Update CONTRIBUTING.md (#2791)

* Update CONTRIBUTING.md

* Fixed link to licensing document

Also, changed to use relative links for internal files.

* Revert "Update CONTRIBUTING.md" (#2795)

* Text change to direct PRs into default branch, since not all repos have develop branch

* add keywords (#2799)

* Update issue_retrieval.yml

* ci(default.xml): Add hipBLASLt to manifest (#2796)

* Deleting issue_report.yml in favor of a global issue template placed in ROCm/.github (#2803)

* Delete .github/ISSUE_TEMPLATE/issue_report.yml

* Delete .github/ISSUE_TEMPLATE/config.yml

* Delete .github/ISSUE_TEMPLATE directory (#2805)

* docs(conf.py): Update article info for release page (#2806)

* docs(conf.py): Update article info for release page

* Update conf.py

* Fix typo (#2809)

---------

Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: yhuiYH <145490163+yhuiYH@users.noreply.github.com>
2024-01-16 10:53:28 -07:00
Sam Wu
d509656c6b Fix typo (#2809) 2024-01-16 10:48:21 -07:00
Sam Wu
c2a3626026 docs(conf.py): Update article info for release page (#2806)
* docs(conf.py): Update article info for release page

* Update conf.py
2024-01-12 17:12:56 -07:00
abhimeda
51d5bf015c Delete .github/ISSUE_TEMPLATE directory (#2805) 2024-01-12 16:12:09 -07:00
abhimeda
c6facfb30f Deleting issue_report.yml in favor of a global issue template placed in ROCm/.github (#2803)
* Delete .github/ISSUE_TEMPLATE/issue_report.yml

* Delete .github/ISSUE_TEMPLATE/config.yml
2024-01-12 15:20:15 -07:00
Sam Wu
fce96340f4 ci(default.xml): Add hipBLASLt to manifest (#2796) 2024-01-12 15:19:22 -07:00
abhimeda
8d44e04483 Merge pull request #2800 from ROCm/abhimeda-added-env-variables-to-workflow-file
Added repository secrets to ROCm and pointing the workflow file to use them
2024-01-12 11:46:26 -05:00
abhimeda
dcce85a84a Update issue_retrieval.yml 2024-01-12 10:57:29 -05:00
Lisa
d399b13c88 add keywords (#2799) 2024-01-11 14:07:30 -07:00
yhuiYH
20005e0ef7 Merge pull request #2798 from ROCm/amd/dev/yhui/UpdateTextInContributing
Update Contributing.md to direct PRs to use repo's default branch
2024-01-11 15:08:37 -05:00
Young Hui
d05c1d529e Text change to direct PRs into default branch, since not all repos have develop branch 2024-01-11 14:02:17 -05:00
Lisa
163262643f Revert "Update CONTRIBUTING.md" (#2795) 2024-01-10 11:26:47 -07:00
abhimeda
318126b155 Merge pull request #2772 from ROCm/abhimeda-adding-workflow-file-to-test-github-runner
Abhimeda adding workflow file to create GitHub Action
2024-01-10 10:16:11 -05:00
zhang2amd
221aa04931 Add hipBLASLt in manifest. (#2776) 2024-01-10 07:06:11 -07:00
David Galiffi
2be774fb19 Update CONTRIBUTING.md (#2791)
* Update CONTRIBUTING.md

* Fixed link to licensing document

Also, changed to use relative links for internal files.
2024-01-10 07:04:38 -07:00
Sam Wu
3faa2600eb Generate release notes for 6.0.1 from autotag script (#2790) 2024-01-09 13:39:19 -07:00
Sam Wu
d531936276 Merge roc-6.0.x into docs/6.0.0 (#2784)
* Mi300 info update (#2780)

* docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices (#2775)

* docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices

* docs: Directly link for RST instead of using intersphinx

---------

Co-authored-by: Istvan Kiss <neon60@gmail.com>
2024-01-09 09:21:24 -07:00
Sam Wu
753d2f9719 Merge branch 'develop' into roc-6.0.x 2024-01-08 16:35:26 -07:00
Sam Wu
7ffc622039 docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices (#2775)
* docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices

* docs: Directly link for RST instead of using intersphinx
2024-01-08 16:34:45 -07:00
Istvan Kiss
054689be6a Mi300 info update (#2780) 2024-01-08 16:30:41 -07:00
abhimeda
40b5f85af9 Update issue_retrieval.yml 2024-01-04 15:40:05 -05:00
abhimeda
a1372d56f9 Update issue_retrieval.yml 2024-01-03 14:54:10 -05:00
abhimeda
717b09f7eb Added a field to specify project number
This action receives an org name and project number and adds issues to it using this information
2024-01-03 14:50:52 -05:00
abhimeda
1cd2b651c4 changed group to ROCM-Ubuntu 2024-01-01 21:55:28 -05:00
abhimeda
587f821194 Create issue_retrieval.yml
I am tasked with adding a GitHub action to process incoming GitHub issues. The AMD GitHub admin team asked me to try out one of their runners and to do so, I need to load in a workflow file.
2024-01-01 21:53:42 -05:00
Sam Wu
147dce6f28 Merge branch 'develop' into roc-6.0.x 2023-12-20 15:54:20 -07:00
Sam Wu
4808c615e6 Merge branch 'develop' into docs/6.0.0 2023-12-20 15:53:12 -07:00
Lisa
f94a8620eb Update CHANGELOG.md (#2762) 2023-12-20 13:40:35 -07:00
Lisa
5f9842db8f link fixes & consistency (#2761) 2023-12-20 12:42:15 -07:00
dependabot[bot]
6fae95aa02 Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#2759)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 17:13:46 -07:00
Sam Wu
b865ae7796 Merge branch 'roc-6.0.x' into docs/6.0.0 2023-12-19 15:56:57 -07:00
Sam Wu
74a5c1b580 Merge branch 'develop' into roc-6.0.x 2023-12-19 15:56:02 -07:00
Sam Wu
538a44f4d7 docs: Update GPU and OS support for Linux page (#2757) 2023-12-19 15:53:52 -07:00
Sam Wu
6c90336e67 Merge docs/6.0.0 into develop (#2756)
* Marking TransferBench as beta (#2727)

* Known issues (#2731) (#2732)

* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

* docs(library-index.md): Add MIVisionX to library index (#2736)

* Delete docs/about/compatibility/linux-support.md (#2734)

* Delete docs/about/compatibility/linux-support.md

* Update _toc.yml.in

* Update _toc.yml.in

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Corrected OS version (#2738)

* Corrected OS version 

22.04.5 does not exist.
It's 22.04.3 which has been tested and supported

* Update CHANGELOG.md

* Update _toc.yml.in (#2750)

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: pramenku <7664080+pramenku@users.noreply.github.com>
2023-12-19 15:43:04 -07:00
Sam Wu
859f3763c8 Merge branch 'develop' into docs/6.0.0 2023-12-19 15:41:06 -07:00
abhimeda
7f4922d2b2 Abhimeda updating issue template (#2749)
* added ROCm v6, MI300, and set a default component

* Delete .github/ISSUE_TEMPLATE/0_issue_report.yml
2023-12-18 15:06:35 -07:00
Lisa
c8c4b5a034 Update _toc.yml.in (#2750) 2023-12-18 12:27:06 -07:00
Mátyás Aradi
3e1a87a4f1 Remove virtualenv build from dependencies (#2699)
* Remove virtualenv build from dependencies

* Rename ROCM_BUILD_DOCS to BUILD_DOCS
2023-12-18 07:03:55 -07:00
pramenku
3522084990 Corrected OS version (#2738)
* Corrected OS version 

22.04.5 does not exist.
It's 22.04.3 which has been tested and supported

* Update CHANGELOG.md
2023-12-18 07:03:24 -07:00
yhuiYH
eeb96ebb18 Move documentation contributing.md and add Governance.md and Contributing.md (#2690)
* moved contributing.md to new location as it describes contributing to documentation

* Adding Governance.md and high-level Contributing.md

* fix linting errors (asterisk, whitespace and unused links)

* More linting fixes

* merge conflicts

* verbiage

* License link moved out of codeblock, and text fix there. Changed to full name of AMD. Update links to ROCm Org path

* whitespace linting fix

* Reverted back to ROCm is lead and managed by AMD.  Flows better to me.

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2023-12-15 16:14:13 -07:00
Saad Rahim (AMD)
1c420b4b5c Delete docs/about/compatibility/linux-support.md (#2734)
* Delete docs/about/compatibility/linux-support.md

* Update _toc.yml.in

* Update _toc.yml.in

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-12-15 16:09:50 -07:00
Sam Wu
914befefcb docs(library-index.md): Add MIVisionX to library index (#2736) 2023-12-15 15:59:36 -07:00
Sam Wu
6099778813 Merge branch 'develop' into roc-6.0.x 2023-12-15 15:50:14 -07:00
Sam Wu
8a8504246a docs(library-index.md): Add MIVisionX to library index (#2735)
* Add files via upload (#2701)

* Merge Roc 6.0.x into develop (#2733)

* Marking TransferBench as beta (#2727)

* Known issues (#2731)

* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

* docs(library-index.md): Add MIVisionX to library index

---------

Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
2023-12-15 15:47:15 -07:00
Sam Wu
82d871c907 Merge Roc 6.0.x into develop (#2733)
* Marking TransferBench as beta (#2727)

* Known issues (#2731)

* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
2023-12-15 15:06:03 -07:00
Sam Wu
a9099dd36e Known issues (#2731) (#2732)
* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
2023-12-15 15:05:35 -07:00
Lisa
6ba05d8ab0 Known issues (#2731)
* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
2023-12-15 15:01:52 -07:00
Saad Rahim (AMD)
ba69933774 Marking TransferBench as beta (#2727) 2023-12-15 14:48:33 -07:00
abhimeda
5676b16fce Add files via upload (#2701) 2023-12-15 14:42:13 -07:00
Lisa
1828271505 Update library-index.md (#2723)
* Update library-index.md

* Update library-index.md
2023-12-15 14:33:22 -07:00
Sam Wu
5b672af67d build: Update rocm-docs-core to v0.30.2 (#2724)
* build: Update rocm-docs-core to v0.30.2

* docs: Fix doc links in index
2023-12-15 14:32:46 -07:00
Lisa
a121e35aa7 rearranging (#2718) 2023-12-15 14:03:14 -07:00
zhang2amd
2a71de6c93 Update default.xml for ROCm 6.0.0 (#2721) 2023-12-15 13:20:39 -07:00
Saad Rahim (AMD)
8588444a0d Updating release notes (#2712)
* Updating release notes

* Apply suggestions from code review

* Update RELEASE.md

Co-authored-by: Sam Wu <sjwu@ualberta.ca>

* Update RELEASE.md

Co-authored-by: Sam Wu <sjwu@ualberta.ca>

* Update into text

* Update RELEASE.md

* Update RELEASE.md

Co-authored-by: Sam Wu <sjwu@ualberta.ca>

---------

Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: Sam Wu <sjwu@ualberta.ca>
2023-12-14 14:38:42 -07:00
Sam Wu
b8412e17f3 docs(versions.md): Add back docs versions page (#2716)
This is used by the Version List header for the rocm-docs-home theme flavor
2023-12-14 14:21:11 -07:00
Sam Wu
652f72dbdd docs: Manually add ROCgdb release notes (#2714) 2023-12-14 14:20:57 -07:00
Sam Wu
13da03473f Manual update to Release Notes (#2711)
* docs: Manually add rocprofiler release notes

* docs: Manually add HIP release notes

* Update CHANGELOG.md

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

* docs: HIP 6.0.0

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-12-14 11:42:54 -07:00
Lisa
bcc8603454 update links, remove windows (#2706) 2023-12-14 09:21:50 -07:00
Lisa
5a53b95c7f release updates (#2707)
* release updates

* minor updates

* Update CHANGELOG.md
2023-12-14 09:20:53 -07:00
srawat
7889220f04 Mi200 counters (#2622) 2023-12-12 11:25:57 -07:00
Lisa
19eae6a8eb heading consistency (#2697)
* heading consistency

* update rocrand
2023-12-12 11:16:49 -07:00
srawat
131aa66591 Merge pull request #2700 from SwRaw/rocprofiler_index
Update library-index.md
2023-12-11 11:00:49 +05:30
Sam Wu
c648ca767b fix(tag_script.py): Update organization names for projects used in tagging script (#2698)
Most projects were moved to the ROCm organization
2023-12-08 10:44:26 -07:00
srawat
4922020441 Update library-index.md 2023-12-08 22:18:41 +05:30
srawat
07a778498c Update library-index.md 2023-12-08 22:11:54 +05:30
srawat
d75a05645f Update library-index.md 2023-12-08 17:37:53 +05:30
Sam Wu
00f7899b03 docs(conf.py): Use rocm-docs-core as extension (#2695)
* docs(conf.py): Use rocm-docs-core as extension

instead of calling and instantiating as object (legacy method)

Also apply the rocm-docs-home theme flavor

* build: Update rocm-docs-core to 0.30.1
2023-12-07 09:39:45 -07:00
Sam Wu
412366ff61 Update Changelog and latest Release notes (#2648)
* docs: Remove extra newline from 5.7.1.md template

* docs: Update the changelog and latest release notes

* docs: Rebuild changelog with updated 6.0.0 edits
2023-12-06 16:27:04 -07:00
dependabot[bot]
be1fed8ca4 Bump rocm-docs-core from 0.29.0 to 0.30.0 in /docs/sphinx (#2684)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-05 15:07:34 -07:00
Lisa
16a1d355c1 typo (#2687) 2023-12-04 10:03:02 -07:00
Lisa
3aa7072fc2 metadata test (#2656) 2023-11-30 14:37:12 -07:00
Saad Rahim (AMD)
7179884433 Left Navigation further compression for usability (#2677)
* Left Navigation further compression for usability

* Whitespace

* provide feedback
2023-11-30 13:11:17 -07:00
Lisa
3523e9e822 Open MPI updates (#2655) 2023-11-30 09:58:12 -07:00
Nagy-Egri Máté Ferenc
3b9cd77b93 Clarify mixing C++ and HIP sources via CMake (#2618)
* Clarify mixing C++ and HIP sources via CMake

* Designate code blocks

* Simplify lang around host-only use of the HIP API

* Remove superfluous wording.

* Note LINKER_LANGUAGE of mixed sources

* Space after code-block

* Single space in code-block
2023-11-29 07:03:44 -07:00
Mátyás Aradi
ef1c21ccf7 Add CMake support (#2641)
* Add CMake support

* Update README and CHANGELOG

* Update CHANGELOG

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-11-28 09:40:25 -07:00
Istvan Kiss
35893c4df6 Remove disable spellchecks of cmake-packages.rst (#2678) 2023-11-28 07:03:13 -07:00
Saad Rahim (AMD)
c1ee7d32e0 Removing Linux installation related content (#2673)
* Removing Linux installation related content

* TOC updates

* Removing added files

* Line spacing on code block
2023-11-27 14:03:52 -07:00
Istvan Kiss
f8446befd2 Remove disable spellchecks of cmake-packages.rst (#2676) 2023-11-27 11:17:13 -07:00
dependabot[bot]
f51e1144df Bump rocm-docs-core from 0.28.0 to 0.29.0 in /docs/sphinx (#2674)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.28.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-27 10:21:27 -07:00
Lisa
4adaff02a6 Left nav updates (#2647)
* update gpu-enabled-mpi

update the documentation to also include libfabric based network interconnects,
not just UCX.

* add some technical terms to wordlist

* shorten left nav

* grid updates

---------

Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-11-24 07:15:10 -07:00
dependabot[bot]
0d6fc80070 Bump rocm-docs-core from 0.27.0 to 0.28.0 in /docs/sphinx (#2651)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.27.0 to 0.28.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.27.0...v0.28.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-11-22 15:07:01 -07:00
Lisa
33f110e354 update ROCm name (#2660)
* update ROCm name

* update version history page
2023-11-22 10:30:10 -07:00
Saad Rahim (AMD)
9a9cf073b4 spelling check fix (#2649) 2023-11-20 10:12:39 -07:00
Lisa
1e6951dc55 add tensorflow support link (#2612)
* add tensorflow support link

* Update docs/install/tensorflow-install.md

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-11-15 15:41:36 -07:00
Jithun Nair
135e489e7a Update torchvision version to 0.15.2 for PyTorch2.0.1 (#2635)
Ubuntu20.04 entry contains the correct info. This corrects the info for Ubuntu22.04 entry

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-11-15 15:37:57 -07:00
Lisa
c326a64381 Acronym update (#2637) 2023-11-14 08:54:13 -07:00
Lisa
37c48060f7 update release note files (#2617)
---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-11-10 15:14:59 -07:00
dependabot[bot]
3f855e386c Bump rocm-docs-core from 0.26.0 to 0.27.0 in /docs/sphinx (#2626)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-03 07:08:50 -06:00
Sam Wu
aa5eff25fb docs: Update copyright and release history doc (#2624) 2023-11-02 10:10:34 -06:00
Istvan Kiss
ccdcfbd7e3 Fix warnings (#2623)
* Fix warnings

* Fix file conflict

* Remove duplication in 5.7.1 changelog
2023-11-02 10:00:01 -06:00
Saad Rahim (AMD)
c3eaa65705 Merge pull request #2609 from LisaDelaney/roc-5.7.x-into-develop
Merge 5.7 changes into Develop
2023-10-26 10:01:17 -06:00
Lisa Delaney
9d8a830851 linting fixes 2023-10-25 15:54:00 -06:00
Lisa Delaney
23d563eefb remove auto-generated files 2023-10-25 13:56:04 -06:00
Lisa Delaney
7585e9b165 merge conflict 2023-10-25 13:52:44 -06:00
Lisa Delaney
f0f4fa15b4 merge conflicts & remove linux install 2023-10-25 13:15:47 -06:00
Sam Wu
549b23b521 Add Roopa's changes to gpu sanitizer doc (#2607)
* Add Roopa's changes to gpu sanitizer doc

* Markdown linting fixes
2023-10-25 13:02:28 -06:00
Sam Wu
b0caf52156 Updates for consistency (#2604)
* Update RELEASE.md and 5.7.0.md to match CHANGELOG.md

* Update 5.2.0.md to match CHANGELOG.md

* Copy CHANGELOG into about folder to match RELEASE

To avoid having divergence in relative links between RELEASE and CHANGELOG
2023-10-24 12:57:39 -06:00
Lisa
201f626887 Structure cleanup (#2585)
* link fixes

* remove changelog

* remove auto-generated file
2023-10-24 10:11:41 -06:00
danpetreamd
37db70c914 fixed typo: correct path to direct rendering interface (DRI) devices is /dev/dri/renderD*. (#2593) 2023-10-24 10:11:00 -06:00
Jithun Nair
244c6a6823 Fix openmp documentation (#2598) 2023-10-23 13:03:54 -06:00
dsclear-amd
ce82a047bf Issue reporting templates roc 5.7.x (#2586)
* Adds GitHub issue templates for reporting problems, and feature requests.

* Adds issue reporting templates for logging bugs, and requesting features.

* Removed duplicate ISSUE_TEMPLATE directory.
2023-10-20 11:38:16 -06:00
Sam Wu
17a1cb8bbb docs: Remove duplicate CHANGELOG (#2591) 2023-10-20 11:07:39 -06:00
Sam Wu
afa14c518e Regenerate release notes with AMDMIGraphX (#2537)
* Regenerate changelog with AMDMIGraphX

* Add rccl 2.17.1-1 release notes

* Update 5.7.0 release notes to include lib changes
2023-10-18 08:58:02 -06:00
Sam Wu
b61a54e4f3 Update LLVM ASan documentation (#2529) 2023-10-17 16:51:51 -06:00
Saad Rahim (AMD)
227e135f5a Making GPU and OS support page titles consistent between Win and Linux (#2575) 2023-10-17 16:51:14 -06:00
Houssem MENHOUR
1e9a1ca55a Update GPU Support on Linux (#2572)
Update docs with information in the AMD blog post announcing support for some RDNA3 Radeon GPUs on Linux.

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-10-17 16:13:05 -06:00
Saad Rahim (AMD)
20f3c28345 Fixing cut and paste for RDNA3 architecture of 7900 (#2574) 2023-10-17 11:34:49 -06:00
Saad Rahim (AMD)
ef93b5e176 Adding 7900 XTX and W7900 to compatibility matrix (#2573) 2023-10-17 11:16:41 -06:00
Istvan Kiss
2dd6923ab9 Fix warnings (#2548)
* Fixed most of the warnings

* Temporary fix of copied files links
2023-10-17 07:05:58 -06:00
Mészáros Gergely
59b53af074 Bump rocm-docs-core version and fix dependabot settings (#2571)
dependabot mis-detected the repository to be a library
(instead of an application) and widened the rocm-docs-core version
instead of increasing it. This basically disabled pinning.

Explicitly specify to increase the version instead of widening it
to hopefully prevent this in the future.
2023-10-17 07:03:14 -06:00
Lisa
fd927e514d What-is and TOC clean-up (#2539) 2023-10-16 15:25:00 -06:00
Saad Rahim (AMD)
72d4da7da0 Typo in graphical workstation setting (#2569) 2023-10-16 09:56:02 -06:00
Sam Wu
aac49cef23 Regenerate changelog with AMDMIGraphX (#2544) 2023-10-16 09:48:10 -06:00
Saad Rahim (AMD)
69b8117726 Fixing links to Radeon Software for Linux install (#2568) 2023-10-16 09:35:17 -06:00
Sam Wu
9ac4a7b194 Fix typo (#2567) 2023-10-16 09:34:29 -06:00
Saad Rahim (AMD)
00163edd45 radeon software for linux announcement (#2566) 2023-10-16 09:13:28 -06:00
Nara
80fd791421 Add Radeon install instructions for Linux (#2565) 2023-10-16 09:12:17 -06:00
Saad Rahim (AMD)
f65ab4ce27 Adding UB 22.04 container to docker support matrix (#2564) 2023-10-16 07:09:08 -06:00
Sam Wu
365b31728d Update doc reqs for 5.7.1 (#2558)
* Update doc reqs

rocm-docs-core==0.26.0

* Update release notes
2023-10-13 17:12:49 -06:00
Sam Wu
b6c71018a6 Disable epub format in rtd yaml config (#2557)
Because rubric is not supported

ValueError: <container: <rubric...><container...>> is not in list
2023-10-13 16:51:16 -06:00
Sam Wu
54177e8b96 Update rtd conf.py for 5.7.1 (#2556) 2023-10-13 16:41:19 -06:00
Saad Rahim (AMD)
74f4f86c92 5.7.1 Release Notes (#2550)
* 5.7.1 Release Notes

* Run script for 5.7.1 release notes

* Update CHANGELOG header

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-10-13 16:11:48 -06:00
Nara
74d8f95afb ROCm 5.7.1 Linux install and compatibility updates (#2547) 2023-10-13 15:16:14 -06:00
Saad Rahim (AMD)
50ad3847e5 Docker Image Support table updates (#2545) 2023-10-12 14:00:30 -06:00
Lisa
c6e2856822 Update style guidelines (#2542) 2023-10-12 13:50:15 -06:00
Lisa
444efec642 Docker support updates (#2541) 2023-10-11 11:35:10 -06:00
Lisa
4b7775d264 move spack & update pytorch (#2532) 2023-10-10 14:51:55 -06:00
Nara
5700b8f9e8 fix: remove library name check since changelogs will not contain changes for different libraries (#2535) 2023-10-10 07:08:17 -06:00
Lisa
e87dba01c6 ROCm restructuring (#2521)
Flattened out page structure for improved navigability.
 * Change Table of Contents 
 * Update the install guides for windows and linux
 * Removed extraneous index pages
 * GPU architecture pages duplicate entries removed
 * spack page cleanup

---------

Co-authored-by: Sam Wu <samwu103@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-10-06 15:42:11 -06:00
Lisa
7d22b96c5d remove image (#2505) 2023-10-06 15:39:53 -06:00
urtiwari
4496b2abc8 Merge pull request #2526 from urtiwari/develop
Added the table content in toc_yml file
2023-10-06 09:23:34 -07:00
urtiwari
2b788350e4 Updated the latest version in the document 2023-10-06 16:06:56 +00:00
urtiwari
e607ba6259 Merge branch 'develop' into develop 2023-10-06 08:20:10 -07:00
Sam Wu
0e7ae20a32 Docs: Update Spack prerequisite instructions (#2528)
* docs: Update Spack prerequisite instructions

* docs(Spack.md): Update phrasing for Spack prerequisite instructions

---------

Co-authored-by: Sam Wu <root@MKM-L2-SAMWU155.amd.com>
2023-10-06 09:16:29 -06:00
urtiwari
033b6d089e Removed the machine name from the document 2023-10-05 21:39:03 +00:00
urtiwari
4b62e9b90f Fixing table format 2023-10-05 20:41:38 +00:00
urtiwari
cf0798ec0d Merge branch 'develop' of https://github.com/urtiwari/ROCm into develop 2023-10-05 20:38:17 +00:00
urtiwari
75456466e7 Fixing table format 2023-10-05 20:37:00 +00:00
Sam Wu
3176676240 Fix _toc.yml.in
move spack to How To section in Table of Contents

remove duplicate entry in Table of Contents
2023-10-04 16:35:40 -06:00
urtiwari
24614972d3 Updated the table contents related to Spack 2023-10-04 22:22:33 +00:00
urtiwari
1e96665c34 Updated the table contents related to Spack 2023-10-04 22:05:27 +00:00
urtiwari
42a44e020f Merge branch 'RadeonOpenCompute:develop' into develop 2023-10-04 14:18:21 -07:00
urtiwari
99073fb9fc Updated the table contents related to Spack 2023-10-04 21:09:56 +00:00
urtiwari
9f2c53ef0a Adding Spack document (#2516)
* Adding Spack document

* Fixed the markdown errors

* Fixed the markdown errors

* Fixed the markdown errors

* Fixed the markdown errors

* Fixed the markdown errors

* Fixed the spelling errors

* Fixed the spelling errors

---------

Co-authored-by: urtiwari <you@example.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-10-02 14:00:59 -07:00
urtiwari
acd247bfe8 Fixed the spelling errors 2023-10-02 20:45:36 +00:00
Sam Wu
6e70c6026f Merge branch 'develop' into develop 2023-10-02 14:36:07 -06:00
Roopa Malavally
315b8770a4 Release Notes for 5.7.1 (#2520)
* Create 5.7.1.md

Creating release notes for 571

* Update .wordlist.txt

Added words for SPACK
2023-10-02 13:56:00 -06:00
urtiwari
060838bcc2 Fixed the spelling errors 2023-10-02 19:53:49 +00:00
Tasso
8d68b6618b Merge pull request #2514 from RadeonOpenCompute/amd/dev/azambela/path-name-change-branch
Fixed invalid path.
2023-10-02 10:36:54 -04:00
Tasso
b0d773d2a9 Merge branch 'develop' into amd/dev/azambela/path-name-change-branch 2023-10-02 10:35:02 -04:00
Tasso
aff08a5f42 Merge pull request #2518 from RadeonOpenCompute/amd/dev/azambela/rocm-opencl-branch
Removed reference /opt/rocm/opencl/bin/clinfo
2023-10-02 10:34:42 -04:00
Saad Rahim (AMD)
39e0150f94 Merge branch 'develop' into amd/dev/azambela/path-name-change-branch 2023-10-02 08:26:55 -06:00
Saad Rahim (AMD)
d856e6fa3e Merge branch 'develop' into amd/dev/azambela/rocm-opencl-branch 2023-10-02 08:26:18 -06:00
Saad Rahim (AMD)
64496f2838 Merge pull request #2512 from saadrahim/cherry-pick-changelog
Fix Changelog Cherry Pick back to develop (#2501)
2023-09-29 16:37:17 -06:00
urtiwari
60491de85f Fixed the markdown errors 2023-09-29 18:54:49 +00:00
urtiwari
2065ff398f Fixed the markdown errors 2023-09-29 18:45:48 +00:00
urtiwari
64ad833c33 Fixed the markdown errors 2023-09-29 18:09:41 +00:00
urtiwari
d8d55a1717 Fixed the markdown errors 2023-09-29 17:44:16 +00:00
urtiwari
ee6c183aa9 Fixed the markdown errors 2023-09-29 17:32:24 +00:00
Saad Rahim (AMD)
948bb14cce Release notes fix (#2513) 2023-09-29 10:52:32 -06:00
Saad Rahim (AMD)
e29f654883 Fix Changelog (#2501) 2023-09-29 10:52:32 -06:00
Lisa
7b3e6364f9 Email link update (#2517) 2023-09-29 10:27:20 -06:00
Tasso Zambelakis
5c1b2a7a5f Removed reference /opt/rocm/opencl/bin/clinfo
Since we are not installing the ROCm OpenCL packages, we are not able to
test ROCm with this command.

Signed-off-by: Tasso Zambelakis <Tasso.Zambelakis@amd.com>
2023-09-29 12:16:55 -04:00
YellowRoseCx
a45c51475e RX 6700* doc fixes in windows_support.md (#2497)
* RX 6700* doc fixes in windows_support.md

Correct RX 6700* LLVM target to gfx1031 windows_support.md

Change name from "RX 6750" to "RX 6750 XT"

* Fix RX7600 LLVM to gfx1102 in windows-support.md

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-09-28 16:34:41 -06:00
urtiwari
0fa1796636 Adding Spack document 2023-09-28 20:55:47 +00:00
Sam Wu
84f2c86126 Remove extra line in package manager integration (#2511) 2023-09-28 10:13:39 -06:00
Saad Rahim (AMD)
35122729b8 Release notes fix (#2513) 2023-09-28 09:24:16 -06:00
Tasso Zambelakis
8252721a31 Fixed invalid path.
The export PATH rocm folder name does not reflect the folder name used in /opt/rocm-5.7.0.

Signed-off-by: Tasso Zambelakis <Tasso.Zambelakis@amd.com>
2023-09-28 11:02:27 -04:00
Sam Wu
c98da4a11a Remove extra line in package_manager_integration.md (#2508) 2023-09-27 16:01:22 -06:00
Saad Rahim (AMD)
14e0fae0fe Fix Changelog (#2501) 2023-09-26 11:05:18 -06:00
dsclear-amd
f6f6bc7b24 Modifies Linux installation step organization to place newer OSes first. (#2498)
This should increase usability and prevent errors, since the most common
use case is the user using the latest version of their OS,
rather than the oldest supported one.
2023-09-26 07:00:41 -06:00
Sam Wu
13bea6bf4e disable spellcheck for license 2023-09-21 13:24:01 -06:00
Sam Wu
7a5f2eb508 add alt licensing for footer link 2023-09-21 13:14:52 -06:00
Sam Wu
786b44d8eb Remove 404.md from ROCm (#2487)
* rm 404 img

* remove gitignore file

* remove 404 page on rocm
2023-09-20 11:51:31 -06:00
Sam Wu
fac4843569 Fixes for roc-5.7.x branch (#2486)
* Update Release Note Tables for 5.6.1 and 5.7.0 (#2478)

* add changelog table for 5.6.1

* update 5.7.0 changelog table

* specify svg size

* do not use xelatex

* set fontpkg

* fix typo in conf.py

* fix typo

* Update openmp.md

* rm 404 img
2023-09-20 11:49:47 -06:00
Lisa
940d2933ff Link and formatting fixes (#2482) 2023-09-20 09:55:21 -06:00
Nara
80d8eb84ef Fix incorrect LLVM target for RX 7600 in Windows Support page (#2483) 2023-09-20 07:04:05 -06:00
Sam Wu
acde6284a0 Update Release Note Tables for 5.6.1 and 5.7.0 (#2478)
* add changelog table for 5.6.1

* update 5.7.0 changelog table
2023-09-19 12:05:25 -06:00
Saad Rahim (AMD)
63a45a168e Merge pull request #2477 from RadeonOpenCompute/5.7.0-merge-to-develop
5.7.0 merge to develop
2023-09-18 15:46:56 -06:00
Saad Rahim
fe3c9ebf38 Linting fixes bullets 2023-09-18 15:34:52 -06:00
Saad Rahim
03f78be781 Merge remote-tracking branch 'origin/develop' into 5.7.0-merge-to-develop 2023-09-18 15:29:06 -06:00
Saad Rahim (AMD)
c2a4257103 Feedback 5.7 (#2476)
* update relative link to llvm asan guide

remove docs dir from path

* Minor typo and update on supported OSes

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-18 15:25:52 -06:00
Lisa
d0d4eed1a6 Update titles to sentence case (#2455) 2023-09-18 12:26:31 -06:00
Lisa
772b51a7d2 Add ROCm A-Z entries to TOC (#2454) 2023-09-18 12:13:56 -06:00
Nara
006546e9e6 GPU memory model (#2379) 2023-09-18 07:16:50 -06:00
zhang2amd
fdc2f51b25 Update default.xml for 5.7 (#2471)
Update version to 5.7
Added a few new projects.
2023-09-15 18:12:30 -06:00
Sam Wu
23aa1eec20 Adjust 5.7.0 highlights (#2473)
* adjust 5.7.0 highlights

* adjust important highlights phrasing
2023-09-15 17:31:47 -06:00
Sam Wu
0bcf8c03e1 Small update to wording for release note reference to ASan user guide (#2470) 2023-09-15 17:09:32 -06:00
Sam Wu
a3b2bc3395 add announcement (#2472) 2023-09-15 17:09:12 -06:00
zhang2amd
89dc44ea6c Update default.xml for 5.7 (#2471)
Update version to 5.7
Added a few new projects.
2023-09-15 16:53:41 -06:00
Saad Rahim (AMD)
5c07070e73 5.7 install instructions (#2467)
* Update install instructions to 5.7

* RTG additions to install instructions

* update install instructions for multi version

---------

Co-authored-by: Máté Ferenc Nagy-Egri <mate@streamhpc.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-15 11:56:23 -06:00
Sam Wu
c9630d82da HIP 5.7.0 Release Notes (#2468)
* add links to asan

* add HIP 5.7.0 release notes
2023-09-15 11:56:01 -06:00
Saad Rahim (AMD)
3974c5c1a1 Version bump in nav bar (#2465) 2023-09-15 10:32:47 -06:00
Saad Rahim (AMD)
3348de77d1 5.7 support tables (#2463) 2023-09-15 10:22:15 -06:00
Roopa Malavally
3825dbc2b3 Update Address Sanitizer docs (using-gpu-sanitizer.md) (#2460)
* Update using-gpu-sanitizer.md

Updated content

* fixes for markdown linting

use * instead of + for lists

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-15 10:06:48 -06:00
Sam Wu
1e92ef9a2d update using gpu sanitizer (#2462) 2023-09-15 09:03:41 -07:00
Roopa Malavally
1ae743b22a Create 5.7.0.md (#2452)
* site restructure phase 1 - file reorganization (#2428)

* Update README.md (#2440)

Fix link to CHANGELOG.md

* Create 5.7.0.md

Release notes for ROCm 5.7.0

* Update 5.7.0.md

* Update 5.7.0.md

Added release highlights for ROCm v5.7

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* Update 5.7.0.md

* update markdown formatting 5.7.0.md and add links

* update RELEASE.md for 5.7.0

* add 5.7.0 release notes to CHANGELOG

* resolve rebase conflict

* Revert "site restructure phase 1 - file reorganization (#2428)"

This reverts commit d04797d1c8.

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Vishal Rao <vishalrao@gmail.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-15 09:05:09 -06:00
Nara
e8c2065d7c Added notes for incompatibilities with certain TensorFlow versions. (#2435)
* Added notes for incompatibilities with certain TensorFlow versions.

* Small improvements
2023-09-13 15:55:33 -06:00
Sam Wu
14402ad410 Release notes for 5.7.0 (#2374) 2023-09-13 15:55:00 -06:00
Lisa
7c5976004f ROCm A-Z page & link cleanup (#2450) 2023-09-13 13:00:50 -06:00
Vishal Rao
dba06fe315 Update README.md (#2440)
Fix link to CHANGELOG.md
2023-09-08 10:21:16 -06:00
Lisa
890c735f53 site restructure phase 1 - file reorganization (#2428) 2023-09-08 10:02:17 -06:00
dependabot[bot]
3535c43d4e Bump rocm-docs-core from 0.23.0 to 0.24.0 in /docs/sphinx (#2438)
* Bump rocm-docs-core from 0.23.0 to 0.24.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update requirements.in

* Update requirements.txt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-07 16:27:43 -06:00
Paul R. C. Kent
75eed2ee3e Fix RHEL9 installer links (#2426)
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-09-06 11:23:01 -06:00
Saad Rahim (AMD)
0c3915923f Merge pull request #2434 from RadeonOpenCompute/merge-5.6.1
Merge 5.6.1 to develop
2023-09-06 11:16:52 -06:00
Saad Rahim (AMD)
d3049169de Merge branch 'develop' into merge-5.6.1 2023-09-05 16:19:10 -06:00
Sam Wu
6c0419fb0d Add hipSPARSELt and hipTensor to Projects and licenses (#2431)
* add hipsparselt

* add hiptensor to toc and licenses

* alphabetize licenses

* update rocm-docs-core to 0.23.0
2023-09-05 15:57:10 -06:00
srawat
996064950d OpenMP updates (#2404)
* Added deleted sections to openmp.md and other improvements

* Update CONTRIBUTING.md

* Update _toc.yml.in

* OpenMP updates for 5.7

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update openmp.md

* Update CONTRIBUTING.md

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-09-01 17:28:32 -06:00
dependabot[bot]
77e2424f36 Bump rocm-docs-core from 0.21.0 to 0.22.0 in /docs/sphinx (#2427)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.21.0 to 0.22.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/v0.22.0/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.21.0...v0.22.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-31 17:15:33 -06:00
Sam Wu
62c0afd5ba add hiptensor to list of libs (#2414) 2023-08-31 14:18:57 -06:00
Roopa Malavally
d0953efad0 Update rocmcc.md (#2424)
Fixed https://ontrack-internal.amd.com/browse/SWDEV-407505?src=confmacro
2023-08-31 10:10:11 -06:00
searlmc1
f73d941657 Update using_gpu_sanitizer.md (#2423)
Update AMD supplied libs section
2023-08-31 09:33:12 -06:00
Máté Ferenc Nagy-Egri
ddbe4cd38f Update Linux install instructions for 5.6.1 2023-08-30 07:08:50 -06:00
Sam Wu
7e097ce72a Update conf.py 2023-08-29 17:04:47 -06:00
Saad Rahim
f3d3929f11 Updating version number to 5.6.1 2023-08-29 16:56:11 -06:00
Nara
084ed7f4cb docs: fix missing '--append' flag in install instructions (#2411) 2023-08-29 16:53:28 -06:00
Saad Rahim (AMD)
7482a8b261 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#2419) (#2420)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 16:08:48 -06:00
dependabot[bot]
f414c30836 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#2419)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 15:58:59 -06:00
Saad Rahim (AMD)
bf8f0ccc65 Updating the manifest file (#2417) 2023-08-29 15:07:13 -06:00
Sam Wu
ed8251872f 5.6.1 Release notes (#2416)
* 5.6.1 rel notes

* update rtd config
2023-08-29 15:04:53 -06:00
Sam Wu
8c01bfbb6e Change OpenMP Image Syntax and Update RTD config (#2400)
* update rtd config

* use standard markdown syntax for openmp svg

* fix rtd config
2023-08-25 10:47:32 -06:00
Lisa
b963f7fa05 404 updates (#2406)
add 404 page image

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-24 17:35:44 -06:00
Sam Wu
5b0d7bcebd fix RTD build failing on pdflatex and linting deadlock (#2398)
* docs(openmp.md): specify width and height for openmp toolchain svg

* fix linting
2023-08-23 10:54:28 -06:00
Saad Rahim
eef2937171 Merge pull request #2392 from RadeonOpenCompute/roc-5.6.x
Merging ROCm 5.6.x to develop
2023-08-21 16:27:40 -06:00
Sam Wu
52d59937d1 Update linting.yml 2023-08-21 16:17:59 -06:00
Sam Wu
ee72fbac97 Update linting.yml
remove roc**
to avoid triggering twice
2023-08-21 16:09:59 -06:00
Saad Rahim
5a33e54265 Removing duplicated concurrency 2023-08-21 15:47:08 -06:00
Saad Rahim
ef248c087c Merge branch 'develop' into roc-5.6.x 2023-08-21 15:45:29 -06:00
Sam Wu
017d9717e0 build: concurrency for linting to prevent deadlock (#2394) 2023-08-21 15:44:51 -06:00
Saad Rahim
445432da13 Merge branch 'develop' into roc-5.6.x 2023-08-21 15:11:36 -06:00
Lisa
f6c439b56b Updating the What is ROCm page and related content (#2386) 2023-08-18 14:16:17 -06:00
Nara
c3e8e15e51 doc: Update version in install guide to 5.6 (#2387) 2023-08-18 13:57:45 -06:00
Nara
20ae555e61 doc: Update version in install guide to 5.6 (#2387) 2023-08-18 07:26:49 -06:00
Sam Wu
fa16caba4a Add License page (#2371)
* fix typo

* add license page

* move license in toc

* Update license.md

* improve phrasing for license

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-08-17 08:44:51 -06:00
Saad Rahim
7c6dede59d Window updates (#2365)
* Changing SKU to Edition

* Installation phrasing

* Adding the app deployment guide

* Fixing links

* Update docs/understand/windows-app-deployment-guidelines.md

---------

Co-authored-by: Sam Wu <sjwu@ualberta.ca>
2023-08-16 16:32:54 -06:00
Lisa
4813f1f37d language cleanup of ROCm docs (#2380)
* remove 'the'

* fix linking for GitHub Known Issues in nav tree

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2023-08-15 09:32:30 -06:00
Mátyás Aradi
261530f5f7 Fix caption typo for MI100 (#2375) 2023-08-10 08:44:45 -06:00
Roopa Malavally
d11c566fb2 Create using_gpu_sanitizer.md (#2338)
* Create using_gpu_sanitizer.md

* Created GPU Sanitizer File and Title

* add technical terms to wordlist and fix spelling

* spelling
---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: b-sumner <brian.sumner@amd.com>
2023-08-09 14:53:28 -06:00
Sam Wu
14153b9540 fix typos and add links to rocm-docs-core user and developer guides in contributing section (#2372) 2023-08-09 14:02:05 -06:00
dependabot[bot]
43601a0755 Bump certifi from 2022.12.7 to 2023.7.22 in /docs/sphinx (#2369)
Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-08 09:30:57 -06:00
dependabot[bot]
c3b2062c51 Bump pygments from 2.14.0 to 2.15.0 in /docs/sphinx (#2368)
Bumps [pygments](https://github.com/pygments/pygments) from 2.14.0 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.14.0...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-04 17:31:27 -06:00
dependabot[bot]
cced9a7955 Bump cryptography from 41.0.0 to 41.0.3 in /docs/sphinx (#2367)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.0 to 41.0.3.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.0...41.0.3)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-04 17:27:40 -06:00
Sam Wu
df0ee5a0ae add version to html title 2023-08-04 17:18:41 -06:00
srawat
3bfce9c570 corrected typo in contributing.md (#2334)
* Added deleted sections to openmp.md and other improvements

* Update CONTRIBUTING.md

* add example of snake case

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-04 12:46:13 -06:00
Sam Wu
45505e4912 ROCm Version page (#2331)
* add ROCm versions page

* add release dates from github tags

* fix versions list table

* fix dates

* update version page title
2023-08-01 12:09:50 -06:00
Nagy-Egri Máté Ferenc
d9376ebfc7 Use linting from rocm-docs-core (#2207)
* Linting from rocm-docs-core

* Give name to doc linting CI job

* Shorter job name
2023-07-31 10:52:45 -06:00
dependabot[bot]
31fcc9aafb Bump rocm-docs-core from 0.19.0 to 0.20.0 in /docs/sphinx (#2351)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-31 08:45:32 -06:00
Saad Rahim
6fb7b9f3b5 GPU support clarification (#2350) 2023-07-27 17:42:24 -06:00
Saad Rahim
bd553f263b GPU support clarification (#2350) 2023-07-27 17:41:41 -06:00
Saad Rahim
7f8eede7d1 linting fix 2023-07-27 16:30:18 -06:00
Saad Rahim
0741268fd5 Updating GPU support list 2023-07-27 16:30:18 -06:00
Saad Rahim
61dd65f29f Merge pull request #2349 from saadrahim/windows_additional_gpus
Windows additional GPUs
2023-07-27 16:26:30 -06:00
Saad Rahim
343693ed6f linting fix 2023-07-27 16:02:54 -06:00
Saad Rahim
3c27919a9c Updating GPU support list 2023-07-27 15:51:19 -06:00
Saad Rahim
ea1f2498f7 Merge remote-tracking branch 'origin/docs/5.6.0' into windows_additional_gpus 2023-07-27 15:38:43 -06:00
Sam Wu
4ab3787abe Merge pull request #2345 from RadeonOpenCompute/docs/5.5.1
Docs/5.5.1 Sync into 5.6
2023-07-27 13:32:02 -06:00
Sam Wu
b4d3dde1a2 Update management_tools.md 2023-07-27 13:28:31 -06:00
Saad Rahim
b60afeeafe Update ai_tools.md 2023-07-27 13:28:21 -06:00
Saad Rahim
76af020540 Merge branch 'docs/5.6.0' into docs/5.5.1 2023-07-27 13:26:47 -06:00
Saad Rahim
ebd44bb372 Merge pull request #2344 from RadeonOpenCompute/docs/5.6.0
Sync 5.6 branches
2023-07-27 13:20:39 -06:00
Sam Wu
e96f137f44 fix merge conflict 2023-07-27 13:16:43 -06:00
Saad Rahim
4dd5cf1e59 fixing linting (#2343)
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-07-27 13:11:50 -06:00
Sam Wu
fab4379715 Add 5.5.1 release notes (#2342)
* add 5.5.1 release notes

* fix markdown linting violations

* fix release notes

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-07-27 12:43:11 -06:00
Sam Wu
d17a27ca84 set article info for windows pages (#2341) 2023-07-27 12:28:33 -06:00
Saad Rahim
ddb77b9dcf Merge branch 'docs/5.5.1' of github.com:saadrahim/ROCm into docs/5.5.1 2023-07-27 12:20:03 -06:00
Saad Rahim
52f52b7976 CI on docs branch 2023-07-27 12:18:58 -06:00
Saad Rahim
a35248bb77 Delete 5.5-win.md 2023-07-27 12:11:41 -06:00
Saad Rahim
9d05c49458 Delete #5.5-win.md# 2023-07-27 12:11:29 -06:00
Saad Rahim
419f674456 Windows release notes 2023-07-27 12:08:28 -06:00
Saad Rahim
e13e1d31c3 Adding Windows Installation Instructions (#2339) 2023-07-27 11:00:44 -06:00
srawat
253f69b445 Adding openmp image (#2323)
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-07-25 11:05:09 -06:00
Sam Wu
5f546d44b3 Update Toolchain and Contributing Guides (#2315)
* spell out HPC acronym in explanation doc

* update toolchain docs

order in importance descending

* update Contributing guide

add discussions

update formatting and grammar

* separate contributing section for readability

* fix formatting for mdl

* fix spelling
2023-07-25 10:29:45 -06:00
dependabot[bot]
a9ae111741 Bump rocm-docs-core from 0.18.3 to 0.19.0 in /docs/sphinx (#2320)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.3 to 0.19.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.3...v0.19.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-12 09:29:05 -06:00
Sam Wu
eb12f3f851 Changelog updates for 5.6.0 (#2306)
* remove typos in changelog

* add 5.6 release notes

* add amd smi changes for 5.6.0
2023-07-07 09:39:42 -06:00
Sam Wu
524f009280 Links for Reference pages (#2307)
* reorg toc to match all ref material page

* add links to docs, github, and changelogs
2023-07-07 09:37:15 -06:00
Rahul Garg
d23a85c707 Update backward incompatible planned changes in 5.5 (#2279)
* Update backward incompatible planned changes

* add planned changes to changelog

* update rocm-docs-core to v0.18.3

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-07-07 09:36:40 -06:00
Sam Wu
2786b32eea Update Links (#2240)
* update link to PCIe Gen 4 pdf

* fix broken links

* remove references to broken links

* fix spelling of data center
2023-07-07 09:35:55 -06:00
Edgar Gabriel
2721042eac gpu-aware MPI changes (#2311)
- simplify the configure arguments of UCX to only provide
flags absolutely required

- add the UCC compatibility matrix to the docs
2023-07-06 09:17:56 -06:00
Sam Wu
26935408e0 Add configurations for PDF output on Read the Docs (#2305)
* add configurations for pdf output on rtd

* set date for wip release notes

* add copyright to pdf
2023-07-04 21:29:31 -06:00
Sam Wu
2c828465f2 rocm-docs-core v0.18.3 2023-06-30 09:42:51 -06:00
Sam Wu
58b137d43e rocm-docs-core v0.18.3 2023-06-30 09:41:51 -06:00
Sam Wu
372a257eed Changelog updates for 5.6.0 (#2306)
* remove typos in changelog

* add 5.6 release notes

* add amd smi changes for 5.6.0
2023-06-30 09:27:39 -06:00
Sam Wu
12bc633320 Links for Reference pages (#2307)
* reorg toc to match all ref material page

* add links to docs, github, and changelogs
2023-06-29 16:55:48 -06:00
Saad Rahim
a144653405 Fixing typos 2023-06-29 13:41:44 -06:00
Saad Rahim
85a4eca655 Fixing links for management tools 2023-06-29 13:31:58 -06:00
Saad Rahim
bdb527980a Fixing typo on GPU support tables for Radeon 2023-06-29 13:14:14 -06:00
Sam Wu
8e39a2a147 update release notes 2023-06-29 12:18:50 -06:00
Sam Wu
72c128f681 update project names for intersphinx 2023-06-29 11:30:50 -06:00
Sam Wu
284d024045 pdf configs 2023-06-29 11:24:57 -06:00
Sam Wu
da32369db1 config for pdf 2023-06-29 11:24:09 -06:00
Sam Wu
e70545bcd9 update release notes date 2023-06-29 11:21:30 -06:00
Sam Wu
3d88626dd4 update conf.py 2023-06-29 11:13:57 -06:00
Rahul Garg
0cfc1e480a Update backward incompatible planned changes in 5.5 (#2279)
* Update backward incompatible planned changes

* add planned changes to changelog

* update rocm-docs-core to v0.18.3

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-29 11:05:27 -06:00
Rahul Garg
c71d83207e Update backward incompatible planned changes in 5.5 (#2279)
* Update backward incompatible planned changes

* add planned changes to changelog

* update rocm-docs-core to v0.18.3

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-29 10:36:31 -06:00
Saad Rahim
f9aeee3e15 CLR manifest update and release note edit (#2299)
* removing deprecated libraries

* Release note fix

* manual updates

* Updating manifest for clr changes
2023-06-28 19:02:49 -06:00
Saad Rahim
6e50a85a93 removing deprecated libraries (#2298) 2023-06-28 17:33:07 -06:00
Saad Rahim
e8fdc582d8 Updating manifest for 5.6.0 release (#2297) 2023-06-28 17:13:00 -06:00
Saad Rahim
4df2273587 Table fix (#2296)
* Table fix

* Supported and unsupported tab fix
2023-06-28 16:47:18 -06:00
Sam Wu
cd1ec676f0 fix or remove broken links (#2281) 2023-06-28 16:34:38 -06:00
Saad Rahim
996f4a8c37 Compatibility Section for ROCm 5.6 (#2294)
* Update 3rd party compat for 5.6

* Update supported OS for 5.6

* Validated kernels

* linting

* missed GPU

* Update .wordlist.txt

---------

Co-authored-by: Máté Ferenc Nagy-Egri <mate@streamhpc.com>
2023-06-28 16:34:08 -06:00
Sam Wu
5bbe13fb75 Cherry pick changes from develop to 5.6 (#2295)
* Update Links (#2240)

* update link to PCIe Gen 4 pdf

* fix broken links

* remove references to broken links

* fix spelling of data center

* Fixing HIP link (#2236)

* Swati develop (#2245)

* Added deleted sections to openmp.md and other improvements

* Update openmp.md

Tagged `ICV`

* Resolving discrepancies in openmp.md

There were apparent differences between the published document and the information conveyed by the dev. Fixed them.

* add new words to wordlist

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* fix rocm_smi_lib link in toc (#2260)

* ROCm FHS Reorganization, Backward Compatibility, and Versioning - rev (#2255)

* update requirements

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Ehud Sharlin <112672820+Ehud-Sharlin@users.noreply.github.com>
2023-06-28 16:30:19 -06:00
Saad Rahim
b899a3697c Further release notes (#2285)
* gfx906 GPU Maintenance Mode

* update changelog and release notes

* Final release notes

* Fix link

* update changelog and release notes

* rocgdb 13

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-28 16:23:38 -06:00
dependabot[bot]
d2884f482a Bump rocm-docs-core from 0.18.1 to 0.18.2 in /docs/sphinx (#2293)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.1 to 0.18.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.1...v0.18.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 16:16:33 -06:00
Sam Wu
f655458f87 Update release notes and changelog (#2274)
* update release notes for rocprofiler

* add release notes for rocgdb
2023-06-28 15:44:11 -06:00
Saad Rahim
8781a7706d Mi50 Maintenance Mode (#2277)
* gfx906 GPU Maintenance Mode

* update changelog and release notes

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-28 15:11:32 -06:00
dependabot[bot]
3643e8a6c2 Bump rocm-docs-core from 0.18.0 to 0.18.1 in /docs/sphinx (#2280)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 19:23:38 -06:00
dependabot[bot]
dce4d58348 Bump rocm-docs-core from 0.18.0 to 0.18.1 in /docs/sphinx (#2280)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.18.0 to 0.18.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.18.0...v0.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 17:33:02 -06:00
dependabot[bot]
02d86aa41b Bump rocm-docs-core from 0.17.2 to 0.18.0 in /docs/sphinx (#2278)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.2 to 0.18.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.2...v0.18.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 17:13:08 -06:00
dependabot[bot]
9eb46f8230 Bump rocm-docs-core from 0.17.2 to 0.18.0 in /docs/sphinx (#2278)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.2 to 0.18.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.2...v0.18.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 16:32:12 -06:00
dependabot[bot]
5615c90889 Bump rocm-docs-core from 0.17.1 to 0.17.2 in /docs/sphinx (#2276)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.1 to 0.17.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.1...v0.17.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 10:06:37 -06:00
srawat
73986668bb MI200 performance counters and OpenMP fixes 2023-06-27 08:17:35 -06:00
dependabot[bot]
6c179479f1 Bump rocm-docs-core from 0.17.1 to 0.17.2 in /docs/sphinx (#2276)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.1 to 0.17.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.1...v0.17.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 19:54:06 -06:00
dependabot[bot]
5b726ec96c Bump rocm-docs-core from 0.17.0 to 0.17.1 in /docs/sphinx (#2275)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.17.0 to 0.17.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.17.0...v0.17.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 16:37:42 -06:00
Sam Wu
21e433e91f Update changelog and release notes with hipStreamGetDevice (#2259)
* docs: update changelog and release notes with hipStreamGetDevice

* docs: fix typos and add version update notes

* docs: add HIP changelog

* remove What's New section from changelog
2023-06-26 16:03:04 -06:00
dependabot[bot]
e72f0dedde Bump rocm-docs-core from 0.16.0 to 0.17.0 in /docs/sphinx (#2273)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.16.0 to 0.17.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-26 15:35:54 -06:00
Mészáros Gergely
8bf7cfdddc Add documentation on 5.6 support SLES 15.5 (#2271)
* docs: clean up SLES tab-sets

- Always use a tab-set for SLES 15.4
- In the toplevel SLES title don't say version 15
- harmonize the `:sync:` labels between documents

* docs: Misc fixes in installation

- Fix rocm repository url in the installer script installation for SLES
- Add a missing :sync: tab in installation prerequisites

* docs: add SLES 15.5 support to installation and OS support pages
2023-06-26 15:29:55 -06:00
Ehud Sharlin
57e2253828 ROCm FHS Reorganization, Backward Compatibility, and Versioning - rev (#2255) 2023-06-26 14:07:02 -06:00
Saad Rahim
e05ce21fb4 MIOpen kdb installation instructions for PyTorch warmup performance improvement (#2248) 2023-06-22 09:47:38 -06:00
dependabot[bot]
233d3632b8 Bump rocm-docs-core from 0.15.0 to 0.16.0 in /docs/sphinx (#2262)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.15.0 to 0.16.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 21:37:05 -06:00
Sam Wu
bbfb18b5de fix rocm_smi_lib link in toc (#2260) 2023-06-21 20:22:48 -06:00
Sam Wu
6b1fdeab82 rocm_smi_lib 2023-06-21 17:18:17 -06:00
dependabot[bot]
66dd6c9467 Bump requests from 2.28.1 to 2.31.0 in /docs/sphinx (#2217)
Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.31.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.31.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 12:38:35 -06:00
dependabot[bot]
503809b74a Bump rocm-docs-core from 0.14.0 to 0.15.0 in /docs/sphinx (#2257)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.14.0 to 0.15.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-21 11:40:37 -06:00
srawat
9bc32154d8 Swati develop (#2245)
* Added deleted sections to openmp.md and other improvements

* Update openmp.md

Tagged `ICV`

* Resolving discrepancies in openmp.md

There were apparent differences between the published document and the information conveyed by the dev. Fixed them.

* add new words to wordlist

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-20 10:52:55 -06:00
Nara
c1a8c5b030 docs(deploy/linux): update install instructions to 5.6 (#2244) 2023-06-16 07:27:00 -06:00
dependabot[bot]
0da29b73cb Bump rocm-docs-core from 0.13.4 to 0.14.0 in /docs/sphinx (#2249)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.4 to 0.14.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.4...v0.14.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-16 07:17:53 -06:00
dependabot[bot]
69580ef397 Bump cryptography from 40.0.2 to 41.0.0 in /docs/sphinx (#2218)
Bumps [cryptography](https://github.com/pyca/cryptography) from 40.0.2 to 41.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/40.0.2...41.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-14 16:46:26 -06:00
Saad Rahim
7762a8d874 Fixing HIP link (#2236) 2023-06-14 16:45:08 -06:00
Mészáros Gergely
014c904c4c Add RHEL 8.8 and 9.2 as supported distributions for 5.6 (#2242)
- add them to the os support table
- add install instructions for them
2023-06-14 07:07:50 -06:00
Nara
e8275e7fd3 ROCm 5.6 Changelog Updates (#2238)
* fix(manifest): fix missing remote entries in default.xml

* fix(autotag): fix issues when fetching non-standardized changelogs

* docs(changelog): updated changelog for ROCm 5.6
2023-06-14 07:06:49 -06:00
Sam Wu
2ec3e537a4 Update Links (#2240)
* update link to PCIe Gen 4 pdf

* fix broken links

* remove references to broken links

* fix spelling of data center
2023-06-14 07:05:06 -06:00
Alfin Auzikri
51af0be780 Update tensorflow_install.md (#2237)
* Update tensorflow_install.md

Fixed the commands so that they no longer cause an error when copied and pasted.

* Update tensorflow_install.md

Following @saadrahim's suggestion of using "\" to signify a line break in bash.
2023-06-12 09:29:44 -06:00
Nagy-Egri Máté Ferenc
5e24832f3b Remove package pin from quick start guide (#2233)
* Remove package pin from quick start guide

When installing in a single-package fashion, no version pinning is needed

* Add package pinning to quick start guide

Pinning the packages is required to make apt prefer the rocm packages
instead of the system ones when both provide the same package (e.g.
`rocm-smi`).

* Removing Ubuntu 20.04 change

---------

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-06-09 13:56:23 -06:00
srawat
6757f9dc56 Added specialized kernels to openmp.md (#2187)
* Added specialized kernels to openmp.md

A few formatting changes and addition of specialized kernels section at the end.

* Added Specialized kernels in openmp.md

Some formatting changes and addition of specialized kernels instead of no loop and cross team kernels

* Added specialized kernel to openmp.md

* Added specialized kernels to openmp.md

* Replaced the usage of uncertain clauses (may/might) in openmp.md

* Attempt to align the table headings for environment variables in openmp.md

* Feedback from Dhruva

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-06-08 10:00:51 -06:00
Gergely Meszaros
a471e8debe Add instructions for adding extra repositories in RHEL and SLES
The hip-devel package depends on perl modules not distributed by default
on RHEL and SLES distributions; these can be installed from EPEL and
the `devel:languages:perl` repository respectively.

Ideally in the future these dependencies would be replaced with packages
available from default repositories, but in the meantime this should
at least be documented.
2023-06-08 09:37:00 -06:00
dependabot[bot]
8c86526f98 Bump rocm-docs-core from 0.13.3 to 0.13.4 in /docs/sphinx (#2226)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.3 to 0.13.4.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.3...v0.13.4)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-08 09:18:23 -06:00
Mészáros Gergely
a42fae5140 Install fixes (#2228)
* Remove install instructions for unsupported RHEL 8.8 and 9.2

Current ROCm release does not support these versions of RHEL

* Centralize disclaimers and prerequisites for installation

- Move the single-version to multi-version disclaimer to the install
  overview page where single vs multi installs are discussed.
- Move the installation of kernel-headers and development packages
  to the install preparation page. Unify it mainly from the quick start
  content.

* s/Name/name/ in repository config files for RHEL

The repository name must be set as `name=<name>` instead of `Name`;
otherwise yum complains about the repo not having a name, e.g.:
```output
Repository 'ROCm-5.3.3' is missing name in configuration, using id.
```

This is fixed with this commit.

* Clean up render/video group section on prerequisites

* Installation and Upgrade restructuring & fixes

- Fix the rocm package urls for RHEL in the install & upgrade guides
  - RHEL8 and 9 have different URLs, so add a tab-set similar to Ubuntu's
    for them.
- Fix the package URL in the upgrade guide for SLES (previously pointed
  to the amdgpu url)
- Change the apt-signing key download and conversion to the method used
  in the quick start guide, which is the one recommended by Ubuntu maintainers
- Change the install steps from list items to rubrics with numbered entries
  which is more readable and matches the style in the quick start guide
- Do not pass `--append` to `tee` in the upgrade guide, because it is
  meant to overwrite.
- Split the one long tab-set to multiple tab-sets in the upgrade guide
  to improve readability
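
The `name` vs `Name` fix for RHEL repository config files described above could be checked with a small script. This is only a sketch under stated assumptions: the helper `sections_missing_name` is hypothetical and not part of this commit, and it assumes the `.repo` file uses plain INI syntax with case-sensitive keys.

```python
import configparser


def sections_missing_name(repo_text: str) -> list[str]:
    """Return the repo ids whose section lacks a lowercase `name` key."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names case-sensitive; by default configparser lowercases
    # them, which would hide the `Name` vs `name` difference.
    parser.optionxform = str
    parser.read_string(repo_text)
    return [
        section
        for section in parser.sections()
        if "name" not in parser[section]
    ]
```

A section that only defines `Name=` is reported, while one that defines `name=` is not.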
2023-06-08 09:17:51 -06:00
Saad Rahim
bcb3dd3b4a PCIe Atomics (#2223)
Co-authored-by: Nagy-Egri Máté Ferenc <beiktatas+github@outlook.hu>
2023-06-06 21:52:18 -06:00
Mészáros Gergely
8784fe3fba Install updates (#2221)
* Install updates

- revert distro command installation -> package manager installation
- move description of installer script to common section
- updates to the installer script installation page
- other misc fixes

* Fix spelling
2023-06-06 07:06:06 -06:00
Saad Rahim
6e79d204b8 Further installation fixes (#2219)
Co-authored-by: Sam Wu <sjwu@ualberta.ca>
2023-06-04 11:33:27 -06:00
Sam Wu
7076bc18ca Standardize install instructions (#2220)
* standardize install instructions

* use rocm-5.5.1 in install instructions
2023-06-04 10:49:11 -06:00
Saad Rahim
519df7a51f Refactoring installation documentation (#2202)
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-06-02 14:35:24 -06:00
dependabot[bot]
90c697b6d3 Bump rocm-docs-core from 0.13.2 to 0.13.3 in /docs/sphinx (#2214)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.2 to 0.13.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.2...v0.13.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-01 11:52:50 -06:00
Nara
125cc37981 Update changelog for 5.5.1 (#2199)
* docs(changelog): update changelog for 5.5.1

Signed-off-by: Nara Prasetya <nara@streamhpc.com>

* docs(changelog): Improve continuity in release notes

* docs(changelog): Add changelog to TOC

---------

Signed-off-by: Nara Prasetya <nara@streamhpc.com>
2023-06-01 09:40:51 -06:00
Nagy-Egri Máté Ferenc
5752b5986c Remove links to docs.amd.com (#2200)
* Remove links to docs.amd.com

* Fix linking to list item (not possible)
2023-06-01 08:16:38 -06:00
dependabot[bot]
2829c088c2 Bump rocm-docs-core from 0.13.1 to 0.13.2 in /docs/sphinx (#2201)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.1 to 0.13.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.1...v0.13.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-31 11:49:43 -06:00
dependabot[bot]
3b9fb62600 Bump rocm-docs-core from 0.13.0 to 0.13.1 in /docs/sphinx (#2190)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.13.0 to 0.13.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.13.0...v0.13.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-05-30 10:20:18 -06:00
Mészáros Gergely
b7222caed2 Replace incorrect em-dashes with dashes in code-blocks (#2192)
Replace em-dash('–') with dash('-') in code blocks where the latter was
meant.
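
A replacement like the one described above could be scripted roughly as follows. This is a minimal sketch, not the change actually made in this commit: the helper name `fix_dashes_in_code_blocks` is hypothetical, it assumes code blocks are delimited by triple-backtick fences, and it treats both the en dash (U+2013) and the em dash (U+2014) as candidates for replacement.

```python
def fix_dashes_in_code_blocks(markdown_text: str) -> str:
    """Replace en/em dashes with ASCII hyphens inside fenced code blocks only."""
    dash_map = str.maketrans({"\u2013": "-", "\u2014": "-"})
    in_fence = False
    fixed_lines = []
    for line in markdown_text.split("\n"):
        if line.lstrip().startswith("```"):
            # A fence line toggles whether we are inside a code block.
            in_fence = not in_fence
            fixed_lines.append(line)
        elif in_fence:
            # Inside a code block the dash was meant as a plain hyphen.
            fixed_lines.append(line.translate(dash_map))
        else:
            # Prose outside code blocks is left untouched.
            fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

Restricting the substitution to fenced blocks keeps typographic dashes in prose intact.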
2023-05-30 07:26:23 -06:00
Nagy-Egri Máté Ferenc
c285dd729f Team-feedback (#2193)
* Fix hipRAND copy-paste error

* Remove superfluous table reference
2023-05-30 07:06:06 -06:00
Mészáros Gergely
0c93636d23 Replace links to subprojects docs with intersphinx links (#2181) 2023-05-29 12:33:46 -06:00
Sam Wu
3fa5f1fddc Update doc requirements and suppress duplicate main doc link (#2189)
* update to rocm-docs-core v0.13.0

also suppress main doc link

* rename home link to ROCm Documentation Home
2023-05-29 12:32:50 -06:00
Saad Rahim
17b029b885 Changing title (#2183) 2023-05-25 22:32:59 -06:00
Saad Rahim
460f46c3be Adding repo priority for Ubuntu 22.04 (#2178)
* Adding repo priority for Ubuntu 22.04

* removed unnecessary apt-update
2023-05-25 14:46:43 -06:00
Mészáros Gergely
6feca81dd0 docs: fix bios settings tables in mi100/mi200 tuning guides (#2179)
Add empty cells to list tables to make them uniform (all rows have the
same number of cells). Before this, the tables errored out with:

> ERROR: Error parsing content block for the "list-table" directive:
> uniform two-level bullet list expected, but row 13 does not contain
> the same number of items as row 1 (3 vs 4)

and the table did not show up.
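
The padding step described above can be sketched as below. This is an illustration only: `pad_rows` is a hypothetical helper, and it assumes each table row is already available as a list of cell strings.

```python
def pad_rows(rows: list[list[str]]) -> list[list[str]]:
    """Pad every row with empty cells so all rows have the same length."""
    # The widest row decides how many cells each row must contain.
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]
```

For example, a three-cell row next to a four-cell row gains one trailing empty cell, which satisfies the "uniform two-level bullet list" requirement quoted in the error.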
2023-05-25 09:54:40 -06:00
Mészáros Gergely
ec8496041a ci: change markdown linting to use the NodeJs markdownlint (#2180)
* ci: change markdown linting to use the NodeJs markdownlint

The original ruby based markdownlint has a few shortcomings not known
when it was introduced:
- no support for myst extensions
- no support for disabling specific rules for specific files or regions

Together, these make it very hard to use in this project, because it
reports false positives around myst extensions.

Luckily there's a NodeJS based version of markdownlint [1] supporting the
same ruleset that is more configurable:
- seems to support myst extensions better
- has an html comment based syntax to disable specific rules

The library also seems to be better maintained and has better tooling:
e.g. there's a vscode extension using the engine for local use:
markdownlint (DavidAnson.vscode-markdownlint).

[1]: https://github.com/DavidAnson/markdownlint

* docs: hotfix empty links

There are missing links in the docs, these should get fixed, but for now
they are just monkey patched to make CI happy.

* docs: fix links

---------

Co-authored-by: Nara Prasetya <nara@streamhpc.com>
2023-05-25 09:51:19 -06:00
Edgar Gabriel
c7350c08ab update the gpu-aware-mpi page (#2176)
* update the gpu-aware-mpi page

Three changes:
 - add the ucx compatibility table
 - add the --with-rocm=/opt/rocm option to the compilation of Open MPI
 - add a section about how to compile and use UCC for collective
operations.

* Changing link to relative

* Update gpu_aware_mpi.md

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-05-24 16:42:45 -06:00
Sam Wu
c1809766e6 Link fixes (#2177)
* fix rocmcc link

* remove unused link

* remove unused linkcheck configs

* update amd smi section

add link to amd smi github

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-05-24 16:14:23 -06:00
Saad Rahim
61df1ec8c6 Updating link to new dev hub (#2174) 2023-05-24 16:11:14 -06:00
Li Li
983987aab5 Update deep learning guide (#2124)
* add deep learning guide

* separate out optimization, reference, and troubleshooting as standalone sections.

* resolve lint errors

* delete introduction to DL

* correct syntax highlights and filename

* remove out-of-date QAs

* Renaming and cleanup

* Spelling

* Fixup TOC

---------

Co-authored-by: Nara Prasetya <nara@streamhpc.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-05-24 16:04:30 -06:00
zhang2amd
914b62e219 Update default.xml for 5.5.1 release 2023-05-24 13:17:55 -07:00
Saad Rahim
faac45772c Broken Links (#2172) 2023-05-24 11:11:40 -06:00
dependabot[bot]
d206494272 Bump rocm-docs-core from 0.11.1 to 0.12.0 in /docs/sphinx (#2171)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.11.1 to 0.12.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.11.1...v0.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 10:19:34 -06:00
Saad Rahim
26c73a3986 Fixing GPU support tables (#2170)
* Fixing GPU support tables

* Linting
2023-05-24 10:06:12 -06:00
Nagy-Egri Máté Ferenc
dc74008ac6 Fix-landing-pages (#2167) 2023-05-24 07:27:50 -06:00
dependabot[bot]
108287dcd7 Bump rocm-docs-core from 0.11.0 to 0.11.1 in /docs/sphinx (#2164)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.11.0 to 0.11.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.11.0...v0.11.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 07:08:10 -06:00
Nagy-Egri Máté Ferenc
38440915ef Finish-compat-section (#2166)
* User/Kernel-Space compat

* Update ML compat at 5.5.0

* Fix spelling of user and kernel space
2023-05-24 07:02:43 -06:00
srawat
d9c434881a Update openmp.md (#2163)
Updated the link for supported GPUs from absolute to relative "(../../release/gpu_os_support.md#gpu-support-table)"
2023-05-23 07:05:18 -06:00
Nagy-Egri Máté Ferenc
4c795d45f6 Typo and link style fixes (#2158)
* CMake package config filename format

* No links as text
2023-05-22 17:27:59 -06:00
Saad Rahim
ef0a88ea0e Navigation improvement (#2151)
* Reorganized Ref Grid card and ROCm intro

* MIGraphX link

* openmp header cleanup

* Fixing durationN

* Syncing grid cards to left nav
2023-05-19 15:07:46 -06:00
Nagy-Egri Máté Ferenc
34578f0193 Compatibility pages review (#2134) 2023-05-19 07:38:14 -06:00
Saad Rahim
6d32125543 Merge pull request #2150 from saadrahim/further_fixes
Additional fixes
2023-05-18 16:22:03 -06:00
Saad Rahim
f4a481e58b URL change and nav cleanup 2023-05-18 14:42:03 -06:00
zhozha
081a2948ff Update manifest for v5.5 release 2023-05-18 11:49:10 -06:00
Nagy-Egri Máté Ferenc
6c1fff6692 RDNA2 Virtualization Guide (#2149) 2023-05-18 09:39:37 -06:00
dependabot[bot]
0b249ff088 Bump rocm-docs-core from 0.10.3 to 0.11.0 in /docs/sphinx (#2148)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.3 to 0.11.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.3...v0.11.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2023-05-17 22:34:23 -06:00
Saad Rahim
49d4d1b6bc Navigation cleanup (#2147) 2023-05-17 22:32:07 -06:00
Sam Wu
f953a99298 Update links to new docs and rename .sphinx dir to sphinx (#2141)
* update links to new docs and rename .sphinx dir to sphinx

* fix spelling and formatting
add new words to wordlist
remove empty headers
remove version number for ROCm in conf.py

fix typos

* add more formats to rtd config
2023-05-17 11:40:18 -06:00
Nagy-Egri Máté Ferenc
4096b867d8 CMake HIP language support (#2104) 2023-05-17 07:07:22 -06:00
Nara
494ba37d87 docs: clean up (#2143) 2023-05-16 07:27:27 -06:00
dependabot[bot]
df32eed823 Bump rocm-docs-core from 0.10.2 to 0.10.3 in /docs/.sphinx (#2140)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.2 to 0.10.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.2...v0.10.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 07:24:05 -06:00
dependabot[bot]
b173f6b226 Bump rocm-docs-core from 0.10.1 to 0.10.2 in /docs/.sphinx (#2139)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.1 to 0.10.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.1...v0.10.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-14 21:17:04 -06:00
dependabot[bot]
09423f1e4e Bump rocm-docs-core from 0.10.0 to 0.10.1 in /docs/.sphinx (#2129)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.0 to 0.10.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.0...v0.10.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-11 10:47:27 -06:00
Nagy-Egri Máté Ferenc
d9f272a505 MI100 and MI200 extra content (#2112) 2023-05-11 09:34:11 -06:00
Saad Rahim
ba14589a9a Grammar and other typos (#2123) 2023-05-10 13:25:40 -06:00
dependabot[bot]
f8fe609302 Bump rocm-docs-core from 0.9.2 to 0.10.0 in /docs/.sphinx (#2125)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.9.2 to 0.10.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.9.2...v0.10.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-09 18:58:04 -06:00
dependabot[bot]
fd9ae73706 Bump rocm-docs-core from 0.9.1 to 0.9.2 in /docs/.sphinx (#2118)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.9.1 to 0.9.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.9.1...v0.9.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-05 13:40:17 -06:00
Michael E. Rowan
58481f3b83 update file_reorg.md (#2117) 2023-05-05 13:28:05 -06:00
Sam Wu
012e4c542b Set article info for pages (#2090) 2023-05-05 07:32:44 -06:00
Mészáros Gergely
55b5b66901 Add GPU isolation (#2114)
* Add GPU isolation guide

* Add hover text expansion of DKMS in linux quick start guide
2023-05-04 11:44:09 -06:00
Nagy-Egri Máté Ferenc
62ed404058 Initial GPU-aware MPI port (#2086)
* Initial GPU-aware MPI port

* Remove trailing spaces

* Allowlist word in gpu_aware_mpi
2023-05-04 09:42:22 -06:00
Saad Rahim
66ed6adf6c Adding release notes (#2113) 2023-05-04 08:40:56 -06:00
Nara
e04c646088 Update openmp documentation (#2103)
* docs(openmp): updated openmp documentation

* style(openmp): 80 column widths
2023-05-03 09:55:54 -06:00
dependabot[bot]
fcc6283748 Bump rocm-docs-core from 0.7.1 to 0.8.0 in /docs/.sphinx (#2102)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.7.1 to 0.8.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.7.1...v0.8.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-02 17:12:45 -06:00
Saad Rahim
28a4b8d477 What is ROCm? - Overview (#2096) 2023-05-01 22:02:16 -06:00
Nara
2aec75e201 Release notes for ROCm 5.5.0 (#2094)
* docs(release_notes): added release notes for ROCm 5.5.0

* ci(linting): Add RELEASE.md to ignore RegEx.
2023-05-01 21:53:54 -06:00
doscherda
2072f82761 Update docker.md (#2067)
* Update docker.md

add --security-opt seccomp=unconfined info

* ci fixups

---------

Co-authored-by: Nara Prasetya <nara@streamhpc.com>
2023-05-01 08:25:47 -06:00
Sam Wu
5c4ab7d675 update supported python versions for documentation (#2092)
rocm-docs-core dependencies require python>=3.8 and python<3.9
2023-04-28 08:44:59 -06:00
Saad Rahim
d5eb2b25f2 Changing version number (#2091) 2023-04-27 11:27:43 -06:00
dependabot[bot]
bcc1432d83 Bump rocm-docs-core from 0.6.0 to 0.7.1 in /docs/.sphinx (#2088)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.6.0 to 0.7.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.6.0...v0.7.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-26 11:12:16 -06:00
Saad Rahim
776605266c Removing Windows Documentation (#2085) 2023-04-25 21:34:30 -06:00
Sam Wu
4c62bb74ff remove linkcheck step from rtd (#2081) 2023-04-24 15:54:23 -06:00
Sam Wu
57c601262b HPC cleanup - Clean up the deployment related pages (#2080)
* Clean up the deployment related pages

- Add an index page for the linux deployment submenu
- Remove deployment options that are not yet completed (i.e. spack,
from source installation)
- remove the general deployment index page
- various cleanups and clarifications in the rest of the pages

* Move all deploy pages to deploy folder

---------

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
2023-04-24 12:07:17 -06:00
Sam Wu
b897bddf38 Linkcheck and prepare alpha (#2078) 2023-04-24 11:25:31 -06:00
Nara
48db1eea8d Spell checking (#2070)
* ci: cleanup linters and add spelling checker

* docs: fix spelling and styling issues
2023-04-24 07:09:09 -06:00
Saad Rahim
08821f1098 fixing links for HIP (#2068) 2023-04-20 10:21:40 -06:00
dependabot[bot]
3a93ce8fc9 Bump rocm-docs-core from 0.5.0 to 0.6.0 in /docs/.sphinx (#2062)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-18 07:38:46 -06:00
dependabot[bot]
a167088d41 Bump rocm-docs-core from 0.4.0 to 0.5.0 in /docs/.sphinx (#2050)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-16 17:53:55 -06:00
Saad Rahim
85dd6e4234 Fixing GPU list (#2049) 2023-04-14 11:59:39 -06:00
Nara
507530aeb5 Ignore markdown linting in autotag template folder (#2047) 2023-04-14 08:14:00 -06:00
Nara
2de2059feb Fix some linting issues (#2046) 2023-04-14 15:17:21 +02:00
Nara
b81a27c2a2 Modify AutoTag to generate changelog (#2004) 2023-04-14 07:11:08 -06:00
Saad Rahim
19c0ba1150 Readme Cleanup (#2037) 2023-04-13 20:14:51 -06:00
dependabot[bot]
043427989f Bump rocm-docs-core from 0.2.0 to 0.4.0 in /docs/.sphinx (#2042)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.2.0 to 0.4.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/commits/v0.4.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-13 16:47:47 -06:00
Nagy-Egri Máté Ferenc
21033eb98b 1908 install upgrade uninstall guide (#2039) 2023-04-13 11:24:19 -06:00
Sam Wu
c3298b5944 add python versions known to build docs (#2040) 2023-04-13 10:25:38 -06:00
Ehud Sharlin
7bbd5bc79d Deep Learning Training - Troubleshooting & References (#2033) 2023-04-12 07:37:52 -06:00
Saad Rahim
b1a971b432 Updating version string to include alpha (#2035) 2023-04-11 09:52:50 -06:00
Saad Rahim
41dc33d95d Fixing openmp link (#2029) 2023-04-06 13:49:24 -06:00
Saad Rahim
97339ffe33 lab notes added to navigation (#2026) 2023-04-06 13:45:45 -06:00
Brian Cornille
47688609af Updated outdated OpenMP information on flags and example locations and fixed some typos. (#2027)
Co-authored-by: Brian Cornille <Brian.Cornille@amd.com>
2023-04-06 13:45:21 -06:00
Justin Chang
1533f5edb6 Added reference to AMD lab notes (#2025) 2023-04-06 10:46:43 -06:00
Nagy-Egri Máté Ferenc
1ec7e1c933 Port installation guide (#2018) 2023-04-06 09:42:07 -06:00
Mészáros Gergely
64a243fc29 build(deps): Pin rocm-docs-core based on the pypi version (#2024)
Dependabot should keep this up-to-date, so we can now actually pin
a version to avoid breaking when it is updated.
2023-04-06 09:37:17 -06:00
Mészáros Gergely
fa298efcbb ci(deps): Fix dependabot config (#2023)
The manifests are in the docs/.sphinx directory.
2023-04-06 09:11:42 -06:00
Nagy-Egri Máté Ferenc
08d8d2612a Comment triage (#2022)
- Unify code block style (indent vs. fence)
- Mark code languages
- Increase heading level one at a time
- No extra newlines between paragraphs
- List for header reorg stages
- Shrink ASCII table (mobile friendliness)
- 80-column width
2023-04-06 09:11:09 -06:00
Sam Wu
fc3f2ccb38 Add dependabot configuration (#2016)
* add dependabot config

* change bot pr target branch

* set bot interval to daily
2023-04-05 12:31:12 -06:00
Lauren Wrubleski
9683d6f776 Include autotag script as generic ROCm tool (#1949) 2023-04-03 07:09:01 -06:00
Sam Wu
9833748ff0 Doc update (#2011)
* add url to ROCgdb-docs

update reqs and gitignore

* add validation tools section for RVS and TransferBench

* stub in links for validation/mgmt tools

* populate compilers page

* add cards for ai libs and computer vision pages

* add content to math lib pages

* reorg hip and math libs

* update index

* consolidate linear algebra libs

* fix release info order in toc

* fix links and content cards for libraries

* update mdl ignored files

* update understand rocm section

* fix formatting errors

* add link to openmp

* ignore md041
2023-03-31 18:04:21 -06:00
searlmc1
e83512605d Update openmp.md (#2010)
Fix typo
2023-03-31 14:31:12 -06:00
Saad Rahim
e7ed560520 Cleanup Navigation for C++ Primitives (#2009) 2023-03-31 08:45:11 -06:00
Saad Rahim
110e2444e9 Navigation Links Updated (#2008) 2023-03-31 08:13:36 -06:00
nunnikri
71c16c4b96 Adding ROCm File Reorganization White Paper (#1951)
* Adding ROCm File Reorganization White Paper

* Applying formatting

* Reorganizing file structure

* Update file_reorg.md

Correcting spelling mistakes

* Update file_reorg.md

* Update file_reorg.md

---------

Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Saad Rahim <saad.rahim@amd.com>
2023-03-31 07:54:50 -06:00
Nagy-Egri Máté Ferenc
2e7266c829 1908-uninstall-guide-linux (#2000) 2023-03-31 07:33:22 -06:00
Sam Wu
80778f173f Update API Refs (#2006)
* add url to ROCgdb-docs

update reqs and gitignore

* add validation tools section for RVS and TransferBench

* stub in links for validation/mgmt tools

* populate compilers page

* add cards for ai libs and computer vision pages

* add content to math lib pages

* reorg hip and math libs

* update index

* consolidate linear algebra libs
2023-03-30 15:14:43 -06:00
Ehud Sharlin
415f3b93ad Inception V3 Example, Deep Learning Guide Decomposed and OpenMP Guide (#1937) 2023-03-30 08:01:06 -06:00
Saad Rahim
63b3b55ed5 Enabling markdown lint on PRs (#2005)
* Enabling markdown lint on PRs

* Fix syntax
2023-03-29 11:05:20 -06:00
Nagy-Egri Máté Ferenc
286f120d9a MI100 architecture guide (#1994)
* Initial MI100 docs

* Try changing style to fix MD004

* Disable MD004

* Disable MD005

* Move to {table} from {list-table}

* Don't disable few MD styles
2023-03-29 07:14:23 -06:00
Aswin John Mathews
519707db4f Added support matrices (#1991)
* Added support matrices

* bullets
2023-03-28 08:45:13 -06:00
Saad Rahim
b213d94dd6 Updates to navigation organization for AI Libraries (#1993) 2023-03-25 11:33:02 -06:00
Saad Rahim
875e07b801 Figure update to figure-md (#1980) 2023-03-24 11:05:52 -06:00
Nagy-Egri Máté Ferenc
ac42cbc97b Initial ReST linting (#1979) 2023-03-24 08:34:27 -06:00
Nara
20f8185e0d ROCmCC & Win Install: Table & Figures Cleanup (#1984)
* Use MyST style table headers

* Fixup win install page

* Use option directives for args

* Revert list-tables
2023-03-24 08:32:22 -06:00
Saad Rahim
934cc718b1 Pulling libraries out in the navigation tree (#1989)
* Pulling libraries out

* add libraries listed in left sidebar to index page

* Adding all

* Updating nav tree

* fix link to rocm-examples in toc

* update TOC

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-03-24 08:08:30 -06:00
Saad Rahim
5534e47b16 License updates (#1985) 2023-03-23 09:28:56 -06:00
Saad Rahim
ca10bba2c3 Updating the contribution guide (#1982) 2023-03-22 15:53:01 -06:00
Nagy-Egri Máté Ferenc
8702d500ad Initial Markdown linting (#1978) 2023-03-22 15:45:50 +01:00
Nagy-Egri Máté Ferenc
e9ee6b9874 Initial MI250 Guide (#1976)
* Initial MI250 Guide

* Limit line length to 80 columns

* References using MyST

* Move to figure-md and numref

* Add MI250 to TOC
2023-03-22 15:45:00 +01:00
Roopa Malavally
2f51e147f2 Update licensing.md (#1981) 2023-03-21 14:36:52 -06:00
Nagy-Egri Máté Ferenc
01422a3cc4 Initial contributing guide (#1961) 2023-03-21 11:38:03 -06:00
Saad Rahim
903aae3321 Adding stub for management tools (#1971)
* Adding stub for management tools

* spelling
2023-03-20 07:15:00 -06:00
Nara
d76b9b2fbf Update about section with MyST information (#1975)
* added myst section

* US spelling
2023-03-20 07:08:32 -06:00
Saad Rahim
7f4b69c3a0 Demonstrating figure and table caption standard (#1974)
* Adding figure formatting

* Adding tables
2023-03-19 13:51:11 -06:00
Saad Rahim
e65c857ad2 Adding rocALUTION (#1970) 2023-03-17 20:47:14 -06:00
Saad Rahim
b951a2bef8 Updating HIP landing page (#1969) 2023-03-17 15:16:41 -06:00
Nagy-Egri Máté Ferenc
1a570efb48 Math Libraries Landing pages (#1940)
* Add C++ algorithm primitive lib cards

* Add PRNG section

* API Reference Manuals first

* Add Tensile and rocWMMA

* Change rocFFT and hipFFT order for consistency

* Add RCCL

* Fix PRNG links

* Add rocSOLVER and hipSOLVER

* Add general note on rocLIB vs hipLIB
2023-03-17 10:37:03 -06:00
Mészáros Gergely
75f4c018cc Quick Start Linux: Add RHEL and SLES instructions, minor touch-ups to Ubuntu (#1968)
* linux quick start: Mention correct package to install

* linux quick start: Rephrase prerequisites

Mention that installing the headers by hand is usually not required.

* linux quick start: Simplify command to get signing key

* linux quick start: Add instructions for RHEL and SLES
2023-03-17 07:13:41 -06:00
Mészáros Gergely
f1a46ae86b Update quick start guide for Ubuntu (#1964)
Reorganize the quick start guide for linux, adding multi level
tab selection for just the commands where it makes sense.

Currently, mostly Ubuntu commands are filled out; if the structure
looks fine, more will follow.
2023-03-16 12:05:47 -06:00
Alex Voicu
8bc40f4649 Fix a typo (#1962) 2023-03-16 11:08:12 -06:00
Sam Wu
d614c6e500 hide link to main ROCm docs (#1960)
ROCm already links to main ROCm docs in default sidebar header unlike other subprojects
2023-03-16 10:05:21 -06:00
Saad Rahim
3b4c592c53 Changing navbar home name (#1950)
* Changing navbar home name

* Reorg navigation tree
2023-03-16 07:47:22 -06:00
Alex Voicu
bcba7ed752 Rtd alexv feedback (#1945) 2023-03-15 12:22:25 -06:00
Mészáros Gergely
9144ac6238 Add docker deployment guide (#1938)
* Add docker deployment guide.

* Correct 'Docker Hub' styling.
2023-03-14 09:56:10 -06:00
Nara
b65adbd159 update landing page (#1939) 2023-03-14 09:19:47 -06:00
Ehud Sharlin
4ce8372761 Updates to compiler doc (#1921) 2023-03-13 14:07:58 -06:00
Saad Rahim
5c80077b67 Fixing rccl link (#1935) 2023-03-13 13:08:16 -06:00
Sam Wu
5787b613f6 Add 404 page (#1933)
* Add 404 page

Only build htmlzip format for docs

* Add homepage link to 404 page
2023-03-13 12:20:47 -06:00
Mészáros Gergely
5ce34c593a Ignore more generated files in gitignore (#1934)
Add more of the sphinx generated files, so generating the docs does not
add untracked files. Ignore the folder `.venv` typically used for
virtual environments.
Also sort the ignored file list for easier maintenance.
2023-03-13 12:01:26 -06:00
Saad Rahim
3db2cff387 Fixing build (#1920) 2023-03-09 16:49:14 -07:00
Saad Rahim
555e4f078b Support levels (#1919) 2023-03-09 16:28:33 -07:00
Saad Rahim
b19681711c Pitchfork Standard for Docs (#1918) 2023-03-09 14:03:04 -07:00
Saad Rahim
67cd4c3789 Documentation Redesign (#1883) 2023-03-09 12:02:54 -07:00
zhang2amd
a2790438b5 Update manifest for v5.4.3 release 2023-02-07 13:52:06 -08:00
zhang2amd
e6646b2f38 Readme update for v5.4.3
Update README.md
2023-02-07 13:51:16 -08:00
Roopa Malavally
9126c010d4 Update README.md 2023-02-07 13:47:46 -08:00
zhang2amd
52876c050b Update manifest to v5.4.3 2023-02-07 13:33:54 -08:00
zhang2amd
81722b3451 Merge pull request #1890 from RadeonOpenCompute/Rmalavally-patch-10
Update README.md
2023-01-13 13:30:00 -08:00
Roopa Malavally
e464db856c Update README.md 2023-01-13 12:38:37 -08:00
zhang2amd
8b49837f76 Update manifest to release v5.4.1 2022-12-15 15:19:21 -08:00
zhang2amd
0e2b33f904 Merge pull request #1878 from Rmalavally/master
Update README.md
2022-12-15 15:17:49 -08:00
Roopa Malavally
4eb9653b68 Update README.md 2022-12-15 14:56:40 -08:00
zhang2amd
a1884e46fe Update manifest to v5.4 release. 2022-11-30 12:35:07 -08:00
zhang2amd
419f1a9560 Merge pull request #1870 from RadeonOpenCompute/Rmalavally-patch-8
Update README.md
2022-11-30 12:34:09 -08:00
Saad Rahim
a9c87c8b13 Adding stakeholders to CODEOWNERS file (#1823) 2022-11-30 13:22:26 -07:00
Roopa Malavally
002cca3756 Update README.md 2022-11-30 11:39:15 -08:00
zhang2amd
48ded5bc01 Update readme to fix typo, v5.3.3 release. 2022-11-17 14:24:32 -08:00
zhang2amd
ee989c21f9 Update manifest to v5.3.3 release 2022-11-17 14:15:27 -08:00
zhang2amd
b638a620ac Merge pull request #1858 from RadeonOpenCompute/Rmalavally-patch-5
Update README.md
2022-11-17 14:14:30 -08:00
Roopa Malavally
36a57f1389 Update README.md 2022-11-17 13:36:08 -08:00
Saad Rahim
c92f5af561 Adding MIT License file (#1845) 2022-11-15 12:54:14 -07:00
zhang2amd
09001c933b Update manifest file to v5.3.2 release. 2022-11-09 17:32:00 -08:00
zhang2amd
b7c9943ff7 Merge pull request #1855 from RadeonOpenCompute/Rmalavally-patch-5
Update README.md
2022-11-09 17:29:56 -08:00
Roopa Malavally
25a52ec827 Update README.md 2022-11-09 17:16:04 -08:00
zhang2amd
b14834e5a1 Merge pull request #1818 from RadeonOpenCompute/Rmalavally-patch-3
Update README.md
2022-10-04 10:19:57 -07:00
zhang2amd
368178d758 Update manifest to release 5.3.0 2022-09-30 16:20:26 -07:00
Roopa Malavally
a047d37bfe Update README.md 2022-09-30 16:09:00 -07:00
Saad Rahim
7536ef0196 Fixing Ubuntu 22 to Ubuntu 20 (#1792) 2022-08-19 14:21:53 -06:00
Saad Rahim
5241caf779 Final edits to documentation (#1791) 2022-08-18 17:26:47 -06:00
Saad Rahim
1ae99c5e4b Updates to release notes, changelog and manifest for ROCm 5.2.3 (#1788) 2022-08-18 14:37:04 -06:00
Saad Rahim
f034733da2 Adding a CODEOWNERS file (#1771) 2022-07-29 14:26:53 -06:00
Saad Rahim
d4879fdec4 Removing unused files (#1772) 2022-07-22 13:37:21 +01:00
Roopa Malavally
60957c84b7 Update README.md 2022-07-21 17:47:33 -07:00
zhozha
3859eef2a9 Update manifest to 5.2.1 release 2022-07-21 16:28:50 -07:00
zhozha
4915438362 Update manifest for ROCm 5.2 release, remove old docs 2022-06-28 18:32:04 -07:00
Roopa Malavally
c4ce059e12 Update README.md 2022-06-28 18:13:03 -07:00
Ronan Keryell
ca4d4597ba Add release note section (#1740)
* Remove spurious trailing spaces

* Move all the release notes into a global release note section
2022-05-25 22:06:01 -06:00
zhozha
418e8bfda6 Update manifest to 5.1.3 release 2022-05-20 15:58:17 -07:00
Roopa Malavally
82477df454 Update README.md 2022-05-20 15:37:38 -07:00
zhang2amd
075562b1f2 Update manifest to 5.1.1 release 2022-04-08 17:30:20 -07:00
Roopa Malavally
74d067032e Update README.md 2022-04-08 17:08:51 -07:00
zhang2amd
526846dc7e Update manifest to 5.1 release 2022-03-30 18:44:16 -07:00
Roopa Malavally
a47030ca10 Update README.md 2022-03-30 18:19:14 -07:00
zhang2amd
fac29ca466 Update default.xml for ROCm 5.0.2 release 2022-03-04 15:28:19 -08:00
Roopa Malavally
986ba19e80 Update README.md 2022-03-04 15:13:47 -08:00
Roopa Malavally
e00f7f6d59 Update README.md 2022-03-04 15:12:54 -08:00
Roopa Malavally
cac8ecf2bc Update README.md 2022-02-23 21:23:18 -08:00
Roopa Malavally
2653e081e2 Delete Hardware_and_Software_Support.md 2022-02-23 21:22:43 -08:00
Roopa Malavally
34eb2a85f3 Update Hardware_and_Software_Support.md 2022-02-23 18:30:44 -08:00
Roopa Malavally
164129954e Create Hardware_and_Software_Support.md 2022-02-23 18:29:53 -08:00
zhang2amd
eaf8e74802 Update default.xml for ROCm 5.0.1 release 2022-02-16 16:58:29 -08:00
Roopa Malavally
403c81a83e Update README.md 2022-02-16 16:54:40 -08:00
Cory Bloor
ced195c62c Cleanup README.md formatting (#1674)
* Cleanup README.md formatting

Fixed code formatting, broken URLs and changelog table.

* Update README.md

Fixup rocm-smi --showtopoaccess.
2022-02-13 08:24:02 -08:00
Roopa Malavally
3486206b09 Add files via upload 2022-02-10 16:12:04 -08:00
Roopa Malavally
c379917e1c Delete ROCm_Release_Notes_v5.0.pdf 2022-02-10 16:11:51 -08:00
Roopa Malavally
0a60a3b256 Add files via upload 2022-02-10 09:57:04 -08:00
Roopa Malavally
99a3476a5e Delete ROCm_Installation_Guide v5.0.pdf 2022-02-10 09:56:43 -08:00
Roopa Malavally
ad3a774274 Delete ROCm_Installation_Guide_v5.0.pdf 2022-02-10 08:43:38 -08:00
Roopa Malavally
5bb9c86fb6 Add files via upload 2022-02-10 08:43:17 -08:00
zhang2amd
0a0b750e0e Update default.xml for ROCm 5.0 release 2022-02-09 21:11:15 -08:00
Roopa Malavally
c6ec9d7b55 Update README.md 2022-02-09 21:09:07 -08:00
Roopa Malavally
a1eac48dea AMD ROCm Release v5.0 (#1670)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* converting md tables

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* more changes to table

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* latest changes with alignment

* Update README.md

* Update README.md

* Update README.md

* tables made till system interface

Co-authored-by: anubhavamd <92926185+anubhavamd@users.noreply.github.com>
2022-02-09 21:07:33 -08:00
Roopa Malavally
94f4488904 Delete ROCm_SMI_Manual_4.5.pdf 2022-02-09 17:53:09 -08:00
Roopa Malavally
afc1a33ad7 Delete ROCDebugger_User_Guide.pdf 2022-02-09 17:52:22 -08:00
Roopa Malavally
9b6fb663c9 Delete ROCDebugger_API_Guide.pdf 2022-02-09 17:52:12 -08:00
Roopa Malavally
7d78a111b4 Delete RDC_API_Manual_4.5.pdf 2022-02-09 17:51:58 -08:00
Roopa Malavally
f04316efdb Delete AMD_ROCm_DataCenter_Tool_User_Guide_v4.5.pdf 2022-02-09 17:51:47 -08:00
Roopa Malavally
0083f955a7 Delete AMD_HIP_Supported_CUDA_API_Reference_Guide.pdf 2022-02-09 17:51:35 -08:00
Roopa Malavally
237e662486 Delete AMD_HIP_Programming_Guide.pdf 2022-02-09 17:51:27 -08:00
Roopa Malavally
475711bb7d Delete AMD_Compiler_Reference_Guide_v4.5.pdf 2022-02-09 17:51:17 -08:00
Roopa Malavally
dc2b00f43d Delete AMD-HIP-API-4.5.pdf 2022-02-09 17:51:09 -08:00
Roopa Malavally
c0cd1b72ce Delete AMD Instinct™High Performance Computing and Tuning Guide.pdf 2022-02-09 17:51:02 -08:00
zhozha
95493f625c Update default.xml for ROCm 4.5.2 release 2021-12-10 16:40:11 -08:00
Roopa Malavally
c3f91afb26 Update README.md 2021-12-10 16:29:05 -08:00
Roopa Malavally
d827b836b2 Update README.md 2021-12-10 16:24:06 -08:00
Roopa Malavally
99d5fb03e0 Add files via upload 2021-12-10 16:23:34 -08:00
Roopa Malavally
1f6c308006 Delete AMD_HIP_Programming_Guide.pdf 2021-12-10 16:23:21 -08:00
Roopa Malavally
bb3aa02a86 Update README.md 2021-12-10 15:47:14 -08:00
Roopa Malavally
9b82c422d0 Update README.md 2021-11-23 11:08:04 -08:00
Roopa Malavally
8eed074e8a Update README.md 2021-11-23 11:07:12 -08:00
Roopa Malavally
53db303dd3 Update README.md 2021-11-23 10:54:49 -08:00
Roopa Malavally
36ec27d9a4 Update README.md 2021-11-23 10:52:59 -08:00
Roopa Malavally
d78bb0121b Update README.md 2021-11-23 10:43:10 -08:00
Roopa Malavally
f72c130e06 Update README.md 2021-11-17 07:05:28 -08:00
zhang2amd
c058e7a1c9 Added hipamd and MIOpenTensile to manifest. 2021-11-09 07:57:31 -08:00
zhang2amd
0d12925fe9 Merge pull request #1602 from RadeonOpenCompute/zhang2amd-patch-1
Update manifest for ROCm 4.5 release.
2021-10-27 20:36:31 -07:00
zhang2amd
f088317e44 Update manifest for ROCm 4.5 release. 2021-10-27 20:35:20 -07:00
Roopa Malavally
ca8f60e96f Update README.md 2021-10-27 19:26:36 -07:00
Roopa Malavally
ba8c56abdc Update README.md 2021-10-27 19:24:34 -07:00
Roopa Malavally
18410afcd7 Add files via upload 2021-10-27 07:24:03 -07:00
Roopa Malavally
c637c2a964 Add files via upload 2021-10-26 19:00:15 -07:00
Roopa Malavally
5a56a31fac Delete AMD_HIP_Supported_CUDA_API_Reference_Guide.pdf 2021-10-26 18:59:59 -07:00
Roopa Malavally
82b35be1ee Add files via upload 2021-10-26 18:21:01 -07:00
Roopa Malavally
03fb0f863c Delete HIP-API-4.5.pdf 2021-10-26 17:28:58 -07:00
Roopa Malavally
c730ade1e3 Add files via upload 2021-10-26 17:28:44 -07:00
Roopa Malavally
164a386ed6 Add files via upload 2021-10-26 17:27:34 -07:00
Roopa Malavally
db517138f6 Add files via upload 2021-10-26 17:17:19 -07:00
Roopa Malavally
bc63e35725 Add files via upload 2021-10-26 16:49:33 -07:00
Roopa Malavally
c9a8556171 Add files via upload 2021-10-26 16:34:10 -07:00
Roopa Malavally
91f193a510 Delete AMD_ROCm_SMI_Guide_v4.3.pdf 2021-10-26 15:46:18 -07:00
Roopa Malavally
b2fac149b5 Delete AMD_ROCm_Release_Notes_v4.3.pdf 2021-10-26 15:46:04 -07:00
Roopa Malavally
1d23bb0ec6 Delete AMD_ROCm_Release_Notes_v4.3.1.pdf 2021-10-26 15:45:49 -07:00
Roopa Malavally
fedfa50634 Delete AMD_ROCm_DataCenter_Tool_User_Guide_v4.3.pdf 2021-10-26 15:45:37 -07:00
Roopa Malavally
51ea894667 Delete AMD_ROCDebugger_User_Guide.pdf 2021-10-26 15:45:26 -07:00
Roopa Malavally
63b0e6d273 Delete AMD_ROCDebugger_API.pdf 2021-10-26 15:45:14 -07:00
Roopa Malavally
f1383c5d16 Delete AMD_RDC_API_Guide_v4.3.pdf 2021-10-26 15:45:02 -07:00
Roopa Malavally
f3ec7b4720 Delete AMD_HIP_Supported_CUDA_API_Reference_Guide_v4.3.pdf 2021-10-26 15:44:52 -07:00
Roopa Malavally
9492fc9b0d Delete AMD_HIP_Programming_Guide_v4.3.pdf 2021-10-26 15:44:37 -07:00
Roopa Malavally
c103fe233f Delete AMD_HIP_API_Guide_v4.3.pdf 2021-10-26 15:44:23 -07:00
Roopa Malavally
63c16a229e Delete AMD_Compiler_Reference_Guide_v4.3.pdf 2021-10-26 15:44:08 -07:00
Paul Menzel
18aa89804f README: Replace screenshots of tables with Markdown table (#1593)
The screenshots are from tables with text, which are not easily searchable,
are bigger in size than needed – increasing load times – and are in a
resolution, causing them to be blurry on HiDPI displays.  Therefore, use a
Markdown table instead solving all the issues above, and delete the images
from the repository.

The SLES service pack version differs in the two screenshots: SP2 vs SP3.
Go for *SP3*.

Resolves: https://github.com/RadeonOpenCompute/ROCm/issues/1591
2021-10-15 06:27:44 -07:00
Roopa Malavally
65a4524834 Update README.md 2021-09-18 12:36:44 -07:00
Roopa Malavally
b04ab30e81 Delete AMD_ROCm_v2.10_Release_Notes.pdf 2021-09-15 19:40:10 -07:00
Roopa Malavally
4c8787087a Update README.md 2021-08-27 15:37:37 -07:00
Roopa Malavally
7cd85779c4 Update README.md 2021-08-27 15:31:42 -07:00
Aakash Sudhanwa
c676ff480e Update default.xml (#1567) 2021-08-27 15:26:48 -07:00
Roopa Malavally
6d19f5b6c1 Add files via upload 2021-08-27 15:24:56 -07:00
Roopa Malavally
4679e8ac87 Update README.md 2021-08-27 15:24:20 -07:00
Roopa Malavally
8a3209f985 Update README.md 2021-08-27 15:23:58 -07:00
Roopa Malavally
79d0d00b2a Update README.md 2021-08-27 15:23:18 -07:00
Roopa Malavally
db5121cdfe Update README.md 2021-08-27 15:22:30 -07:00
Aakash Sudhanwa
035f4995bb Merge branch 'master' into master 2021-08-27 15:08:41 -07:00
Roopa Malavally
f63e3f9ce1 Add files via upload 2021-08-27 15:02:49 -07:00
Roopa Malavally
4e56ed7dc3 Update README.md 2021-08-13 11:49:38 -07:00
Roopa Malavally
2faf5b6ab7 Update README.md 2021-08-13 11:48:18 -07:00
Roopa Malavally
e69b7e6f71 Delete OSKernel.PNG 2021-08-13 11:48:00 -07:00
Roopa Malavally
d53ffd1c89 Add files via upload 2021-08-13 11:47:48 -07:00
Roopa Malavally
e177599de1 Add files via upload 2021-08-09 12:55:19 -07:00
Roopa Malavally
9fc1ba3970 Add files via upload 2021-08-09 12:47:17 -07:00
Nick Curtis
520764faa3 Fix missing links in rocprof docs (#1550) 2021-08-07 08:42:25 -07:00
Roopa Malavally
7d0b53c87f Add files via upload 2021-08-03 10:53:16 -07:00
Roopa Malavally
c3a8ecd0c5 Delete AMD_Compiler_Reference_Guide_v4.3.pdf 2021-08-03 10:49:28 -07:00
Roopa Malavally
21cf37b2df Add files via upload 2021-08-02 21:37:19 -07:00
Roopa Malavally
f4419a3d1c Delete AMD_HIP_Programming_Guide_v4.3.pdf 2021-08-02 21:37:00 -07:00
zhozha
5ffdcf84ab Update to ROCm 4.3 manifest 2021-08-02 17:33:25 -07:00
Roopa Malavally
085295daea Update README.md 2021-08-02 16:51:39 -07:00
Roopa Malavally
cf5cec2580 ROCm v4.3 Release Notes (#1540)
* Delete AMD HIP Programming Guide_v4.2.pdf

* Delete AMD_HIP_API_Guide_4.2.pdf

* Delete AMD_ROCm_DataCenter_Tool_User_Guide_v4.2.pdf

* Delete AMD_ROCm_Release_Notes_v4.2.pdf

* Delete HIP_Supported_CUDA_API_Reference_Guide_v4.2.pdf

* Delete ROCm_Data_Center_Tool_API_Guide_v4.2.pdf

* Delete ROCm_Debugger_API_Guide_v4.2.pdf

* Delete ROCm_Debugger_User_Guide_v4.2.pdf

* Delete ROCm_SMI_Manual_4.2.pdf

* Update README.md

* Update README.md

* Delete CG1.PNG

* Delete CG2.PNG

* Delete CG3.PNG

* Delete CGMain.PNG

* Delete CLI1.PNG

* Delete CLI2.PNG

* Delete SMI.PNG

* Delete keyfeatures.PNG

* Delete latestGPU.PNG

* Delete rocsolverAPI.PNG

* Create test.rst

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md
2021-08-02 16:39:54 -07:00
Roopa Malavally
e7a93ae3f5 Add files via upload 2021-08-01 18:53:14 -07:00
Roopa Malavally
e3b7d2f39d Delete AMD_ROCDebugger_API.pdf.pdf 2021-08-01 18:52:58 -07:00
Roopa Malavally
0c4565d913 Delete AMD_ROCDebugger_User_Guide.pdf.pdf 2021-08-01 18:52:30 -07:00
Roopa Malavally
313a589132 Add files via upload 2021-08-01 18:52:03 -07:00
Roopa Malavally
1caf5514e8 Add files via upload 2021-08-01 18:33:33 -07:00
Roopa Malavally
d029ad24cf Add files via upload 2021-08-01 18:09:17 -07:00
Roopa Malavally
ca6638d917 Add files via upload 2021-08-01 17:42:39 -07:00
Roopa Malavally
5cba920022 Add files via upload 2021-08-01 16:21:37 -07:00
Roopa Malavally
cefc8ef1d7 Add files via upload 2021-08-01 16:17:54 -07:00
Roopa Malavally
b71c5705a2 Delete ROCm_SMI_Manual_4.2.pdf 2021-08-01 16:13:32 -07:00
Roopa Malavally
977a1d14cd Delete ROCm_Debugger_User_Guide_v4.2.pdf 2021-08-01 16:13:17 -07:00
Roopa Malavally
3ab60d1326 Delete ROCm_Debugger_API_Guide_v4.2.pdf 2021-08-01 16:13:04 -07:00
Roopa Malavally
4b5b13294e Delete ROCm_Data_Center_Tool_API_Guide_v4.2.pdf 2021-08-01 16:12:50 -07:00
Roopa Malavally
ce66b14d9e Delete HIP_Supported_CUDA_API_Reference_Guide_v4.2.pdf 2021-08-01 16:12:32 -07:00
Roopa Malavally
01f63f546f Delete AMD_ROCm_Release_Notes_v4.2.pdf 2021-08-01 16:12:20 -07:00
Roopa Malavally
72eab2779e Delete AMD_ROCm_DataCenter_Tool_User_Guide_v4.2.pdf 2021-08-01 16:12:05 -07:00
Roopa Malavally
8a366db3d7 Delete AMD_HIP_API_Guide_4.2.pdf 2021-08-01 16:11:50 -07:00
Roopa Malavally
8267a84345 Delete AMD HIP Programming Guide_v4.2.pdf 2021-08-01 16:11:30 -07:00
zhang2amd
f7b3a38d49 Merge pull request #1470 from RadeonOpenCompute/roc-4.2.x
4.2 : Manifest Files
2021-05-11 14:58:43 -07:00
Lad, Aditya
12e3bb376b 4.2 : Manifest Files 2021-05-11 14:37:52 -07:00
Roopa Malavally
a44e82f263 ROCm v4.2 Release Notes (#1469)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md
2021-05-11 14:15:34 -07:00
Roopa Malavally
9af988ffc8 Add files via upload 2021-05-11 14:09:17 -07:00
Roopa Malavally
5fed386cf1 Delete AMD_ROCm_Release_Notes_v4.1.pdf 2021-05-11 14:08:41 -07:00
Roopa Malavally
d729428302 Add files via upload 2021-05-10 17:47:35 -07:00
Roopa Malavally
8611c5f450 Delete ROCm_SMI_API_GUIDE_v4.1.pdf 2021-05-10 17:47:20 -07:00
Roopa Malavally
ae0b56d029 Delete AMD_HIP_Programming_Guide_v4.1.pdf 2021-05-10 17:28:58 -07:00
Roopa Malavally
3862c69b09 Add files via upload 2021-05-10 16:30:37 -07:00
Roopa Malavally
be34f32307 Add files via upload 2021-05-10 15:18:46 -07:00
Roopa Malavally
08c9cce749 Add files via upload 2021-05-10 15:18:03 -07:00
Roopa Malavally
a83a7c9206 Delete Debugging with ROCGDB User Guide v4.1.pdf 2021-05-10 15:16:45 -07:00
Roopa Malavally
71faa9c81f Delete AMD-Debugger API Guide v4.1.pdf 2021-05-10 15:16:35 -07:00
Roopa Malavally
6b021edb23 Add files via upload 2021-05-10 13:37:23 -07:00
Roopa Malavally
3936d236e6 Delete AMD_ROCm_DataCenter_Tool_User_Guide_v4.1.pdf 2021-05-10 13:37:12 -07:00
Roopa Malavally
dbcb26756d Add files via upload 2021-05-10 13:13:55 -07:00
Roopa Malavally
96de448de6 Delete HIP_Supported_CUDA_API_Reference_Guide_v4.1.pdf 2021-05-10 13:13:39 -07:00
Roopa Malavally
ee0bc562e6 Add files via upload 2021-05-10 12:01:17 -07:00
Roopa Malavally
376b8673b7 Delete ROCm_Data_Center_Tool_API_Manual_4.1.pdf 2021-05-10 12:00:11 -07:00
Roopa Malavally
e9147a9103 Add files via upload 2021-05-10 11:58:50 -07:00
Roopa Malavally
fab1a697f0 Delete AMD_HIP_API_Guide_4.2.pdf.pdf 2021-05-10 11:58:28 -07:00
Roopa Malavally
a369e642b8 Delete AMD_HIP_API_Guide_v4.1.pdf 2021-05-10 11:58:16 -07:00
Roopa Malavally
9101972654 Add files via upload 2021-05-10 11:57:52 -07:00
Roopa Malavally
f3ba8df53d Update README.md 2021-04-21 08:28:44 -07:00
Roopa Malavally
ba7a87a2dc Update README.md 2021-04-19 13:43:39 -07:00
zhang2amd
df6d746d50 Merge pull request #1443 from RadeonOpenCompute/roc-4.1.1
ROCm 4.1.1 default.xml
2021-04-08 10:06:17 -07:00
Lad, Aditya
2b2bab5bf3 ROCm 4.1.1 default.xml 2021-04-08 09:59:11 -07:00
Roopa Malavally
5ec9b12f99 Update README.md 2021-04-08 09:27:07 -07:00
Roopa Malavally
803148affd Update README.md 2021-04-08 09:21:27 -07:00
Roopa Malavally
9275fb6298 Update README.md 2021-04-08 09:19:52 -07:00
Roopa Malavally
b6ae3f145e Update README.md 2021-04-07 11:06:04 -07:00
Roopa Malavally
f80eefc965 Update README.md 2021-04-07 11:04:51 -07:00
Roopa Malavally
c5d91843a7 Update README.md 2021-04-07 11:03:31 -07:00
Roopa Malavally
733a9c097c Update README.md 2021-04-07 07:15:49 -07:00
Roopa Malavally
ff2b3f8a23 Add files via upload 2021-03-26 12:14:59 -07:00
Roopa Malavally
5a4cf1cee1 Delete AMD_ROCm_Release_Notes_v4.1.docx 2021-03-26 12:14:46 -07:00
Roopa Malavally
dccf5ca356 Update README.md 2021-03-26 12:01:54 -07:00
Roopa Malavally
8b20bd56a6 Update README.md 2021-03-26 10:00:07 -07:00
zhang2amd
65cb10e5e8 Merge pull request #1427 from xuhuisheng/patch-1
add hipFFT to default.xml
2021-03-25 23:03:26 -07:00
Roopa Malavally
ac2625dd26 Delete AMD_ROCm_Release_Notes_v4.1.pdf 2021-03-25 15:55:22 -07:00
Roopa Malavally
3716310e93 Add files via upload 2021-03-25 15:55:04 -07:00
Roopa Malavally
2dee17f7d6 Add files via upload 2021-03-25 13:03:33 -07:00
Roopa Malavally
61e8b0d70e Delete AMD_ROCm_Release_Notes_v4.1.pdf 2021-03-25 13:03:20 -07:00
Roopa Malavally
8a3304a8d9 Update README.md 2021-03-25 11:45:08 -07:00
Roopa Malavally
55488a9424 Update README.md 2021-03-25 11:03:19 -07:00
Roopa Malavally
ff4a1d4059 Update README.md 2021-03-25 10:03:46 -07:00
Xu Huisheng
4b2d93fb7e add hipFFT to default.xml
There is hipFFT on <http://repo.radeon.com/rocm/apt/4.1/pool/main/h/hipfft/>.
Please add related repository in default.xml.
Thank you.
2021-03-25 19:41:05 +08:00
Roopa Malavally
061ccd21b8 Update README.md 2021-03-24 10:26:07 -07:00
Roopa Malavally
0ed1bd9f8e Add files via upload 2021-03-24 10:25:24 -07:00
Roopa Malavally
856c74de55 Update README.md 2021-03-24 07:59:03 -07:00
Roopa Malavally
12c6f60e45 Update README.md 2021-03-24 07:58:30 -07:00
Aditya Lad
897b1e8e2d Merge pull request #1422 from RadeonOpenCompute/roc-4.1.x
Roc 4.1.x
2021-03-23 17:59:19 -07:00
Lad, Aditya
382ea7553f Remove inaccessible repos 2021-03-23 17:56:10 -07:00
Aditya Lad
2014b47dcb Merge pull request #1420 from RadeonOpenCompute/master
Addition of ROCm release notes
2021-03-23 17:29:17 -07:00
zhang2amd
b9f9bafd9b Merge pull request #1419 from RadeonOpenCompute/roc-4.1.x
ROCm 4.1 Release
2021-03-23 17:17:00 -07:00
Lad, Aditya
ff15f420c6 ROCm 4.1 default.xml edit 2021-03-23 17:10:44 -07:00
Lad, Aditya
f51c9be952 Release ROCm 4.1 Readme.md and default.xml 2021-03-23 17:03:00 -07:00
Lad, Aditya
64e254dc99 Release ROCm 4.1 Readme.md and default.xml 2021-03-23 17:01:33 -07:00
Roopa Malavally
af7f921474 Add files via upload 2021-03-23 17:00:17 -07:00
Roopa Malavally
8b3377749f Add files via upload 2021-03-23 14:16:46 -07:00
Roopa Malavally
c3a3ce55d1 Delete gdb.pdf 2021-03-22 17:29:33 -07:00
Roopa Malavally
64c727449b Delete amd-dbgapi.pdf 2021-03-22 17:29:24 -07:00
Roopa Malavally
182dfc65cf Add files via upload 2021-03-22 16:43:36 -07:00
Roopa Malavally
d529d5c585 Delete AMD_ROCm_Release_Notes_v4.0.pdf 2021-03-22 16:29:11 -07:00
Roopa Malavally
cca6bc4921 Delete HIP_Programming_Guide_v4.0.pdf 2021-03-22 16:28:56 -07:00
Roopa Malavally
e3dbbb6bbf Add files via upload 2021-03-22 16:27:41 -07:00
Roopa Malavally
6e39c80762 Add files via upload 2021-03-22 16:17:38 -07:00
Roopa Malavally
f96f5df625 Add files via upload 2021-03-22 16:07:44 -07:00
Roopa Malavally
0639a312c8 Delete ROCm_Data_Center_Too_API_Manual_4.1.pdf 2021-03-22 16:07:03 -07:00
Roopa Malavally
a2878b1460 Add files via upload 2021-03-22 15:38:16 -07:00
Roopa Malavally
1daf261d25 Delete ROCm_SMI_API_Guide_v4.0.pdf 2021-03-22 15:37:54 -07:00
Roopa Malavally
5848bc3d7e Add files via upload 2021-03-22 15:37:15 -07:00
Roopa Malavally
d9692359ad Delete HIP-API_Guide_v4.0.pdf 2021-03-22 15:36:42 -07:00
Roopa Malavally
25110784cf Add files via upload 2021-03-22 14:53:33 -07:00
Roopa Malavally
9ff31d316f Update README.md 2021-03-10 07:53:11 -08:00
Roopa Malavally
b072119ad6 Update README.md 2021-03-09 09:03:05 -08:00
Roopa Malavally
095544032c Update README.md 2021-02-25 07:28:52 -08:00
Roopa Malavally
26a39a637a Update README.md 2021-02-25 07:24:46 -08:00
Roopa Malavally
6fb55e6f45 Update README.md 2021-02-24 13:16:33 -08:00
Lad, Aditya
290091946f ROCm 4.0.1 Manifest file 2021-01-25 15:11:55 -08:00
Roopa Malavally
2874a8ae6c Update README.md 2021-01-25 15:02:27 -08:00
Roopa Malavally
f62f2b24da Add files via upload 2021-01-20 18:10:40 -08:00
Roopa Malavally
790567e3bd Update README.md 2020-12-18 15:08:54 -08:00
Roopa Malavally
57d7a202d4 Update README.md 2020-12-18 15:08:24 -08:00
Aditya Lad
80d2aa739b Merge pull request #1343 from RadeonOpenCompute/roc-4.0.x
ROCm 4.0 Release
2020-12-18 14:30:27 -08:00
Roopa Malavally
b18851f804 Update README.md 2020-12-18 13:12:20 -08:00
Roopa Malavally
0f0dbf0c92 Update README.md 2020-12-18 13:11:59 -08:00
Lad, Aditya
224a45379f ROCm 4.0 Release 2020-12-18 12:53:33 -08:00
Roopa Malavally
f521943747 Update README.md 2020-12-18 12:52:04 -08:00
Roopa Malavally
2b7f806b10 AMD ROCm Release Notes v4.0 (#1342)
* Update README.md

* Update README.md

* Add files via upload

* Delete AMD_ROCm_Release_Notes_v3.10.pdf

* Delete AMD_ROCm_DataCenter_Tool_User_Guide.pdf

* Delete ROCm_Data_Center_API_Guide.pdf

* Delete ROCm_SMI_API_Guide_v3.10.pdf

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-12-18 12:46:40 -08:00
Roopa Malavally
cd55ef67c9 Add files via upload 2020-12-18 12:32:43 -08:00
Roopa Malavally
9320669eee Delete AMD_ROCm_Release_Notes_v3.10.pdf 2020-12-18 08:19:51 -08:00
Roopa Malavally
c1211c66e3 Delete ROCm_SMI_API_Guide_v3.10.pdf 2020-12-18 08:19:36 -08:00
Roopa Malavally
c8fcff6488 Delete ROCm_Data_Center_API_Guide.pdf 2020-12-18 08:19:18 -08:00
Roopa Malavally
7118076ab4 Delete AMD_ROCm_DataCenter_Tool_User_Guide.pdf 2020-12-18 08:18:58 -08:00
Roopa Malavally
ec5523395a Add files via upload 2020-12-17 21:00:59 -08:00
Roopa Malavally
41d8f6a235 Add files via upload 2020-12-17 14:00:59 -08:00
Roopa Malavally
c69eef858a Update README.md 2020-12-10 13:38:07 -08:00
Aditya Lad
5b902ca38c Merge pull request #1316 from RadeonOpenCompute/roc-3.10.x
add rdc and half
2020-12-02 16:11:11 -08:00
Aditya Lad
68c5c198df add rdc and half 2020-12-02 16:07:15 -08:00
Aditya Lad
761ed4e70f Merge pull request #1314 from RadeonOpenCompute/roc-3.10.x
3.10 : Manifest Files
2020-12-01 16:31:55 -08:00
Lad, Aditya
8d5a160f0a 3.10 : Manifest Files 2020-12-01 16:24:12 -08:00
Roopa Malavally
f61c2ad155 Add files via upload 2020-12-01 15:45:33 -08:00
Roopa Malavally
3e2e30cc9a Delete AMD_ROCm_DataCenter_Tool_User_Guide.pdf 2020-12-01 15:44:56 -08:00
Roopa Malavally
a1f3b4e6b8 Update README.md 2020-12-01 15:08:53 -08:00
Roopa Malavally
7a3a012e6a Update README.md 2020-11-30 15:45:42 -08:00
Roopa Malavally
5b6ab31db3 Update README.md 2020-11-30 14:12:01 -08:00
Roopa Malavally
acabe2c532 Update README.md 2020-11-30 14:10:06 -08:00
Roopa Malavally
39d8bcd504 Release notes for v3.10 (#1312)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Delete matrix.png

* Delete ROCMCLI3.PNG

* Delete ROCMCLI2.PNG

* Delete ROCMCLI1.PNG

* Delete GEMM2.PNG

* Add files via upload

* Delete ROCm_SMI_Manual_v3.9.pdf

* Delete AMD_ROCm_Release_Notes_v3.9.pdf

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-11-30 14:07:52 -08:00
Roopa Malavally
af6d1e9b26 Add files via upload 2020-11-30 14:01:36 -08:00
Roopa Malavally
1fa1d4a935 Add files via upload 2020-11-30 09:53:49 -08:00
Roopa Malavally
03d93c1948 Delete AMD_ROCm_Release_Notes_v3.9.pdf 2020-11-30 08:55:35 -08:00
Roopa Malavally
93984b0956 Add files via upload 2020-11-30 08:54:52 -08:00
Roopa Malavally
6ccb1cfc0f Add files via upload 2020-11-30 07:29:29 -08:00
Roopa Malavally
f054f82173 Delete ROCm_SMI_Manual_v3.9.pdf 2020-11-30 07:28:11 -08:00
Xu Huisheng
bb6756b58d remove dumplicated remote=roc-github (#1248) 2020-11-18 08:19:23 -08:00
Roopa Malavally
d957b8a17c Update README.md 2020-11-12 13:47:48 -08:00
Roopa Malavally
37ece61861 Update README.md 2020-11-11 14:16:48 -08:00
Roopa Malavally
434023f31b Update README.md 2020-11-03 07:45:53 -08:00
Aditya Lad
a555260687 Merge pull request #1268 from RadeonOpenCompute/roc-3.9.x
Roc 3.9.x
2020-10-28 17:39:17 -07:00
Lad, Aditya
bf89c6bbf1 3.9 documentation 2020-10-28 15:32:49 -07:00
Lad, Aditya
bd4b772255 ROCm 3.9 default.xml 2020-10-28 15:22:02 -07:00
Lad, Aditya
e99027c39c ROCm 3.9 : Manifest files 2020-10-28 15:14:41 -07:00
Roopa Malavally
93c69afb5b Add files via upload 2020-10-28 14:54:54 -07:00
Roopa Malavally
bc2ce5c35b Delete staticlinkinglib.PNG 2020-10-28 14:52:02 -07:00
Roopa Malavally
bf633aec6b Delete forweb.PNG 2020-10-28 14:51:49 -07:00
Roopa Malavally
8608a9a1c9 Delete RDCComponentsrevised.png 2020-10-28 14:51:33 -07:00
Roopa Malavally
76afb05b6c Delete AMD_ROCm_DataCenter_Tool_User_Guide.pdf 2020-10-28 14:51:19 -07:00
Roopa Malavally
8bc67a21ea Update README.md 2020-10-19 20:23:07 -07:00
Roopa Malavally
1ce148edb1 Update README.md 2020-10-19 20:21:08 -07:00
Roopa Malavally
cc6147c25b Update README.md 2020-10-19 20:20:20 -07:00
Roopa Malavally
aadd9e68e1 Update README.md 2020-10-19 20:17:34 -07:00
Roopa Malavally
dce5aee2dc Add files via upload 2020-10-19 19:34:27 -07:00
Aditya Lad
0bcae510a3 Merge pull request #1244 from RadeonOpenCompute/roc-3.8.x
Remove MiGraphX from 3.8
2020-09-25 10:06:57 -07:00
Roopa Malavally
506cdcf6db Update README.md 2020-09-25 08:06:49 -07:00
Roopa Malavally
a919ba64c9 Update README.md 2020-09-25 08:00:10 -07:00
Roopa Malavally
fae25ccf9b Update README.md 2020-09-22 16:52:31 -07:00
Aakash Sudhanwa
67bd7501c1 Update README.md 2019-12-18 14:10:38 -08:00
Aakash Sudhanwa
d62f1c4247 Merge pull request #12 from RadeonOpenCompute/master
Rebase
2019-12-18 14:09:40 -08:00
Aakash Sudhanwa
c3d5bc6406 Rename Release nodes pdf 2019-11-25 20:54:25 -08:00
Aakash Sudhanwa
db45731729 Merge pull request #11 from RadeonOpenCompute/master
ROCm Release 2.10 (#947)
2019-11-25 20:12:36 -08:00
Aakash Sudhanwa
34552e95e0 Release Notes 2019-11-25 19:23:24 -08:00
Aakash Sudhanwa
8d0c516c5c Merge pull request #10 from RadeonOpenCompute/master
Update to 2.10
2019-11-25 19:20:50 -08:00
Aakash Sudhanwa
5cba919767 default.xml: ROCm Rel 2.10 2019-11-25 14:38:06 -08:00
Aakash Sudhanwa
bb0022e972 Merge pull request #9 from RadeonOpenCompute/master
Updating to latest
2019-11-25 13:04:27 -08:00
182 changed files with 22512 additions and 380 deletions

5
.github/CODEOWNERS vendored Executable file

@@ -0,0 +1,5 @@
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus @LisaDelaney
# Documentation files
docs/* @ROCm/rocm-documentation
*.md @ROCm/rocm-documentation
*.rst @ROCm/rocm-documentation

13
.github/dependabot.yml vendored Normal file

@@ -0,0 +1,13 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/docs/sphinx" # Location of package manifests
    open-pull-requests-limit: 10
    schedule:
      interval: "daily"
    versioning-strategy: increase

22
.github/workflows/issue_retrieval.yml vendored Normal file

@@ -0,0 +1,22 @@
name: Issue retrieval
on:
  issues:
    types: [opened]
jobs:
  auto-retrieve:
    runs-on: ubuntu-latest
    steps:
      - name: Generate a token
        id: generate_token
        uses: actions/create-github-app-token@v1
        with:
          app_id: ${{ secrets.ACTION_APP_ID }}
          private_key: ${{ secrets.ACTION_PEM }}
      - name: 'Retrieve Issue'
        uses: abhimeda/rocm_issue_management@main
        with:
          authentication-token: ${{ steps.generate_token.outputs.token }}
          github-organization: 'ROCm'
          project-num: '6'

20
.github/workflows/linting.yml vendored Normal file

@@ -0,0 +1,20 @@
name: Linting
on:
  push:
    branches:
      - develop
      - main
      - 'docs/*'
      - 'roc**'
  pull_request:
    branches:
      - develop
      - main
      - 'docs/*'
      - 'roc**'
jobs:
  call-workflow-passing-data:
    name: Documentation
    uses: RadeonOpenCompute/rocm-docs-core/.github/workflows/linting.yml@develop

19
.gitignore vendored Normal file

@@ -0,0 +1,19 @@
.venv
.vscode
build
# documentation artifacts
_build/
_images/
_static/
_templates/
_toc.yml
docBin/
_doxygen/
_readthedocs/
# avoid duplicating contributing.md due to conf.py
docs/CHANGELOG.md
docs/contribute/index.md
docs/about/release-notes.md
docs/about/CHANGELOG.md

18
.markdownlint-cli2.yaml Normal file

@@ -0,0 +1,18 @@
config:
  default: true
  MD004:
    style: asterisk
  MD013: false
  MD026:
    punctuation: '.,;:!'
  MD029:
    style: ordered
  MD033: false
  MD034: false
  MD041: false
  MD051: false
ignores:
  - CHANGELOG.md
  - docs/CHANGELOG.md
  - "{,docs/}{RELEASE,release}.md"
  - tools/autotag/templates/**/*.md

18
.readthedocs.yaml Normal file

@@ -0,0 +1,18 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2
sphinx:
  configuration: docs/conf.py
formats: [htmlzip, pdf]
python:
  install:
    - requirements: docs/sphinx/requirements.txt
build:
  os: ubuntu-20.04
  tools:
    python: "3.8"

584
.wordlist.txt Normal file

@@ -0,0 +1,584 @@
ABI
activations
addr
AddressSanitizer
AlexNet
alloc
allocator
allocators
ALU
AMD
AMDGPU
amdgpu
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
api
APIC
APIs
Arb
ASan
ASIC
ASICs
ASm
ATI
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bilinear
BitCode
BLAS
Blit
blit
BMC
buildable
bursty
bzip
cacheable
CCD
cd
CDNA
CentOS
centric
changelog
chiplet
CIFAR
CLI
CLion
CMake
cmake
CMakeLists
CMakePackage
cmd
coalescable
codename
Codespaces
comgr
Commitizen
CommonMark
completers
composable
concretization
Concretized
Conda
config
conformant
convolutional
convolves
CoRR
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSn
csn
CSV
CTests
CU
cuBLAS
CUDA
cuFFT
cuLIB
cuRAND
CUs
cuSOLVER
cuSPARSE
CXX
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denormalize
Dependabot
deserializers
detections
dev
DevCap
devicelibs
devsel
DGEMM
disambiguates
distro
DL
DMA
DNN
DNNL
Dockerfile
Doxygen
DPM
DRI
DW
DWORD
el
enablement
endpgm
env
epilog
EPYC
ESXi
ethernet
exascale
executables
ffmpeg
FFT
FFTs
FHS
filesystem
Filesystem
Flang
FMA
Fortran
fortran
FP
galb
gcc
GCD
GCDs
GCN
GDB
gdb
GDDR
GDR
GDS
GEMM
GEMMs
GenZ
gfortran
gfx
GIM
github
Gitpod
GL
GLXT
GMI
gnupg
GPG
GPR
GPU
GPUs
grayscale
GRBM
gzip
Haswell
HBM
HCA
heterogenous
hipamd
hipBLAS
hipblas
hipBLASLt
HIPCC
hipCUB
hipcub
HIPExtension
hipFFT
hipfft
hipfort
HIPIFY
hipify
hipLIB
hipRAND
hipSOLVER
hipsolver
hipSPARSE
hipsparse
hipSPARSELt
hipTensor
HPC
HPCG
HPE
HPL
HSA
hsa
hsakmt
HWE
ib_core
ICV
IDE
IDEs
ImageNet
IMDB
inband
incrementing
inferencing
InfiniBand
inflight
init
Inlines
inlining
installable
IntelliSense
interprocedural
Intersphinx
intra
invariants
invocating
Ioffe
IOMMU
IOP
IOPM
IOV
ipo
IRQ
ISA
ISV
ISVs
JSON
Jupyter
kdb
KFD
Khronos
KVM
LAPACK
LCLK
LDS
libfabric
libjpeg
libs
linearized
linter
linux
llvm
LLVM
localscratch
logits
lossy
LSAN
LTS
Makefile
Makefiles
matchers
Matplotlib
Mellanox's
MEM
MERCHANTABILITY
MFMA
microarchitecture
MIGraphX
migraphx
MIOpen
miopen
MIOpenGEMM
miopengemm
MIVisionX
mivisionx
mkdir
mlirmiopen
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
mtypes
Multicore
Multithreaded
MVAPICH
mvffr
MyEnvironment
MyST
namespace
namespaces
Nano
Navi
NBIO
NBIOs
NIC
NICs
Noncoherently
NPS
NUMA
NumPy
numref
NVCC
NVPTX
OAM
OAMs
ocl
OCP
OEM
OFED
OMP
OMPT
OMPX
ONNX
OpenCL
opencl
opencv
OpenFabrics
OpenGL
OpenMP
openmp
openssl
OpenVX
optimizers
os
OSS
OSU
Pageable
pageable
passthrough
PCI
PCIe
PeerDirect
perfcounter
Perfetto
performant
perl
PIL
PILImage
PowerShell
PnP
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessing
preq
prequantized
prerequisites
PRNG
profiler
protobuf
PRs
pseudorandom
py
PyPi
PyTorch
Qcycles
quasirandom
queueing
Radeon
RadeonOpenCompute
RCCL
rccl
RDC
rdc
RDMA
RDNA
reformats
RelWithDebInfo
repos
Req
req
resampling
RST
reStructuredText
RHEL
Rickle
roadmap
roc
ROC
RoCE
rocAL
rocALUTION
rocalution
rocBLAS
rocblas
rocclr
ROCdbgapi
rocFFT
rocfft
ROCgdb
ROCk
rocLIB
rocm
ROCm
ROCmCC
rocminfo
rocMLIR
ROCmSoftwarePlatform
ROCmValidationSuite
rocPRIM
rocprim
rocprof
ROCProfiler
rocprofiler
ROCr
rocr
rocRAND
rocrand
rocSOLVER
rocsolver
rocSPARSE
rocsparse
roct
rocThrust
rocthrust
ROCTracer
roctracer
rocWMMA
RST
runtime
runtimes
RW
Ryzen
SALU
SBIOS
SCA
scalability
SDK
SDMA
SDRAM
SENDMSG
sendmsg
SENDMSG
sendmsg
SerDes
serializers
SGPR
SGPRs
SHA
shader
Shlens
sigmoid
SIGQUIT
SIMD
SIMDs
SKU
SKUs
skylake
sL
SLES
sm
SMEM
SMI
smi
SMT
softmax
Spack
spack
SPI
SQs
SRAM
SRAMECC
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
Supermicro
SWE
Szegedy
tagram
TCA
TCC
TCI
TCIU
TCP
TCR
TensorBoard
TensorFlow
TFLOPS
tg
th
tmp
ToC
tokenize
toolchain
toolchains
toolset
toolsets
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
torchvision
tracebacks
TransferBench
TrapStatus
txt
UAC
uarch
ubuntu
UC
UCC
UCX
UIF
Uncached
uncached
uncorrectable
Unhandled
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
USM
UTCL
UTIL
utils
VALU
Vanhoucke
VBIOS
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
VGPR
VGPRs
vjxb
vL
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vulkan
walkthrough
walkthroughs
wavefront
wavefronts
WGP
whitespaces
Wojna
workgroup
Workgroups
workgroups
writeback
Writebacks
writebacks
wrreq
WX
wzo
Xeon
XGMI
Xnack
XT
Xteam
XTX
xz
YAML
yaml
YML
YModel
ysvmadyb
ZenDNN
zypper

Binary file not shown.

7418
CHANGELOG.md Normal file

File diff suppressed because it is too large

40
CMakeLists.txt Normal file

@@ -0,0 +1,40 @@
# MIT License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
cmake_minimum_required(VERSION 3.18.0)
project(ROCm VERSION 5.7.1 LANGUAGES NONE)
option(BUILD_DOCS "Build ROCm documentation" ON)
include(GNUInstallDirs)
# Adding default path cmake modules
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules")
# Handle dependencies
include(Dependencies)
# Build docs
if(BUILD_DOCS)
    add_subdirectory(docs)
endif()

94
CONTRIBUTING.md Normal file

@@ -0,0 +1,94 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Contributing to ROCm">
<meta name="keywords" content="ROCm, contributing, contribute, maintainer, contributor">
</head>

# Contribute to ROCm

AMD values and encourages contributions to our code and documentation. If you want to contribute
to our ROCm repositories, first review the following guidance. For documentation-specific information,
see [Contributing to ROCm docs](https://rocm.docs.amd.com/en/latest/contribute/contribute-docs.html).

ROCm is a software stack made up of a collection of drivers, development tools, and APIs that enable
GPU programming from low-level kernel to end-user applications. Because some of our components
are inherited from external projects (such as
[LLVM](https://github.com/ROCm/llvm-project) and
[Kernel driver](https://github.com/ROCm/ROCK-Kernel-Driver)), these use
project-specific contribution guidelines and workflow. Refer to their repositories for more information.
All other ROCm components follow the workflow described in the following sections.

## Development workflow

ROCm uses GitHub to host code, collaborate, and manage version control. We use pull requests (PRs)
for all changes within our repositories. We use
[GitHub issues](https://github.com/ROCm/ROCm/issues) to track known issues, such as
bugs.

### Issue tracking

Before filing a new issue, search the
[existing issues](https://github.com/ROCm/ROCm/issues) to make sure your issue isn't
already listed.

General issue guidelines:

* Use your best judgement for issue creation. If your issue is already listed, upvote the issue and
comment or post to provide additional details, such as how you reproduced this issue.
* If you're not sure if your issue is the same, err on the side of caution and file your issue.
You can add a comment to include the issue number (and link) for the similar issue. If we evaluate
your issue as being the same as the existing issue, we'll close the duplicate.
* If your issue doesn't exist, use the issue template to file a new issue.
* When filing an issue, be sure to provide as much information as possible, including script output so
we can collect information about your configuration. This helps reduce the time required to
reproduce your issue.
* Check your issue regularly, as we may require additional information to successfully reproduce the
issue.

### Pull requests

When you create a pull request, you should target the default branch. Our repositories typically use the **develop** branch as the default integration branch.

When creating a PR, use the following process. Note that each repository may include additional,
project-specific steps. Refer to each repository's PR process for any additional steps.

* Identify the issue you want to fix
* Target the default branch (usually the **develop** branch) for integration
* Ensure your code builds successfully
* Each component has a suite of test cases to run; include the log of the successful test run in your PR
* Do not break existing test cases
* New functionality is only merged with new unit tests
* If your PR includes a new feature, you must provide an application or test so we can ensure that the
feature works and continues to be valid in the future
* Tests must have good code coverage
* Submit your PR and work with the reviewer or maintainer to get your PR approved
* Once approved, the PR is brought onto internal CI systems and may be merged into the component
during our release cycle, as coordinated by the maintainer
* We'll inform you once your change is committed

:::{important}
By creating a PR, you agree to allow your contribution to be licensed under the
terms of the LICENSE.txt file in the corresponding repository. Different repositories may use different
licenses.
:::

You can look up each license on the [ROCm licensing](https://rocm.docs.amd.com/en/latest/about/license.html) page.

### New feature development

Use the [GitHub Discussion forum](https://github.com/ROCm/ROCm/discussions)
(Ideas category) to propose new features. Our maintainers are happy to provide direction and
feedback on feature development.

### Documentation

Submit ROCm documentation changes to our
[documentation repository](https://github.com/ROCm/ROCm). You must update
documentation related to any new feature or API contribution.

Note that each ROCm project uses its own repository for documentation.

## Future development workflow

The current ROCm development workflow is GitHub-based. If, in the future, we change this platform,
the tools and links may change. In this instance, we will update contribution guidelines accordingly.

60
GOVERNANCE.md Normal file

@@ -0,0 +1,60 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm governance model">
<meta name="keywords" content="ROCm, governance">
</head>
# Governance model
ROCm is a software stack made up of a collection of drivers, development tools, and APIs that enable
GPU programming from the low-level kernel to end-user applications.
Components of ROCm that are inherited from external projects (such as
[LLVM](https://github.com/ROCm/llvm-project) and
[Kernel driver](https://github.com/ROCm/ROCK-Kernel-Driver)) follow their own
governance model and code of conduct. All other components of ROCm are governed by this
document.
## Governance
ROCm is led and managed by AMD.
We welcome contributions from the community. Our maintainers review all proposed changes to
ROCm.
## Roles
* **Maintainers** are responsible for their designated component and repositories.
* **Contributors** provide input and suggest changes to existing components.
### Maintainers
Maintainers are appointed by AMD. They are able to approve changes and can commit to our
repositories. They must use pull requests (PRs) for all changes.
You can find the list of maintainers in the CODEOWNERS file of each repository. Code owners differ
between repositories.
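As a rough illustration of reading a CODEOWNERS file, the sketch below parses GitHub's `pattern owner ...` line format into pairs. The sample content and owner handles are hypothetical and are not taken from any ROCm repository.

```python
def parse_codeowners(text):
    """Return a list of (pattern, owners) pairs from CODEOWNERS-style text."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        entries.append((pattern, owners))
    return entries


# Hypothetical sample; real files differ between repositories.
SAMPLE = """\
# Documentation owners
docs/ @example-docs-team
* @example-maintainer @example-reviewer
"""

for pattern, owners in parse_codeowners(SAMPLE):
    print(pattern, "->", ", ".join(owners))
```

GitHub applies the last matching pattern in the file, so the order of entries matters when you look up the owners for a given path.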
### Contributors
If you're not a maintainer, you're a contributor. We encourage the ROCm community to contribute in
several ways:
* Help other community members by posting questions or solutions on our
[GitHub discussion forums](https://github.com/ROCm/ROCm/discussions)
* Notify us of bugs by filing an issue report on
[GitHub Issues](https://github.com/ROCm/ROCm/issues)
* Improve our documentation by submitting a PR to our
[repository](https://github.com/ROCm/ROCm/)
* Improve the code base (for smaller or contained changes) by submitting a PR to the component
* Suggest larger features by adding to the *Ideas* category in the
[GitHub discussion forum](https://github.com/ROCm/ROCm/discussions)
For more information, refer to our [contribution guidelines](CONTRIBUTING.md).
## Code of conduct
To engage with any AMD ROCm component that is hosted on GitHub, you must abide by the
[GitHub community guidelines](https://docs.github.com/en/site-policy/github-terms/github-community-guidelines)
and the
[GitHub community code of conduct](https://docs.github.com/en/site-policy/github-terms/github-community-code-of-conduct).

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


351
README.md

@@ -1,333 +1,58 @@
# AMD ROCm Software
# AMD ROCm Release Notes v3.8.0
ROCm is an open-source stack, composed primarily of open-source software, designed for graphics
processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and
APIs that enable GPU programming from low-level kernel to end-user applications.
This page describes the features, fixed issues, and information about downloading and installing the ROCm software.
It also covers known issues in this release.
With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
collaborate, test, and deploy your applications in a free, open source, integrated, and secure software
ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
artificial intelligence (AI), scientific computing, and computer aided design (CAD).
- [Supported Operating Systems and Documentation Updates](#Supported-Operating-Systems-and-Documentation-Updates)
* [Supported Operating Systems](#Supported-Operating-Systems)
* [AMD ROCm Documentation Updates](#AMD-ROCm-Documentation-Updates)
- [What\'s New in This Release](#Whats-New-in-This-Release)
* [Hipfort-Interface for GPU Kernel Libraries](#Hipfort-Interface-for-GPU-Kernel-Libraries)
* [ROCm Data Center Tool](#ROCm-Data-Center-Tool)
* [Error-Correcting Code Fields in ROCm Data Center Tool](#Error-Correcting-Code-Fields-in-ROCm-Data-Center-Tool)
* [Static Linking Libraries](#Static-Linking-Libraries)
- [Fixed Defects](#Fixed-Defects)
ROCm is powered by AMD's
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
an open-source software C++ GPU programming environment and its corresponding runtime. HIP
allows ROCm developers to create portable applications on different platforms by deploying code on a
range of platforms, from dedicated gaming GPUs to exascale HPC clusters.
- [Known Issues](#Known-Issues)
ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open
source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
(ML) frameworks, such as PyTorch and TensorFlow.
- [Deploying ROCm](#Deploying-ROCm)
- [Hardware and Software Support](#Hardware-and-Software-Support)
## ROCm documentation
- [Machine Learning and High Performance Computing Software Stack for AMD GPU](#Machine-Learning-and-High-Performance-Computing-Software-Stack-for-AMD-GPU)
* [ROCm Binary Package Structure](#ROCm-Binary-Package-Structure)
* [ROCm Platform Packages](#ROCm-Platform-Packages)
This repository contains the manifest file for ROCm releases, changelogs, and release information.
The `default.xml` file contains information for all repositories and the associated commit used to build
the current ROCm release; `default.xml` uses the Manifest Format repository.
# Supported Operating Systems
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
`develop` branch of our repositories contains content for the next ROCm release.
## Support for Vega 7nm Workstation
The ROCm documentation homepage is [rocm.docs.amd.com](https://rocm.docs.amd.com).
This release extends support to the Vega 7nm Workstation (Vega20 GL-XE) version.
### Building our documentation
## List of Supported Operating Systems
For a quick-start build, use the following code. For more options and detail, refer to
[Building documentation](./docs/contribute/building.md).
The AMD ROCm platform is designed to support the following operating systems:
```bash
cd docs
* Ubuntu 20.04 (5.4 and 5.6-oem) and 18.04.5 (Kernel 5.4)
* CentOS 7.8 & RHEL 7.8 (Kernel 3.10.0-1127) (Using devtoolset-7 runtime support)
* CentOS 8.2 & RHEL 8.2 (Kernel 4.18.0 ) (devtoolset is not required)
* SLES 15 SP1
pip3 install -r sphinx/requirements.txt
## Fresh Installation of AMD ROCm v3.8 Recommended
A fresh and clean installation of AMD ROCm v3.8 is recommended. An upgrade from previous releases to AMD ROCm v3.8 is not supported.
For more information, refer to the AMD ROCm Installation Guide at:
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
**Note**: AMD ROCm release v3.3 or prior releases are not fully compatible with AMD ROCm v3.5 and higher versions. You must perform a fresh ROCm installation if you want to upgrade from AMD ROCm v3.3 or older to 3.5 or higher versions and vice-versa.
**Note**: *render group* is required only for Ubuntu v20.04. For all other ROCm supported operating systems, continue to use *video group*.
* For ROCm v3.5 and releases thereafter, the *clinfo* path is changed to - */opt/rocm/opencl/bin/clinfo*.
* For ROCm v3.3 and older releases, the *clinfo* path remains unchanged - */opt/rocm/opencl/bin/x86_64/clinfo*.
# AMD ROCm Documentation Updates
## AMD ROCm Installation Guide
The AMD ROCm Installation Guide in this release includes:
* Updated Supported Environments
* HIP Installation Instructions
* Tensorflow ROCm Port: Basic Installations on RHEL v8.2
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
## AMD ROCm - HIP Documentation Updates
* HIP Repository Information
For more information, see
https://rocmdocs.amd.com/en/latest/Programming_Guides/Programming-Guides.html#hip-repository-information
## ROCm Data Center Tool User Guide
* Error-Correction Codes Field and Output Documentation
For more information, refer to the AMD ROCm Data Center User Guide at
https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_DataCenter_Tool_User_Guide.pdf
## General AMD ROCm Documentation Links
Access the following links for more information:
* For AMD ROCm documentation, see
https://rocmdocs.amd.com/en/latest/
* For installation instructions on supported platforms, see
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
* For AMD ROCm binary structure, see
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#build-amd-rocm
* For AMD ROCm Release History, see
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#amd-rocm-version-history
# What\'s New in This Release
## Hipfort-Interface for GPU Kernel Libraries
Hipfort is an interface library for accessing GPU Kernels. It provides support to the AMD ROCm architecture from within the Fortran programming language. Currently, the gfortran and HIP-Clang compilers support hipfort. Note, the gfortran compiler belongs to the GNU Compiler Collection (GCC). While hipfc wrapper calls hipcc for the non-fortran kernel source, gfortran is used for FORTRAN applications that call GPU kernels.
The hipfort interface library is meant for Fortran developers with a focus on gfortran users.
For information on HIPFort installation and examples, see
https://github.com/ROCmSoftwarePlatform/hipfort
## ROCm Data Center Tool
The ROCm™ Data Center Tool™ simplifies the administration and addresses key infrastructure challenges in AMD GPUs in cluster and datacenter environments. The important features of this tool are:
* GPU telemetry
* GPU statistics for jobs
* Integration with third-party tools
* Open source
The ROCm Data Center Tool can be used in the standalone mode if all components are installed. The same set of features is also available in a library format that can be used by existing management tools.
![ScreenShot](https://github.com/Rmalavally/ROCm/blob/master/RDCComponentsrevised.png)
Refer to the ROCm Data Center Tool™ User Guide for more details on the different modes of operation.
NOTE: The ROCm Data Center User Guide is intended to provide an overview of ROCm Data Center Tool features and how system administrators and Data Center (or HPC) users can administer and configure AMD GPUs. The guide also provides an overview of its components and open source developer handbook.
For installation information on different distributions, refer to the ROCm Data Center User Guide at
https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_DataCenter_Tool_User_Guide.pdf
### Error Correcting Code Fields in ROCm Data Center Tool
The ROCm Data Center (RDC) tool is enhanced to provide counters to track correctable and uncorrectable errors. While a single bit per word error can be corrected, double bit per word errors cannot be corrected.
The RDC tool now helps monitor and protect against undetected memory data corruption. If the system is using ECC-enabled memory, the ROCm Data Center tool can report the error counters to monitor the status of the memory.
![ScreenShot](https://github.com/Rmalavally/ROCm/blob/master/forweb.PNG)
## Static Linking Libraries
The underlying libraries of AMD ROCm are dynamic and are called shared objects (.so) in Linux.
The AMD ROCm v3.8 release includes the capability to build static ROCm libraries and link to the applications statically. CMake target files enable linking an application statically to ROCm libraries and each component exports the required dependencies for linking. The static libraries are called Archives (.a) in Linux.
This release also comprises the requisite changes for all the components to work in a static environment. The components have been successfully tested for basic functionalities like *rocminfo /rocm_bandwidth_test* and archives.
In the AMD ROCm v3.8 release, the following libraries support static linking:
![ScreenShot](https://github.com/Rmalavally/ROCm/blob/master/staticlinkinglib.PNG)
# Fixed Defects
The following defects are fixed in this release:
* GPU Kernel C++ Names Not Demangled
* MIGraphX Fails for fp16 Datatype
* Issue with Peer-to-Peer Transfers
* rocprof option --parallel-kernels Not Supported in this Release
# Known Issues
## Undefined Reference Issue in Statically Linked Libraries
Libraries and applications statically linked using flags -rtlib=compiler-rt, such as rocBLAS, have an implicit dependency on gcc_s not captured in their CMake configuration.
Client applications may require linking with an additional library -lgcc_s to resolve the undefined reference to symbol '_Unwind_Resume@@GCC_3.0'.
## MIGraphX Pooling Operation Fails for Some Models
MIGraphX does not work for some models with pooling operations and the following error appears:
*test_gpu_ops_test FAILED*
This issue is under investigation, and there is currently no known workaround.
## MIVisionX Installation Error on CentOS/RHEL8.2 and SLES 15
Installing MIVisionX results in the following error on CentOS/RHEL8.2 and SLES 15:
*"Problem: nothing provides opencv needed"*
As a workaround, install opencv before installing MIVisionX.
# Deploying ROCm
AMD hosts both Debian and RPM repositories for the ROCm v3.8.x packages.
For more information on ROCm installation on all platforms, see
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
# Hardware and Software Support
ROCm is focused on using AMD GPUs to accelerate computational tasks such as machine learning, engineering workloads, and scientific computing.
In order to focus our development efforts on these domains of interest, ROCm supports a targeted set of hardware configurations which are detailed further in this section.
#### Supported GPUs
Because the ROCm Platform has a focus on particular computational domains, we offer official support for a selection of AMD GPUs that are designed to offer good performance and price in these domains.
ROCm officially supports AMD GPUs that use the following chips:
* GFX8 GPUs
* "Fiji" chips, such as on the AMD Radeon R9 Fury X and Radeon Instinct MI8
* "Polaris 10" chips, such as on the AMD Radeon RX 580 and Radeon Instinct MI6
* GFX9 GPUs
* "Vega 10" chips, such as on the AMD Radeon RX Vega 64 and Radeon Instinct MI25
* "Vega 7nm" chips, such as on the Radeon Instinct MI50, Radeon Instinct MI60 or AMD Radeon VII
ROCm is a collection of software ranging from drivers and runtimes to libraries and developer tools.
Some of this software may work with more GPUs than the "officially supported" list above, though AMD does not make any official claims of support for these devices on the ROCm software platform.
The following GPUs are enabled in the ROCm software, though full support is not guaranteed:
* GFX8 GPUs
* "Polaris 11" chips, such as on the AMD Radeon RX 570 and Radeon Pro WX 4100
* "Polaris 12" chips, such as on the AMD Radeon RX 550 and Radeon RX 540
* GFX7 GPUs
* "Hawaii" chips, such as the AMD Radeon R9 390X and FirePro W9100
As described in the next section, GFX8 GPUs require PCI Express 3.0 (PCIe 3.0) with support for PCIe atomics. This requires both CPU and motherboard support. GFX9 GPUs require PCIe 3.0 with support for PCIe atomics by default, but they can operate in most cases without this capability.
The integrated GPUs in AMD APUs are not officially supported targets for ROCm.
As described [below](#limited-support), "Carrizo", "Bristol Ridge", and "Raven Ridge" APUs are enabled in our upstream drivers and the ROCm OpenCL runtime.
However, they are not enabled in the HIP runtime, and may not work due to motherboard or OEM hardware limitations.
As such, they are not yet officially supported targets for ROCm.
For a more detailed list of hardware support, please see [the following documentation](https://rocm.github.io/hardware.html).
#### Supported CPUs
As described above, GFX8 GPUs require PCIe 3.0 with PCIe atomics in order to run ROCm.
In particular, the CPU and every active PCIe point between the CPU and GPU require support for PCIe 3.0 and PCIe atomics.
The CPU root must indicate PCIe AtomicOp Completion capabilities and any intermediate switch must indicate PCIe AtomicOp Routing capabilities.
Current CPUs which support PCIe Gen3 + PCIe Atomics are:
* AMD Ryzen CPUs
* The CPUs in AMD Ryzen APUs
* AMD Ryzen Threadripper CPUs
* AMD EPYC CPUs
* Intel Xeon E7 v3 or newer CPUs
* Intel Xeon E5 v3 or newer CPUs
* Intel Xeon E3 v3 or newer CPUs
* Intel Core i7 v4, Core i5 v4, Core i3 v4 or newer CPUs (i.e. Haswell family or newer)
* Some Ivy Bridge-E systems
Beginning with ROCm 1.8, GFX9 GPUs (such as Vega 10) no longer require PCIe atomics.
We have similarly opened up more options for number of PCIe lanes.
GFX9 GPUs can now be run on CPUs without PCIe atomics and on older PCIe generations, such as PCIe 2.0.
This is not supported on GPUs below GFX9, e.g. GFX8 cards in the Fiji and Polaris families.
If you are using any PCIe switches in your system, please note that PCIe Atomics are only supported on some switches, such as Broadcom PLX.
When you install your GPUs, make sure you install them in a PCIe 3.1.0 x16, x8, x4, or x1 slot attached either directly to the CPU's Root I/O controller or via a PCIe switch directly attached to the CPU's Root I/O controller.
In our experience, many issues stem from trying to use consumer motherboards which provide physical x16 connectors that are electrically connected as e.g. PCIe 2.0 x4, PCIe slots connected via the Southbridge PCIe I/O controller, or PCIe slots connected through a PCIe switch that does
not support PCIe atomics.
If you attempt to run ROCm on a system without proper PCIe atomic support, you may see an error in the kernel log (`dmesg`):
```
kfd: skipped device 1002:7300, PCI rejects atomics
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
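As a minimal sketch of checking for this condition programmatically, the function below scans kernel log text for the `PCI rejects atomics` message quoted above and returns the affected device IDs. The regular expression assumes the exact wording shown in this README; other kernel versions may phrase the message differently, and the sample log line is illustrative only.

```python
import re

# Pattern built from the message format quoted in this README (an assumption
# about wording; newer kernels may log something different).
_REJECTS_ATOMICS = re.compile(
    r"kfd: skipped device ([0-9a-fA-F]{4}:[0-9a-fA-F]{4}), PCI rejects atomics"
)


def devices_rejecting_atomics(kernel_log):
    """Return vendor:device IDs that kfd skipped for lacking PCIe atomics."""
    return _REJECTS_ATOMICS.findall(kernel_log)


# Illustrative log text; timestamps and the second line are made up.
sample_log = (
    "[    3.21] kfd: skipped device 1002:7300, PCI rejects atomics\n"
    "[    3.30] some unrelated kernel message\n"
)
print(devices_rejecting_atomics(sample_log))
```

To run this against a live system, you could pass in the captured output of `dmesg`, which may require elevated privileges depending on the distribution.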
Experimental support for our Hawaii (GFX7) GPUs (Radeon R9 290, R9 390, FirePro W9100, S9150, S9170)
does not require or take advantage of PCIe Atomics. However, we still recommend that you use a CPU
from the list provided above for compatibility purposes.
Alternatively, CMake build is supported.
#### Not supported or limited support under ROCm
```bash
cmake -B build
##### Limited support
cmake --build build --target=doc
```
* ROCm 2.9.x should support PCIe 2.0 enabled CPUs such as the AMD Opteron, Phenom, Phenom II, Athlon, Athlon X2, Athlon II and older Intel Xeon and Intel Core Architecture and Pentium CPUs. However, we have done very limited testing on these configurations, since our test farm has been catering to CPUs listed above. This is where we need community support. _If you find problems on such setups, please report these issues_.
* Thunderbolt 1, 2, and 3 enabled breakout boxes should now be able to work with ROCm. Thunderbolt 1 and 2 are PCIe 2.0 based, and thus are only supported with GPUs that do not require PCIe 3.1.0 atomics (e.g. Vega 10). However, we have done no testing on this configuration and would need community support due to limited access to this type of equipment.
* AMD "Carrizo" and "Bristol Ridge" APUs are enabled to run OpenCL, but do not yet support HIP or our libraries built on top of these compilers and runtimes.
* As of ROCm 2.1, "Carrizo" and "Bristol Ridge" require the use of upstream kernel drivers.
* In addition, various "Carrizo" and "Bristol Ridge" platforms may not work due to OEM and ODM choices when it comes to key configuration parameters such as inclusion of the required CRAT tables and IOMMU configuration parameters in the system BIOS.
* Before purchasing such a system for ROCm, please verify that the BIOS provides an option for enabling IOMMUv2 and that the system BIOS properly exposes the correct CRAT table. Inquire with your vendor about the latter.
* AMD "Raven Ridge" APUs are enabled to run OpenCL, but do not yet support HIP or our libraries built on top of these compilers and runtimes.
* As of ROCm 2.1, "Raven Ridge" requires the use of upstream kernel drivers.
* In addition, various "Raven Ridge" platforms may not work due to OEM and ODM choices when it comes to key configuration parameters such as inclusion of the required CRAT tables and IOMMU configuration parameters in the system BIOS.
* Before purchasing such a system for ROCm, please verify that the BIOS provides an option for enabling IOMMUv2 and that the system BIOS properly exposes the correct CRAT table. Inquire with your vendor about the latter.
## Older ROCm releases
##### Not supported
* "Tonga", "Iceland", "Vega M", and "Vega 12" GPUs are not supported in ROCm 2.9.x
* We do not support GFX8-class GPUs (Fiji, Polaris, etc.) on CPUs that do not have PCIe 3.0 with PCIe atomics.
* As such, we do not support AMD Carrizo and Kaveri APUs as hosts for such GPUs.
* Thunderbolt 1 and 2 connections are not supported with GFX8 GPUs on ROCm, because Thunderbolt 1 and 2 are based on PCIe 2.0.
#### ROCm support in upstream Linux kernels
As of ROCm 1.9.0, the ROCm user-level software is compatible with the AMD drivers in certain upstream Linux kernels.
As such, users have the option of either using the ROCK kernel driver that is part of AMD's ROCm repositories or using the upstream driver and only installing ROCm user-level utilities from AMD's ROCm repositories.
These releases of the upstream Linux kernel support the following GPUs in ROCm:
* 4.17: Fiji, Polaris 10, Polaris 11
* 4.18: Fiji, Polaris 10, Polaris 11, Vega10
* 4.20: Fiji, Polaris 10, Polaris 11, Vega10, Vega 7nm
The upstream driver may be useful for running ROCm software on systems that are not compatible with the kernel driver available in AMD's repositories.
For users that have the option of using either AMD's or the upstreamed driver, there are various tradeoffs to take into consideration:
| | Using AMD's `rock-dkms` package | Using the upstream kernel driver |
| ---- | ------------------------------------------------------------| ----- |
| Pros | More GPU features, and they are enabled earlier | Includes the latest Linux kernel features |
| | Tested by AMD on supported distributions | May work on other distributions and with custom kernels |
| | Supported GPUs enabled regardless of kernel version | |
| | Includes the latest GPU firmware | |
| Cons | May not work on all Linux distributions or versions | Features and hardware support vary depending on kernel version |
| | Not currently supported on kernels newer than 5.4 | Limits GPU's usage of system memory to 3/8 of system memory (before 5.6). For 5.6 and beyond, both DKMS and upstream kernels allow use of 15/16 of system memory. |
| | | IPC and RDMA capabilities are not yet enabled |
| | | Not tested by AMD to the same level as `rock-dkms` package |
| | | Does not include most up-to-date firmware |
## Machine Learning and High Performance Computing Software Stack for AMD GPU
For an updated version of the software stack for AMD GPU, see
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#software-stack-for-amd-gpu
For release information for older ROCm releases, refer to the
[CHANGELOG](./CHANGELOG.md).
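
The `default.xml` manifest mentioned earlier in this README maps each project to a remote and a revision. The sketch below shows one way to resolve those values, assuming the usual manifest convention that a project without its own `remote` or `revision` attribute inherits them from the `<default>` element. The sample is a small excerpt-style manifest written for illustration, and `resolve_projects` is a hypothetical helper, not part of any ROCm tooling.

```python
import xml.etree.ElementTree as ET


def resolve_projects(manifest_xml):
    """Return (name, clone_url, revision) for each project in a manifest."""
    root = ET.fromstring(manifest_xml)
    # Map remote names to their base fetch URLs.
    remotes = {r.get("name"): r.get("fetch") for r in root.findall("remote")}
    default = root.find("default")
    default_remote = default.get("remote") if default is not None else None
    default_revision = default.get("revision") if default is not None else None

    resolved = []
    for project in root.findall("project"):
        name = project.get("name")
        # Per-project attributes override the <default> element.
        remote = project.get("remote", default_remote)
        revision = project.get("revision", default_revision)
        url = remotes[remote].rstrip("/") + "/" + name
        resolved.append((name, url, revision))
    return resolved


# Illustrative sample only; the real default.xml lists many more projects.
SAMPLE = """\
<manifest>
  <remote name="rocm-org" fetch="https://github.com/ROCm/" />
  <remote name="KhronosGroup" fetch="https://github.com/KhronosGroup/" />
  <default revision="refs/tags/rocm-6.0.2" remote="rocm-org" />
  <project name="rocminfo" />
  <project name="OpenCL-ICD-Loader" remote="KhronosGroup"
           revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
</manifest>
"""

for name, url, revision in resolve_projects(SAMPLE):
    print(name, url, revision)
```

The sketch ignores the optional `path` attribute, which controls where a project is checked out locally rather than where it is fetched from.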

54
RELEASE.md Normal file

@@ -0,0 +1,54 @@
# Release notes
<!-- Disable lints since this is an auto-generated file. -->
<!-- markdownlint-disable blanks-around-headers -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable no-blanks-blockquote -->
<!-- markdownlint-disable ul-indent -->
<!-- markdownlint-disable no-trailing-spaces -->
<!-- spellcheck-disable -->
This page contains the release notes for AMD ROCm Software.
-------------------
## ROCm 6.0.2
The ROCm 6.0.2 point release consists of minor bug fixes to improve the stability of MI300 GPU applications. This release introduces several new driver features for system qualification on our partner server offerings.
### Library changes in ROCm 6.0.2
| Library | Version |
|---------|---------|
| AMDMIGraphX | ⇒ [2.8](https://github.com/ROCm/AMDMIGraphX/releases/tag/rocm-6.0.2) |
| hipBLAS | ⇒ [2.0.0](https://github.com/ROCm/hipBLAS/releases/tag/rocm-6.0.2) |
| hipBLASLt | ⇒ [0.6.0](https://github.com/ROCm/hipBLASLt/releases/tag/rocm-6.0.2) |
| hipCUB | ⇒ [3.0.0](https://github.com/ROCm/hipCUB/releases/tag/rocm-6.0.2) |
| hipFFT | ⇒ [1.0.13](https://github.com/ROCm/hipFFT/releases/tag/rocm-6.0.2) |
| hipRAND | ⇒ [2.10.17](https://github.com/ROCm/hipRAND/releases/tag/rocm-6.0.2) |
| hipSOLVER | ⇒ [2.0.0](https://github.com/ROCm/hipSOLVER/releases/tag/rocm-6.0.2) |
| hipSPARSE | ⇒ [3.0.0](https://github.com/ROCm/hipSPARSE/releases/tag/rocm-6.0.2) |
| hipSPARSELt | ⇒ [0.1.0](https://github.com/ROCm/hipSPARSELt/releases/tag/rocm-6.0.2) |
| hipTensor | ⇒ [1.1.0](https://github.com/ROCm/hipTensor/releases/tag/rocm-6.0.2) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCm/MIOpen/releases/tag/rocm-6.0.2) |
| rccl | ⇒ [2.15.5](https://github.com/ROCm/rccl/releases/tag/rocm-6.0.2) |
| rocALUTION | ⇒ [3.0.3](https://github.com/ROCm/rocALUTION/releases/tag/rocm-6.0.2) |
| rocBLAS | ⇒ [4.0.0](https://github.com/ROCm/rocBLAS/releases/tag/rocm-6.0.2) |
| rocFFT | ⇒ [1.0.25](https://github.com/ROCm/rocFFT/releases/tag/rocm-6.0.2) |
| rocm-cmake | ⇒ [0.11.0](https://github.com/ROCm/rocm-cmake/releases/tag/rocm-6.0.2) |
| rocPRIM | ⇒ [3.0.0](https://github.com/ROCm/rocPRIM/releases/tag/rocm-6.0.2) |
| rocRAND | ⇒ [3.0.0](https://github.com/ROCm/rocRAND/releases/tag/rocm-6.0.2) |
| rocSOLVER | ⇒ [3.24.0](https://github.com/ROCm/rocSOLVER/releases/tag/rocm-6.0.2) |
| rocSPARSE | ⇒ [3.0.2](https://github.com/ROCm/rocSPARSE/releases/tag/rocm-6.0.2) |
| rocThrust | ⇒ [3.0.0](https://github.com/ROCm/rocThrust/releases/tag/rocm-6.0.2) |
| rocWMMA | ⇒ [1.3.0](https://github.com/ROCm/rocWMMA/releases/tag/rocm-6.0.2) |
| Tensile | ⇒ [4.39.0](https://github.com/ROCm/Tensile/releases/tag/rocm-6.0.2) |
#### hipFFT 1.0.13
hipFFT 1.0.13 for ROCm 6.0.2
##### Changes
* Removed the Git submodule for shared files between rocFFT and hipFFT; instead, just copy the files
over (this should help simplify downstream builds and packaging)


@@ -0,0 +1,47 @@
# MIT License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# ###########################
# ROCm dependencies
# ###########################
include(FetchContent)
if(BUILD_DOCS)
    find_package(ROCM 0.11.0 CONFIG QUIET PATHS "${ROCM_PATH}") # First version with Sphinx doc gen improvement
    if(NOT ROCM_FOUND)
        message(STATUS "ROCm CMake not found. Fetching...")
        set(rocm_cmake_tag
            "c044bb52ba85058d28afe2313be98d9fed02e293" # develop@2023.09.12. (move to 6.0 tag when released)
            CACHE STRING "rocm-cmake tag to download")
        FetchContent_Declare(
            rocm-cmake
            GIT_REPOSITORY https://github.com/RadeonOpenCompute/rocm-cmake.git
            GIT_TAG ${rocm_cmake_tag}
            SOURCE_SUBDIR "DISABLE ADDING TO BUILD" # We don't really want to consume the build and test targets of ROCm CMake.
        )
        FetchContent_MakeAvailable(rocm-cmake)
        find_package(ROCM CONFIG REQUIRED NO_DEFAULT_PATH PATHS "${rocm-cmake_SOURCE_DIR}")
    else()
        find_package(ROCM 0.11.0 CONFIG REQUIRED PATHS "${ROCM_PATH}")
    endif()
endif()


@@ -1,79 +1,77 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="roc-github"
fetch="http://github.com/RadeonOpenCompute/" />
<remote name="rocm-devtools"
fetch="https://github.com/ROCm-Developer-Tools/" />
<remote name="rocm-swplat"
fetch="https://github.com/ROCmSoftwarePlatform/" />
<remote name="gpuopen-libs"
fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
<remote name="gpuopen-tools"
fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup"
fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-3.8.0"
remote="roc-github"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCM-->
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<remote name="roc-github" fetch="https://github.com/RadeonOpenCompute/" />
<remote name="rocm-devtools" fetch="https://github.com/ROCm-Developer-Tools/" />
<remote name="rocm-swplat" fetch="https://github.com/ROCmSoftwarePlatform/" />
<remote name="gpuopen-libs" fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
<remote name="gpuopen-tools" fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup" fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-6.0.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCm-->
<project name="ROCK-Kernel-Driver" />
<project name="ROCT-Thunk-Interface" />
<project name="ROCR-Runtime" />
<project name="ROC-smi" />
<project name="rocm_smi_lib" remote="roc-github" />
<project name="amdsmi" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocm-cmake" />
<project name="rocminfo" />
<project name="rocprofiler" remote="rocm-devtools" />
<project name="roctracer" remote="rocm-devtools" />
<project name="ROCm-OpenCL-Runtime" />
<project name="rocm_bandwidth_test" />
<project name="rocprofiler" />
<project name="roctracer" />
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
<project name="clang-ocl" />
<!--HIP Projects-->
<project name="HIP" remote="rocm-devtools" />
<project name="HIP-Examples" remote="rocm-devtools" />
<project name="ROCclr" remote="rocm-devtools" />
<project name="HIPIFY" remote="rocm-devtools" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="llvm-project" path="llvm_amd-stg-open" />
<project name="rdc" />
<!--HIP Projects-->
<project name="HIP" />
<project name="HIP-Examples" />
<project name="clr" />
<project name="hipother" />
<project name="HIPIFY" />
<project name="HIPCC" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="llvm-project" />
<project name="ROCm-Device-Libs" />
<project name="atmi" />
<project name="ROCm-CompilerSupport" />
<project name="rocr_debug_agent" remote="rocm-devtools" />
<project name="rocm_bandwidth_test" />
<project name="RCP" remote="gpuopen-tools" revision="3a49405a1500067c49d181844ec90aea606055bb" />
<!-- gdb projects -->
<project name="ROCgdb" remote="rocm-devtools" />
<project name="ROCdbgapi" remote="rocm-devtools" />
<!-- ROCm Libraries -->
<project name="rocBLAS" remote="rocm-swplat" />
<project name="hipBLAS" remote="rocm-swplat" />
<project name="rocFFT" remote="rocm-swplat" />
<project name="rocRAND" remote="rocm-swplat" />
<project name="rocSPARSE" remote="rocm-swplat" />
<project name="rocSOLVER" remote="rocm-swplat" />
<project name="hipSPARSE" remote="rocm-swplat" />
<project name="rocALUTION" remote="rocm-swplat" />
<project name="MIOpenGEMM" remote="rocm-swplat" />
<project name="MIOpen" remote="rocm-swplat" />
<project name="rccl" remote="rocm-swplat" />
<project name="MIVisionX" remote="gpuopen-libs" />
<project name="rocThrust" remote="rocm-swplat" />
<project name="hipCUB" remote="rocm-swplat" />
<project name="rocPRIM" remote="rocm-swplat" />
<project name="hipfort" remote="rocm-swplat" />
<project name="ROCmValidationSuite" remote="rocm-devtools" />
<!-- Projects for AOMP -->
<project name="ROCT-Thunk-Interface" path="aomp/roct-thunk-interface" remote="roc-github" />
<project name="ROCR-Runtime" path="aomp/rocr-runtime" remote="roc-github" />
<project name="ROCm-Device-Libs" path="aomp/rocm-device-libs" remote="roc-github" />
<project name="ROCm-CompilerSupport" path="aomp/rocm-compilersupport" remote="roc-github" />
<project name="rocminfo" path="aomp/rocminfo" remote="roc-github" />
<project name="HIP" path="aomp/hip-on-vdi" remote="rocm-devtools" />
<project name="aomp" path="aomp/aomp" remote="rocm-devtools" />
<project name="aomp-extras" path="aomp/aomp-extras" remote="rocm-devtools" />
<project name="flang" path="aomp/flang" remote="rocm-devtools" />
<project name="amd-llvm-project" path="aomp/amd-llvm-project" remote="rocm-devtools" />
<project name="ROCclr" path="aomp/vdi" remote="rocm-devtools" />
<project name="ROCm-OpenCL-Runtime" path="aomp/opencl-on-vdi" remote="roc-github" />
<project name="half" revision="37742ce15b76b44e4b271c1e66d13d2fa7bd003e" />
<!-- gdb projects -->
<project name="ROCgdb" />
<project name="ROCdbgapi" />
<project name="rocr_debug_agent" />
<!-- ROCm Libraries -->
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="Tensile" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipBLAS" />
<project groups="mathlibs" name="hipBLASLt" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="hipFFT" />
<project groups="mathlibs" name="rocRAND" />
<project groups="mathlibs" name="hipRAND" />
<project groups="mathlibs" name="rocSPARSE" />
<project groups="mathlibs" name="hipSPARSELt" />
<project groups="mathlibs" name="rocSOLVER" />
<project groups="mathlibs" name="hipSOLVER" />
<project groups="mathlibs" name="hipSPARSE" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocThrust" />
<project groups="mathlibs" name="hipCUB" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="rccl" />
<project name="MIOpen" />
<project name="composable_kernel" />
<project name="MIVisionX" />
<project name="rpp" />
<project name="hipfort" />
<project name="AMDMIGraphX" />
<project name="ROCmValidationSuite" />
<!-- Projects for OpenMP-Extras -->
<project name="aomp" path="openmp-extras/aomp" />
<project name="aomp-extras" path="openmp-extras/aomp-extras" />
<project name="flang" path="openmp-extras/flang" />
</manifest>

docs/CMakeLists.txt Normal file

@@ -0,0 +1,33 @@
# MIT License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
include(ROCMSphinxDoc)
rocm_add_sphinx_doc(
"${CMAKE_CURRENT_SOURCE_DIR}"
OUTPUT_DIR html
BUILDER html
)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/html"
DESTINATION "${CMAKE_INSTALL_DOCDIR}")


@@ -0,0 +1,483 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="OpenMP support in ROCm">
<meta name="keywords" content="OpenMP, LLVM, OpenMP toolchain">
</head>
# OpenMP support in ROCm
## Introduction
The ROCm™ installation includes an LLVM-based implementation that fully supports
the OpenMP 4.5 standard and a subset of OpenMP 5.0, 5.1, and 5.2 standards.
Fortran, C/C++ compilers, and corresponding runtime libraries are included.
Along with host APIs, the OpenMP compilers support offloading code and data onto
GPU devices. This document briefly describes the installation location of the
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
with OpenMP applications. The GPUs supported are the same as those supported by
this ROCm release. See the list of supported GPUs for {doc}`Linux<rocm-install-on-linux:reference/system-requirements>` and
{doc}`Windows<rocm-install-on-windows:reference/system-requirements>`.
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
The following image illustrates the internal steps taken to translate a user's application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
![OpenMP toolchain](../../data/reference/openmp/openmp-toolchain.svg "OpenMP toolchain")
### Installation
The OpenMP toolchain is automatically installed as part of the standard ROCm
installation and is available under `/opt/rocm-{version}/llvm`. The
sub-directories are:
* bin: Compilers (`flang` and `clang`) and other binaries.
* examples: The usage section below shows how to compile and run these programs.
* include: Header files.
* lib: Libraries including those required for target offload.
* lib-debug: Debug versions of the above libraries.
## OpenMP: usage
The example programs can be compiled and run by pointing the environment
variable `ROCM_PATH` to the ROCm install directory.
**Example:**
```bash
export ROCM_PATH=/opt/rocm-{version}
cd $ROCM_PATH/share/openmp-extras/examples/openmp/veccopy
sudo make run
```
:::{note}
`sudo` is required since we are building inside the `/opt` directory.
Alternatively, copy the files to your home directory first.
:::
The above invocation of Make compiles and runs the program. Note the options
that are required for target offload from an OpenMP program:
```bash
-fopenmp --offload-arch=<gpu-arch>
```
:::{note}
The compiler also accepts the alternative offloading notation:
```bash
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
```
:::
Obtain the value of `gpu-arch` by running the following command:
```bash
% /opt/rocm-{version}/bin/rocminfo | grep gfx
```
[//]: # (dated link below, needs updating)
See the complete list of compiler command-line references
[here](https://github.com/RadeonOpenCompute/llvm-project/blob/amd-stg-open/clang/docs/CommandGuide/clang.rst).
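As a rough illustration of the kind of code the offload options above act on, the following sketch copies one array into another inside a target region. The helper name `veccopy` is hypothetical and is not taken from the ROCm examples directory. When built with `-fopenmp --offload-arch=<gpu-arch>`, the loop is expected to be offloaded; without `-fopenmp`, the pragma is ignored and the loop runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the ROCm examples): copies src into dst
// inside an OpenMP target region.
void veccopy(const std::vector<int>& src, std::vector<int>& dst) {
  assert(src.size() == dst.size());
  const int n = static_cast<int>(src.size());
  const int* in = src.data();
  int* out = dst.data();

  // map(to:) sends the input to the device; map(from:) brings the result back.
  #pragma omp target teams distribute parallel for map(to: in[0:n]) map(from: out[0:n])
  for (int i = 0; i < n; i++)
    out[i] = in[i];
}
```

Calling `veccopy(src, dst)` with two equally sized vectors leaves `dst` equal to `src`, whether the loop ran on the device or on the host.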
### Using `rocprof` with OpenMP
The following steps describe a typical workflow for using `rocprof` with OpenMP
code compiled with AOMP:
1. Run `rocprof` with the program command line:
```bash
% rocprof <application> <args>
```
This produces a `results.csv` file in the user's current directory that
shows basic stats such as kernel names, grid size, number of registers used,
etc. The user can choose to specify the preferred output file name using the
`-o` option.
2. Add options for a detailed result:
```bash
% rocprof --stats <application> <args>
```
The stats option produces timestamps for the kernels. Look into the output
CSV file for the field, `DurationNs`, which is useful in getting an
understanding of the critical kernels in the code.
Apart from `--stats`, the option `--timestamp on` produces a timestamp for
the kernels.
3. After learning about the required kernels, the user can take a detailed look
at each one of them. `rocprof` has support for hardware counters: a set of
basic and a set of derived ones. See the complete list of counters using
options `--list-basic` and `--list-derived`. `rocprof` accepts either a text or
an XML file as an input.
For more details on `rocprof`, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Using tracing options
**Prerequisite:** When using the `--sys-trace` option, compile the OpenMP
program with:
```bash
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
```
The following tracing options are widely used to generate useful information:
* **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
API execution traces and a flat profile in a CSV file.
* **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
Since this option results in loading ``libamdhip64.so``, follow the
prerequisite as mentioned above.
A CSV and a JSON file are produced by the above trace options. The CSV file
presents the data in a tabular format, and the JSON file can be visualized using
Google Chrome at chrome://tracing/ or [Perfetto](https://perfetto.dev/).
Navigate to Chrome or Perfetto and load the JSON file to see the timeline of the
HSA calls.
For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Environment variables
:::{table}
:widths: auto
| Environment Variable | Purpose |
| --------------------------- | ---------------------------- |
| `OMP_NUM_TEAMS` | To set the number of teams for kernel launch, which is otherwise chosen by the implementation by default. You can set this number (subject to implementation limits) for performance tuning. |
| `LIBOMPTARGET_KERNEL_TRACE` | To print useful statistics for device operations. Setting it to 1 and running the program emits the name of every kernel launched, the number of teams and threads used, and the corresponding register usage. Setting it to 2 additionally emits timing information for kernel launches and data transfer operations between the host and the device. |
| `LIBOMPTARGET_INFO` | To print informational messages from the device runtime as the program executes. Setting it to a value of 1 or higher prints fine-grain information, and setting it to -1 prints complete information. |
| `LIBOMPTARGET_DEBUG` | To get detailed debugging information about data transfer operations and kernel launch when using a debug version of the device library. Set this environment variable to 1 to get the detailed information from the library. |
| `GPU_MAX_HW_QUEUES` | To set the number of HSA queues in the OpenMP runtime. The HSA queues are created on demand up to the maximum value as supplied here. The queue creation starts with a single initialized queue to avoid unnecessary allocation of resources. The provided value is capped if it exceeds the recommended, device-specific value. |
| `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` | To set the threshold size up to which data transfers are initiated asynchronously. The default threshold size is 1 MB (`1*1024*1024` bytes). |
| `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, i.e., wait for an operation to complete immediately. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
:::
## OpenMP: features
The following features, implemented in past releases, extend the OpenMP
programming model.
(openmp_usm)=
### Asynchronous behavior in OpenMP target regions
* Controlling Asynchronous Behavior
The OpenMP offloading runtime executes in an asynchronous fashion by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred is larger than the default threshold of 1 MB, the runtime falls back to a synchronous data transfer. Transfers of buffers that have already been locked are always executed asynchronously.
You can overrule this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
* Multithreaded Offloading on the Same Device
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
* Parallel Memory Copy Invocations
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
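The following sketch shows one way an application can express independent offload work so that the runtime is free to overlap it. It uses the standard OpenMP `nowait` clause and `taskwait` directive; the helper name `scale_both` is hypothetical, and whether the two regions actually overlap depends on the runtime settings described above.

```cpp
#include <cassert>

// Hypothetical helper: launches two independent target regions without
// blocking the host thread between them, then waits for both.
void scale_both(double* a, double* b, int n, double factor) {
  assert(n >= 0);

  // nowait turns each target region into a deferred task.
  #pragma omp target teams distribute parallel for nowait map(tofrom: a[0:n])
  for (int i = 0; i < n; i++)
    a[i] *= factor;

  #pragma omp target teams distribute parallel for nowait map(tofrom: b[0:n])
  for (int i = 0; i < n; i++)
    b[i] *= factor;

  // Block until both deferred target tasks have completed.
  #pragma omp taskwait
}
```

Because the two regions touch different arrays, no `depend` clauses are needed; the `taskwait` is the only synchronization point.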
### Unified shared memory
Unified Shared Memory (USM) provides a pointer-based approach to memory
management. To use USM, the system must meet the following requirements and
have Xnack capability.
#### Prerequisites
* Linux Kernel versions above 5.14
* Latest KFD driver packaged in ROCm stack
* Xnack, as USM support can only be tested with applications compiled with Xnack
capability
#### Xnack capability
When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
enabled both at compile- and run-time. To enable Xnack support at compile-time,
use:
```bash
--offload-arch=gfx908:xnack+
```
Or use another functionally equivalent option Xnack-any:
```bash
--offload-arch=gfx908
```
To enable Xnack functionality at runtime on a per-application basis,
use environment variable:
```bash
HSA_XNACK=1
```
When Xnack support is not needed:
* Build the applications to maximize resource utilization using:
```bash
--offload-arch=gfx908:xnack-
```
* At runtime, set the `HSA_XNACK` environment variable to 0.
#### Unified shared memory pragma
This OpenMP pragma is available on MI200 through `xnack+` support.
```cpp
#pragma omp requires unified_shared_memory
```
As stated in the OpenMP specifications, this pragma makes the map clause on
target constructs optional. By default, on MI200, all memory allocated on the
host is fine grain. Using the map clause on a target construct is allowed, which
transforms the access semantics of the associated memory to coarse grain.
A simple program demonstrating the use of this feature is:
```bash
$ cat parallel_for.cpp
#include <stdlib.h>
#include <stdio.h>

#define N 64
#pragma omp requires unified_shared_memory
int main() {
  int n = N;
  int *a = new int[n];
  int *b = new int[n];

  for(int i = 0; i < n; i++)
    b[i] = i;

  // Only b is mapped; a is accessed through unified shared memory.
  #pragma omp target parallel for map(to:b[:n])
  for(int i = 0; i < n; i++)
    a[i] = b[i];

  for(int i = 0; i < n; i++)
    if(a[i] != i)
      printf("error at %d: expected %d, got %d\n", i, i, a[i]);

  return 0;
}
$ clang++ -O2 -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:xnack+ parallel_for.cpp
$ HSA_XNACK=1 ./a.out
```
In the above code example, pointer “a” is not mapped in the target region, while
pointer “b” is. Both are valid pointers on the GPU device and passed by-value to
the kernel implementing the target region. This means the pointer values on the
host and the device are the same.
The difference between the memory pages pointed to by these two variables is
that the pages pointed to by “a” are in fine-grain memory, while the pages
pointed to by “b” are in coarse-grain memory during and after the execution of
the target region. This is accomplished in the OpenMP runtime library with calls
to the ROCr runtime to set the pages pointed to by “b” as coarse grain.
### OMPT target support
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as
described in the OpenMP specification document. These APIs allow first-party
tools to examine the profile and kernel traces that execute on a device. A tool
can register callbacks for data transfer and kernel dispatch entry points or use
APIs to start and stop tracing for device-related activities such as data
transfer and kernel dispatch timings and associated metadata. If device tracing
is enabled, trace records for device activities are collected during program
execution and returned to the tool using the APIs described in the
specification.
The following example demonstrates how a tool uses the supported OMPT target
APIs. The `README` in `/opt/rocm/llvm/examples/tools/ompt` outlines the steps to
be followed, and the provided example can be run as shown below:
```bash
cd $ROCM_PATH/share/openmp-extras/examples/tools/ompt/veccopy-ompt-target-tracing
sudo make run
```
The file `veccopy-ompt-target-tracing.c` simulates how a tool initiates device
activity tracing. The file `callbacks.h` shows the callbacks registered and
implemented by the tool.
### Floating point atomic operations
The MI200-series GPUs support the generation of hardware floating-point atomics
using the OpenMP atomic pragma. The support includes single- and
double-precision floating-point atomic operations. The programmer must ensure
that the memory subjected to the atomic operation is in coarse-grain memory by
mapping it explicitly with the help of map clauses when not implicitly mapped by
the compiler as per the [OpenMP
specifications](https://www.openmp.org/specifications/). This makes these
hardware floating-point atomic instructions “fast,” as they are faster than
using a default compare-and-swap loop scheme, but at the same time “unsafe,” as
they are not supported on fine-grain memory. The operation in
`unified_shared_memory` mode also requires programmers to map the memory
explicitly when not implicitly mapped by the compiler.
To request fast floating-point atomic instructions at the file level, use
compiler flag `-munsafe-fp-atomics` or a hint clause on a specific pragma:
```cpp
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;
```
:::{note}
`AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
`AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
:::
To disable the generation of fast floating-point atomic instructions at the file
level, build using the option `-msafe-fp-atomics` or use a hint clause on a
specific pragma:
```cpp
double a = 0.0;
#pragma omp atomic hint(AMD_safe_fp_atomics)
a = a + 1.0;
```
The hint clause value always takes precedence over the compiler flag, which
allows programmers to create atomic constructs with a different behavior than
the rest of the file.
See the example below, where the user builds the program using
`-msafe-fp-atomics` to select a file-wide “safe atomic” compilation. However,
the fast atomics hint clause over variable “a” takes precedence and operates on
“a” using a fast/unsafe floating-point atomic, while the variable “b” in the
absence of a hint clause is operated upon using safe floating-point atomics as
per the compiler flag.
```cpp
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;
double b = 0.0;
#pragma omp atomic
b = b + 1.0;
```
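To make the mapping requirement concrete, the sketch below accumulates an array into a scalar with an atomic update inside a target region, mapping the accumulator explicitly. The helper name `atomic_sum` is hypothetical, and the AMD-specific hint clause is deliberately left out so that the code stays portable; which instruction sequence is generated for the atomic then follows the compiler flag in effect for the file.

```cpp
#include <cassert>

// Hypothetical helper: sums x[0..n) into a single double using an OpenMP
// atomic update inside a target region.
double atomic_sum(const double* x, int n) {
  assert(n >= 0);
  double sum = 0.0;

  // The accumulator is mapped explicitly rather than left to implicit rules.
  #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++) {
    #pragma omp atomic update
    sum += x[i];
  }
  return sum;
}
```

A reduction clause is usually the better choice for a plain sum; the atomic form is used here only because it is the construct this section discusses.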
### AddressSanitizer tool
AddressSanitizer (ASan) is a memory error detector tool utilized by applications to
detect various errors ranging from spatial issues such as out-of-bound access to
temporal issues such as use-after-free. The AOMP compiler supports ASan for AMD
GPUs with applications written in both HIP and OpenMP.
**Features supported on host platform (Target x86_64):**
* Use-after-free
* Buffer overflows
* Heap buffer overflow
* Stack buffer overflow
* Global buffer overflow
* Use-after-return
* Use-after-scope
* Initialization order bugs
**Features supported on AMDGPU platform (`amdgcn-amd-amdhsa`):**
* Heap buffer overflow
* Global buffer overflow
**Software (kernel/OS) requirements:** Unified Shared Memory support with Xnack
capability. See the section on [Unified Shared Memory](#unified-shared-memory)
for prerequisites and details on Xnack.
**Example:**
* Heap buffer overflow
```cpp
int main() {
  ....... // Some program statements
  ....... // Some program statements
  #pragma omp target map(to : A[0:N], B[0:N]) map(from: C[0:N])
  {
    #pragma omp parallel for
    for(int i = 0; i < N; i++){
      C[i+10] = A[i] + B[i]; // out-of-bounds write once i+10 >= N
    } // end of for loop
  }
  ....... // Some program statements
} // end of main
```
See the complete sample code for heap buffer overflow
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
* Global buffer overflow
```cpp
#pragma omp declare target
int A[N],B[N],C[N];
#pragma omp end declare target
int main(){
  ...... // some program statements
  ...... // some program statements
  #pragma omp target data map(to:A[0:N],B[0:N]) map(from: C[0:N])
  {
    #pragma omp target update to(A,B)
    #pragma omp target parallel for
    for(int i=0; i<N; i++){
      C[i]=A[i*100]+B[i+22]; // out-of-bounds reads of the global arrays A and B
    } // end of for loop
    #pragma omp target update from(C)
  }
  ........ // some program statements
} // end of main
```
See the complete sample code for global buffer overflow
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
### Clang compiler option for kernel optimization
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
* `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels including no-loop and cross-team reductions.
* `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
* `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
* `-O3` if no `-O*` is specified by the user.
### Specialized kernels
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
* No-loop
* Big-jump-loop
* Cross-team reductions
To enable the generation of specialized kernels, follow these guidelines:
* Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the no-loop kernel. If the `num_teams` clause is required, clang tries to generate the big-jump-loop kernel instead of the no-loop kernel.
* Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
* To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
#### No-loop kernel generation
The no-loop kernel generation feature improves the performance of the generated code by producing a specialized kernel for certain OpenMP target constructs such as `target teams distribute parallel for`. The specialized kernel assumes that every thread executes a single iteration of the user loop, which leads the runtime to launch a total number of GPU threads equal to or greater than the iteration space size of the target region loop. This allows the compiler to generate code for the loop body without an enclosing loop, resulting in reduced control-flow complexity and potentially better performance.
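As a sketch of a loop that fits the description above, the following function uses `target teams distribute parallel for` with no `num_teams`, thread, or schedule clauses. The helper name `saxpy` is illustrative, and whether clang actually emits a no-loop kernel for it depends on the compiler options listed earlier (for example `-fopenmp-target-fast`).

```cpp
#include <cassert>

// Illustrative helper: y = a * x + y over n elements. No team, thread, or
// schedule clauses are given, leaving the launch shape to the implementation.
void saxpy(float a, const float* x, float* y, int n) {
  assert(n >= 0);

  #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

Each iteration is independent of the others, which is what makes a one-iteration-per-thread mapping possible in principle.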
#### Big-jump-loop kernel generation
A no-loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the big-jump-loop kernel. The kernel is launched with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
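The sketch below shows a loop that carries a `num_teams` clause and is therefore, per the text above, a candidate for the big-jump-loop kernel rather than the no-loop kernel. The helper name `fill_squares` and the values 8 and 64 are arbitrary, and using `thread_limit` to set the block size is an assumption about which clause the text refers to.

```cpp
#include <cassert>

// Illustrative helper: writes i*i into out[i]. The num_teams clause fixes
// the number of teams; thread_limit (assumed here) bounds threads per team.
void fill_squares(int* out, int n) {
  assert(n >= 0);

  #pragma omp target teams distribute parallel for num_teams(8) thread_limit(64) map(from: out[0:n])
  for (int i = 0; i < n; i++)
    out[i] = i * i;
}
```

With a fixed grid, the iteration count can exceed the number of launched threads, so each thread may need to process more than one iteration.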
#### Cross-team optimized reduction kernel generation
If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient cross-team communication. New APIs for cross-team reduction are implemented in the device runtime and are automatically generated by clang.
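A minimal example of a construct with a reduction clause is the dot product below. The helper name `dot` is illustrative; the explicit `map(tofrom: sum)` is included so that the result is copied back regardless of which OpenMP version's implicit mapping rules apply. Whether the cross-team reduction path is chosen is up to the compiler and the options in use.

```cpp
#include <cassert>

// Illustrative helper: dot product of x and y using an OpenMP reduction
// on a target construct.
double dot(const double* x, const double* y, int n) {
  assert(n >= 0);
  double sum = 0.0;

  #pragma omp target teams distribute parallel for reduction(+: sum) \
      map(to: x[0:n], y[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += x[i] * y[i];

  return sum;
}
```

The partial sums produced by different teams have to be combined into one value, which is the step the cross-team reduction support targets.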

docs/about/license.md Normal file

@@ -0,0 +1,13 @@
# License
:::{note}
This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that
primarily contains documentation. For other licensing information, refer to the
[Licensing Terms page](./licensing).
:::
```{include} ../../LICENSE
```
```{include} ./licensing.md
```

docs/about/licensing.md Normal file

@@ -0,0 +1,133 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm licensing terms">
<meta name="keywords" content="license, licensing terms">
</head>
# ROCm licensing terms
ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately.
The following table is a list of ROCm components with links to their respective license
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of the license, and a link to the license terms.
The table is ordered to follow the ROCm manifest file.
<!-- spellcheck-disable -->
| Component | License |
|:---------------------|:-------------------------|
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCmSoftwarePlatform/hipTensor) | [MIT](https://github.com/ROCmSoftwarePlatform/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula) |
Open sourced ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com and other distribution channels.
Proprietary products are only available on https://repo.radeon.com. Currently, only
one component of ROCm, rocm-llvm-alt, is governed by a proprietary license.
Proprietary components are organized in a proprietary subdirectory in the package
repositories to distinguish from open sourced packages.
The additional terms and conditions below apply to your use of ROCm technical
documentation.
©2023 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.
## Package licensing
:::{attention}
AQL Profiler and AOCC CPU Optimizations are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary, which is
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
AOCC CPU Optimizations.
:::
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`
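As a minimal sketch of the lookup described above, the following Python helper lists the files in a component's documentation directory. The default root `/opt/rocm/share/doc` comes from the text; the function name `license_files` is a hypothetical illustration, and the file names actually present may differ between components and releases.

```python
from pathlib import Path


def license_files(component, doc_root="/opt/rocm/share/doc"):
    """Return the sorted file paths found in <doc_root>/<component>/."""
    component_dir = Path(doc_root) / component
    if not component_dir.is_dir():
        return []  # component not installed, or documented elsewhere
    return sorted(p for p in component_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Prints nothing on a machine without this ROCm component installed.
    for path in license_files("amd_comgr"):
        print(path)
```

On a versioned installation, pass the versioned root instead, for example `doc_root="/opt/rocm-5.4.3/share/doc"`.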

View File

@@ -0,0 +1,157 @@
.. meta::
:description: How ROCm uses PCIe atomics
:keywords: PCIe, PCIe atomics, atomics, BAR memory, AMD, ROCm
*****************************************************************************
How ROCm uses PCIe atomics
*****************************************************************************
ROCm PCIe feature and overview of BAR memory
================================================================
ROCm is an extension of HSA platform architecture, so it shares the queuing model, memory model,
signaling and synchronization protocols. Platform atomics are integral to perform queuing and
signaling memory operations where there may be multiple-writers across CPU and GPU agents.
The full list of HSA system architecture platform requirements is available here:
`HSA Sys Arch Features <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf>`_.
AMD ROCm Software uses the new PCI Express 3.0 (Peripheral Component Interconnect Express [PCIe]
3.0) features for atomic read-modify-write transactions, which extend inter-processor synchronization
mechanisms to I/O to support the defined set of HSA capabilities needed for queuing and signaling
memory operations.
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, and
``SWAP`` atomics. The atomic operations are initiated by the I/O device and support 32-bit, 64-bit, and
128-bit operands. The target address must be naturally aligned to the operation size.
In ROCm, platform atomics are used in the following ways:
* Update HSA queue's read_dispatch_id: 64-bit atomic add used by the command processor on the
GPU agent to update the packet ID it processed.
* Update HSA queue's write_dispatch_id: 64-bit atomic add used by the CPU and GPU agent to
support multi-writer queue insertions.
* Update HSA Signals: 64-bit atomic operations are used for CPU and GPU synchronization.
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through,
and completed by PCIe components. Routing and completion do not require software support.
Component support for each is detectable via the Device Capabilities 2 (DevCap2) register. Upstream
bridges need to have atomic operations routing enabled, or the atomic operations will fail even though
the PCIe endpoint and PCIe I/O devices are capable of atomic operations.
To support atomic operations routing between two or more Root Ports, each associated Root Port
must indicate that capability via the atomic operations routing supported bit in the DevCap2 register.
If your system has a PCI Express switch, it needs to support atomic operations routing. Atomic
operations requests are permitted only if a component's ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE``
field is set. These requests can only be serviced if the upstream components support atomic operation
completion and/or routing to a component which does. When the atomic operations routing support
bit is 1, routing is supported; when it is 0, routing is not supported.
An atomic operation is a non-posted transaction supporting 32-bit and 64-bit address formats, so there
must be a Completion response containing the result of the operation. Errors associated with the
operation (an uncorrectable error accessing the target location or carrying out the atomic operation) are
signaled to the requester by setting the Completion Status field in the completion descriptor to
Completer Abort (CA) or Unsupported Request (UR).
To understand more about how PCIe atomic operations work, see
`PCIe atomics <https://pcisig.com/specifications/pciexpress/specifications/ECN_Atomic_Ops_080417.pdf>`_
and the
`Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/project/linux-pci/patch/1443110390-4080-1-git-send-email-jay@jcornwall.me/>`_.
There are also a number of papers which talk about these new capabilities:
* `Atomic Read Modify Write Primitives by Intel <https://www.intel.es/content/dam/doc/white-paper/atomic-read-modify-write-primitives-i-o-devices-paper.pdf>`_
* `PCI express 3 Accelerator White paper by Intel <https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf>`_
* `Intel PCIe Generation 3 Hotchips Paper <https://www.hotchips.org/wp-content/uploads/hc_archives/hc21/1_sun/HC21.23.1.SystemInterconnectTutorial-Epub/HC21.23.131.Ajanovic-Intel-PCIeGen3.pdf>`_
* `PCIe Generation 4 Base Specification includes atomic operations <https://astralvx.com/storage/2020/11/PCI_Express_Base_4.0_Rev0.3_February19-2014.pdf>`_
Other I/O devices with PCIe atomics support:
* `Mellanox ConnectX-5 InfiniBand Card <http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_VPI_Card.pdf>`_
* `Cray Aries Interconnect <http://www.hoti.org/hoti20/slides/Bob_Alverson.pdf>`_
* `Xilinx PCIe Ultrascale White paper <https://docs.xilinx.com/v/u/8OZSA2V1b1LLU2rRCDVGQw>`_
* `Xilinx 7 Series Devices <https://docs.xilinx.com/v/u/1nfXeFNnGpA0ywyykvWHWQ>`_
Future bus technology with richer I/O atomic operations support:
* GenZ
New PCIe endpoints with support, beyond AMD Ryzen and EPYC CPUs and Intel Haswell or newer CPUs
with PCIe Generation 3.0 support:
* `Mellanox Bluefield SOC <https://docs.nvidia.com/networking/display/BlueFieldSWv25111213/BlueField+Software+Overview>`_
* `Cavium Thunder X2 <https://en.wikichip.org/wiki/cavium/thunderx2>`_
In ROCm, we also take advantage of PCIe ID based ordering technology for P2P when the GPU
originates two writes to two different targets:
* Write to another GPU memory
* Write to system memory to indicate transfer complete
They are routed to different ends of the computer, but the write to system memory that indicates
transfer complete must occur AFTER the P2P write to the GPU has completed.
BAR memory overview
----------------------------------------------------------------------------------------------------
On a Xeon E5 based system, you can turn on above-4GB PCIe addressing in the BIOS. If you do, you need to set
the memory-mapped input/output (MMIO) base address (MMIOH base) and range (MMIO high size) in the BIOS.
On a Supermicro system, look for the following settings in the system BIOS:
* Advanced->PCIe/PCI/PnP configuration-\> Above 4G Decoding = Enabled
* Advanced->PCIe/PCI/PnP Configuration-\>MMIOH Base = 512G
* Advanced->PCIe/PCI/PnP Configuration-\>MMIO High Size = 256G
When Large BAR capability is supported, there is a Large BAR VBIOS, which also disables the I/O BAR.
For GFX9 and Vega10, which have a physical address of up to 44 bits and a 48-bit virtual address:
* BAR0-1 registers: 64bit, prefetchable, GPU memory. 8GB or 16GB depending on Vega10 SKU. Must
be placed < 2^44 to support P2P access from other Vega10.
* BAR2-3 registers: 64bit, prefetchable, Doorbell. Must be placed \< 2^44 to support P2P access from
other Vega10.
* BAR4 register: Optional, not a boot device.
* BAR5 register: 32bit, non-prefetchable, MMIO. Must be placed \< 4GB.
Here is how our base address registers (BARs) work on GFX8 GPUs with a 40-bit physical address limit ::
11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO
Series] (rev c1)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b35
Flags: bus master, fast devsel, latency 0, IRQ 119
Memory at bf40000000 (64-bit, prefetchable) [size=256M]
Memory at bf50000000 (64-bit, prefetchable) [size=2M]
I/O ports at 3000 [size=256]
Memory at c7400000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at c7440000 [disabled] [size=128K]
Legend:
1 : GPU Frame Buffer BAR -- In this example it happens to be 256M, but typically this will be the size of the
GPU memory (typically 4GB+). This BAR has to be placed \< 2^40 to allow peer-to-peer access from
other GFX8 AMD GPUs. For GFX9 (Vega GPU) the BAR has to be placed \< 2^44 to allow peer-to-peer
access from other GFX9 AMD GPUs.
2 : Doorbell BAR -- The size of this BAR is typically \< 10MB (currently fixed at 2MB) for this
generation of GPUs. This BAR has to be placed \< 2^40 to allow peer-to-peer access from other current
generation AMD GPUs.
3 : IO BAR -- This is for legacy VGA and boot device support, but since the GPUs in this project are
not VGA devices (headless), this is not a concern even if the SBIOS does not set it up.
4 : MMIO BAR -- This is required for the AMD Driver SW to access the configuration registers. Since the
remainder of the BAR available is only 1 DWORD (32bit), this is placed \< 4GB. This is fixed at 256KB.
5 : Expansion ROM -- This is required for the AMD Driver SW to access the GPU video-bios. This is
currently fixed at 128KB.
For more information, you can review
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.

View File

@@ -0,0 +1,333 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Inference optimization with MIGraphX">
<meta name="keywords" content="Inference optimization, MIGraphX, deep-learning, MIGraphX
installation, AMD, ROCm">
</head>
# Inference optimization with MIGraphX
The following sections cover inferencing and introduce [MIGraphX](https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/).
## Inference
Inference is where capabilities learned during deep-learning training are put to work. It refers to using a fully trained neural network to make predictions on unseen data that the model has never interacted with before. Deep-learning inferencing is achieved by feeding new data, such as new images, to the network, giving the deep neural network (DNN) a chance to classify the image.
Taking our previous example of MNIST, the DNN can be fed new images of handwritten digits, allowing the neural network to classify digits. A fully trained DNN should make accurate predictions about what an image represents, and inference cannot happen without training.
## MIGraphX introduction
MIGraphX is a graph compiler that focuses on accelerating machine-learning inference and can target AMD GPUs and CPUs. MIGraphX accelerates machine-learning models by applying several graph-level transformations and optimizations. These optimizations include:
* Operator fusion
* Arithmetic simplifications
* Dead-code elimination
* Common subexpression elimination (CSE)
* Constant propagation
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling MIOpen or rocBLAS, or by creating HIP kernels for a particular operator. MIGraphX can also target CPUs using the DNNL or ZenDNN libraries.
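To make the pass names above more concrete, here is a toy sketch of three of them (constant propagation, common subexpression elimination, and dead-code elimination) on a tiny expression graph. The tuple-based instruction format and all function names are assumptions made for illustration. This is not the MIGraphX IR or its implementation, and operator fusion and arithmetic simplification are not shown.

```python
# Toy expression graph: each instruction is (name, op, args).
# NOT the MIGraphX IR; it only illustrates the ideas behind the passes.

def fold_constants(instrs):
    """Constant propagation: evaluate ops whose inputs are all constants."""
    consts, out = {}, []
    for name, op, args in instrs:
        if op == "const":
            consts[name] = args[0]
        elif op in ("add", "mul") and all(a in consts for a in args):
            x, y = (consts[a] for a in args)
            consts[name] = x + y if op == "add" else x * y
            op, args = "const", (consts[name],)
        out.append((name, op, args))
    return out


def eliminate_common_subexpressions(instrs):
    """CSE: reuse the first instruction that computes the same (op, args)."""
    seen, alias, out = {}, {}, []
    for name, op, args in instrs:
        if op != "const":
            args = tuple(alias.get(a, a) for a in args)
        key = (op, args)
        if op != "input" and key in seen:
            alias[name] = seen[key]  # later uses point at the earlier result
            continue
        seen[key] = name
        out.append((name, op, args))
    return out


def eliminate_dead_code(instrs, output):
    """DCE: keep only the instructions that `output` depends on."""
    live, out = {output}, []
    for name, op, args in reversed(instrs):
        if name in live:
            if op != "const":
                live.update(args)
            out.append((name, op, args))
    return out[::-1]


program = [
    ("x", "input", ()),
    ("two", "const", (2,)),
    ("three", "const", (3,)),
    ("six", "mul", ("two", "three")),  # foldable: 2 * 3
    ("a", "mul", ("x", "six")),
    ("b", "mul", ("x", "six")),        # duplicate of a
    ("unused", "add", ("x", "two")),   # result is never used
    ("y", "add", ("a", "b")),
]

optimized = eliminate_dead_code(
    eliminate_common_subexpressions(fold_constants(program)), "y")
for instr in optimized:
    print(instr)
```

The order of the passes matters in this sketch: folding first exposes the constant `six`, CSE then merges `b` into `a`, and DCE finally drops everything the output no longer needs.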
MIGraphX provides easy-to-use APIs in C++ and Python to import machine-learning models in ONNX or TensorFlow. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
* Number of arguments
* Type of arguments
* Shape of arguments
After optimization passes, all these operators get mapped to different kernels on GPUs or CPUs.
After importing a model into MIGraphX, the model is represented as `migraphx::program`. `migraphx::program` is made up of `migraphx::module`. The program can consist of several modules, but it always has one main_module. Modules are made up of `migraphx::instruction_ref`. Instructions contain the `migraphx::op` and arguments to the operator.
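The containment relationship described above (a program holds modules, a module holds instructions, an instruction holds an operator and its arguments) can be pictured with a few plain Python data classes. The class and field names below are hypothetical stand-ins chosen for this sketch. They are not the `migraphx` classes and do not mirror their interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    op: str           # operator name, e.g. "relu"
    args: tuple = ()  # names of the instructions that feed this one


@dataclass
class Module:
    name: str
    instructions: list = field(default_factory=list)


@dataclass
class Program:
    modules: dict = field(default_factory=dict)

    def __post_init__(self):
        # A program always has one main module, whatever else it contains.
        self.modules.setdefault("main", Module("main"))

    @property
    def main_module(self):
        return self.modules["main"]


prog = Program()
prog.main_module.instructions.append(Instruction("parameter"))
prog.main_module.instructions.append(Instruction("relu", ("parameter",)))
print(len(prog.modules), [i.op for i in prog.main_module.instructions])
```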
## Installing MIGraphX
There are three options to get started with MIGraphX installation. MIGraphX depends on ROCm libraries, so the following steps assume that the machine has ROCm installed.
### Option 1: installing binaries
To install MIGraphX on Debian-based systems like Ubuntu, use the following command:
```bash
sudo apt update && sudo apt install -y migraphx
```
The header files and libraries are installed under `/opt/rocm-<version>`, where `<version>` is the ROCm version.
### Option 2: building from source
There are two ways to build the MIGraphX sources.
* [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [`rbuild`](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
or
* [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
For detailed steps on building from source and installing dependencies, refer to the following `README` file:
[https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source)
### Option 3: use docker
To use Docker, follow these steps:
1. The easiest way to set up the development environment is to use Docker. To build the Docker image from scratch, first clone the MIGraphX repository by running:
```bash
git clone --recursive https://github.com/ROCmSoftwarePlatform/AMDMIGraphX
```
2. The repository contains a Dockerfile from which you can build a Docker image as:
```bash
docker build -t migraphx .
```
3. Then, to enter the development environment, use `docker run`:
```bash
docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
```
The Docker image contains all the prerequisites required for the installation, so users can go to the folder `/code/AMDMIGraphX` and follow the steps mentioned in [Option 2: Building from Source](#option-2-building-from-source).
## MIGraphX example
MIGraphX provides both C++ and Python APIs. The following sections show examples of both using the Inception v3 model. To walk through the examples, fetch the Inception v3 ONNX model by running the following:
```py
import torch
import torchvision.models as models
inception = models.inception_v3(pretrained=True)
torch.onnx.export(inception,torch.randn(1,3,299,299), "inceptioni1.onnx")
```
This will create `inceptioni1.onnx`, which can be imported in MIGraphX using the C++ or Python API.
### MIGraphX Python API
Follow these steps:
1. To import the MIGraphX module in a Python script, set `PYTHONPATH` to the MIGraphX libraries installation. If binaries are installed using the steps mentioned in [Option 1: Installing Binaries](#option-1-installing-binaries), perform the following action:
```bash
export PYTHONPATH=$PYTHONPATH:/opt/rocm/
```
2. The following script shows how to use the Python API to import the ONNX model, compile it, and run inference on it. Set `LD_LIBRARY_PATH` to `/opt/rocm/` if required.
```py
# import migraphx and numpy
import migraphx
import numpy as np
# import and parse inception model
model = migraphx.parse_onnx("inceptioni1.onnx")
# compile model for the GPU target
model.compile(migraphx.get_target("gpu"))
# optionally print compiled model
model.print()
# create random input image
input_image = np.random.rand(1, 3, 299, 299).astype('float32')
# feed image to model; 'x.1' is the input param name
results = model.run({'x.1': input_image})
# get the results back
result_np = np.array(results[0])
# print the inferred class of the input image
print(np.argmax(result_np))
```
Find additional examples of the Python API in the `/examples` directory of the MIGraphX repository.
### MIGraphX C++ API
Follow these steps:
1. The following is a minimalist example that shows how to use the MIGraphX C++ API to load an ONNX file, compile it for the GPU, and run inference on it. To use the MIGraphX C++ API, you only need to include the `migraphx.hpp` header. This example runs inference on the Inception v3 model.
```c++
#include <vector>
#include <string>
#include <algorithm>
#include <cstdlib>   // std::srand, std::rand
#include <ctime>
#include <iostream>  // std::cout
#include <random>
#include <migraphx/migraphx.hpp>

int main(int argc, char** argv)
{
    migraphx::program prog;
    migraphx::onnx_options onnx_opts;
    // import and parse onnx file into migraphx::program
    prog = migraphx::parse_onnx("inceptioni1.onnx", onnx_opts);
    // print imported model
    prog.print();
    migraphx::target targ = migraphx::target("gpu");
    migraphx::compile_options comp_opts;
    comp_opts.set_offload_copy();
    // compile for the GPU
    prog.compile(targ, comp_opts);
    // print the compiled program
    prog.print();
    // randomly generate input image
    // of shape (1, 3, 299, 299)
    std::srand(unsigned(std::time(nullptr)));
    std::vector<float> input_image(1*299*299*3);
    std::generate(input_image.begin(), input_image.end(), std::rand);
    // users need to provide data for the input
    // parameters in order to run inference
    // you can query the migraphx program for the parameters
    migraphx::program_parameters prog_params;
    auto param_shapes = prog.get_parameter_shapes();
    auto input = param_shapes.names().front();
    // create argument for the parameter
    prog_params.add(input, migraphx::argument(param_shapes[input], input_image.data()));
    // run inference
    auto outputs = prog.eval(prog_params);
    // read back the output (1000 class scores)
    float* results = reinterpret_cast<float*>(outputs[0].data());
    float* max = std::max_element(results, results + 1000);
    int answer = max - results;
    std::cout << "answer: " << answer << std::endl;
}
```
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
project (CAI)
set (CMAKE_CXX_STANDARD 14)
set (EXAMPLE inception_inference)
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
find_package (migraphx)
message("source file: " ${EXAMPLE}.cpp " ---> bin: " ${EXAMPLE})
add_executable(${EXAMPLE} ${EXAMPLE}.cpp)
target_link_libraries(${EXAMPLE} migraphx::c)
```
3. To build the executable file, run the following from the directory containing the `inception_inference.cpp` file:
```bash
mkdir build
cd build
cmake ..
make -j$(nproc)
./inception_inference
```
:::{note}
Set `LD_LIBRARY_PATH` to `/opt/rocm/lib` if required during the build. Additional examples can be found in the MIGraphX repository under the `/examples/` directory.
:::
## Tuning MIGraphX
MIGraphX uses MIOpen kernels to target AMD GPUs. For a model compiled with MIGraphX, tune MIOpen to pick the best possible kernel implementation. MIOpen tuning can result in a significant performance boost. To enable tuning, set the environment variable `MIOPEN_FIND_ENFORCE=3`.
:::{note}
The tuning process can take a long time to finish.
:::
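One way to apply the setting to a single run without changing the rest of the shell session is to pass a modified environment to a child process. The sketch below assumes that workflow. The helper names `tuning_env` and `run_with_tuning` are hypothetical, and the child command is a stand-in that only echoes the variable; in practice you would replace it with your own inference script.

```python
import os
import subprocess
import sys


def tuning_env(base=None):
    """Return a copy of the environment with MIOpen tuning enforced."""
    env = dict(os.environ if base is None else base)
    env["MIOPEN_FIND_ENFORCE"] = "3"  # value taken from the text above
    return env


def run_with_tuning(command):
    """Run `command` (a list of arguments) with tuning enabled."""
    return subprocess.run(command, env=tuning_env(), check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: it only prints the variable it received.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MIOPEN_FIND_ENFORCE'])"]
    print(run_with_tuning(child).stdout.strip())
```

Because `tuning_env` copies the environment, the parent process keeps its original settings.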
**Example:** The average inference time of the Inception model example shown previously, over 100 iterations using untuned kernels, is 0.01383ms. After tuning, it drops to 0.00459ms, which is about a 3x improvement. This result is from ROCm v4.5 on an MI100 GPU.
:::{note}
The results may vary depending on the system configurations.
:::
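The improvement factor quoted in the example can be checked directly from the two averages. The short calculation below uses only the numbers from the text; the variable names are illustrative.

```python
untuned_ms = 0.01383  # average over 100 iterations, untuned kernels
tuned_ms = 0.00459    # average over 100 iterations, tuned kernels

speedup = untuned_ms / tuned_ms        # how many times faster
reduction = 1 - tuned_ms / untuned_ms  # fraction of time saved

print(f"speedup: {speedup:.2f}x, time saved: {reduction:.0%}")
# prints "speedup: 3.01x, time saved: 67%"
```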
For reference, the following console output shows the inference times of only the first 10 iterations for both untuned and tuned kernels:
```console
### UNTUNED ###
iterator : 0
Inference complete
Inference time: 0.063ms
iterator : 1
Inference complete
Inference time: 0.008ms
iterator : 2
Inference complete
Inference time: 0.007ms
iterator : 3
Inference complete
Inference time: 0.007ms
iterator : 4
Inference complete
Inference time: 0.007ms
iterator : 5
Inference complete
Inference time: 0.008ms
iterator : 6
Inference complete
Inference time: 0.007ms
iterator : 7
Inference complete
Inference time: 0.028ms
iterator : 8
Inference complete
Inference time: 0.029ms
iterator : 9
Inference complete
Inference time: 0.029ms
### TUNED ###
iterator : 0
Inference complete
Inference time: 0.063ms
iterator : 1
Inference complete
Inference time: 0.004ms
iterator : 2
Inference complete
Inference time: 0.004ms
iterator : 3
Inference complete
Inference time: 0.004ms
iterator : 4
Inference complete
Inference time: 0.004ms
iterator : 5
Inference complete
Inference time: 0.004ms
iterator : 6
Inference complete
Inference time: 0.004ms
iterator : 7
Inference complete
Inference time: 0.004ms
iterator : 8
Inference complete
Inference time: 0.004ms
iterator : 9
Inference complete
Inference time: 0.004ms
```
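Logs in this format are easy to summarize with a short script. The sketch below extracts the per-iteration timings and averages them after skipping the first iteration. Skipping it is an assumption of this sketch: iteration 0 takes 0.063ms in both runs above, which suggests one-time setup cost rather than kernel performance. The function names are hypothetical, and the sample log is shortened to the first three untuned iterations.

```python
import re
from statistics import mean


def inference_times(log_text):
    """Extract every 'Inference time: <x>ms' value from a log, in order."""
    pattern = r"Inference time:\s*([0-9.]+)ms"
    return [float(x) for x in re.findall(pattern, log_text)]


def steady_state_mean(times, warmup=1):
    """Average the timings after skipping the first `warmup` iterations."""
    return mean(times[warmup:])


untuned_log = """
iterator : 0
Inference complete
Inference time: 0.063ms
iterator : 1
Inference complete
Inference time: 0.008ms
iterator : 2
Inference complete
Inference time: 0.007ms
"""

times = inference_times(untuned_log)
print(times, steady_state_mean(times))
```

Running both the untuned and the tuned log through `steady_state_mean` gives two numbers that can be compared without the warm-up iteration dominating a short sample.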
### YModel
The best inference performance through MIGraphX is conditioned upon having tuned kernel configurations stored in a `/home` local User Database (DB). If a user were to move their model to a different server or allow a different user to use it, they would have to run through the MIOpen tuning process again to populate the next User DB with the best kernel configurations and corresponding solvers.
Tuning is time consuming, and if the users have not performed tuning, they would see discrepancies between expected or claimed inference performance and actual inference performance. This has led to repetitive and time-consuming tuning tasks for each user.
MIGraphX introduces a feature, known as YModel, that stores the kernel config parameters found during tuning into a `.mxr` file. This ensures the same level of expected performance, even when a model is copied to a different user/system.
The YModel feature is available starting from ROCm 5.4.1 and UIF 1.1.
#### YModel example
Through the `migraphx-driver` functionality, you can generate `.mxr` files with tuning information stored inside them by passing the additional options `--binary --output model.mxr` to `migraphx-driver` along with the rest of the necessary flags.
For example, to generate an `.mxr` file from the ONNX model, use the following:
```bash
./path/to/migraphx-driver compile --onnx resnet50.onnx --enable-offload-copy --binary --output resnet50.mxr
```
To run generated `.mxr` files through `migraphx-driver`, use the following:
```bash
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
```
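When these two steps are scripted, it can help to build the argument lists in one place. The following sketch does only that: it uses exactly the sub-commands and flags shown above, while the helper names and the idea of wrapping the driver from Python are assumptions of this example. The driver path is the same placeholder used in the commands above.

```python
import shlex


def compile_command(driver, onnx_path, mxr_path):
    """Arguments that compile an ONNX model and save it as an .mxr file."""
    return [driver, "compile", "--onnx", onnx_path,
            "--enable-offload-copy", "--binary", "--output", mxr_path]


def run_command(driver, mxr_path):
    """Arguments that run a previously saved .mxr file."""
    return [driver, "run", "--migraphx", mxr_path, "--enable-offload-copy"]


if __name__ == "__main__":
    driver = "./path/to/migraphx-driver"  # placeholder path from the text
    for cmd in (compile_command(driver, "resnet50.onnx", "resnet50.mxr"),
                run_command(driver, "resnet50.mxr")):
        print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```

Keeping the commands as lists avoids shell quoting problems when model paths contain spaces.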
Alternatively, you can use the MIGraphX C++ or Python API to generate `.mxr` files.
![Generating an MXR file](../data/conceptual/image018.png "Generating an MXR file")

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,408 @@
.. meta::
:description: Using CMake
:keywords: CMake, dependencies, HIP, C++, AMD, ROCm
*********************************
Using CMake
*********************************
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of the CMake HIP language support will require CMake 3.21 or higher.
Finding dependencies
====================
.. note::
For a complete
reference on how to deal with dependencies in CMake, refer to the CMake docs
on `find_package
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake related facilities.
In short, CMake supports finding dependencies in two ways:
* In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to find the component
in typical install locations and layouts. CMake ships a few dozen such scripts, but users and projects
may ship them as well.
* In Config mode, it locates a file named ``<packagename>-config.cmake`` or
``<PackageName>Config.cmake`` which describes the installed component in all regards needed to
consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semicolon-separated list of file system
paths), or use the ``<PackageName>_ROOT`` variable on a project-specific basis.
There are nearly a dozen ways to set these variables. One may be more convenient
than another depending on your workflow. Conceptually, the simplest is adding
it to your CMake configuration command on the command line via
``-D CMAKE_PREFIX_PATH=...``. AMD packaged ROCm installs can typically be
added to the config file search paths as follows:
* Windows: ``-D CMAKE_PREFIX_PATH=${env:HIP_PATH}``
* Linux: ``-D CMAKE_PREFIX_PATH=/opt/rocm``
ROCm provides the respective *config-file* packages, and this enables
``find_package`` to be used directly. ROCm does not require any Find module as
the *config-file* packages are shipped with the upstream projects, such as
rocPRIM and other ROCm libraries.
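
To check that Config mode located the package you expect, the project can print
where the config file was found. The following is a minimal sketch, not a
required pattern: it relies only on the ``<PackageName>_DIR`` and
``<PackageName>_VERSION`` variables that ``find_package`` sets in Config mode,
and assumes the search path was already supplied, for example via
``CMAKE_PREFIX_PATH``.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.5)
    project(MyProj LANGUAGES CXX)

    # CONFIG skips Module mode (Find<PackageName>.cmake) and only looks for
    # hip-config.cmake / hipConfig.cmake in the configured search paths.
    find_package(hip CONFIG REQUIRED)

    # In Config mode, find_package records the directory holding the located
    # config file in <PackageName>_DIR.
    message(STATUS "Found HIP ${hip_VERSION} in ${hip_DIR}")

If the reported directory is not the installation you intended, adjust
``CMAKE_PREFIX_PATH`` and reconfigure from a clean build directory.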
For a complete guide on where and how ROCm may be installed on a system, refer
to the installation guides for
`Linux <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html>`_
and
`Windows <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/index.html>`_.
Using HIP in CMake
==================
ROCm components providing a C/C++ interface support consumption via any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
library.
Using the HIP single-source programming model
---------------------------------------------
Source code written in the HIP dialect of C++ typically uses the ``.hip``
extension. When the HIP CMake language is enabled, CMake automatically
associates such source files with the HIP toolchain being used.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21
cmake_policy(VERSION 3.21.3...3.27)
project(MyProj LANGUAGES HIP)
add_executable(MyApp Main.hip)
If you have existing CUDA code that falls within the source-compatible subset of
HIP, you can tell CMake that, despite their ``.cu`` extension, these files are
HIP sources. Note that this mostly facilitates compiling source files that
contain only kernel code, as host-side CUDA API calls won't compile in this
fashion.
.. code-block:: cmake
add_library(MyLib MyLib.cu)
set_source_files_properties(MyLib.cu PROPERTIES LANGUAGE HIP)
CMake itself hosts only part of the HIP language support, such as the
definitions of HIP-specific properties, while the other part ships with the HIP
implementation, such as ROCm. CMake searches for a file named
``hip-lang-config.cmake`` that describes how the properties defined by CMake
translate to toolchain invocations. If ROCm is installed using non-standard
methods or layouts and CMake can't locate this file or detect parts of the SDK,
there's a catch-all, last-resort variable that CMake consults when locating
this file: ``-D CMAKE_HIP_COMPILER_ROCM_ROOT:PATH=``, which should be set to the
root of the ROCm installation.
.. note::
Imported targets defined by ``hip-lang-config.cmake`` are for internal use
only.
If the user doesn't provide a semicolon-delimited list of device architectures
via ``CMAKE_HIP_ARCHITECTURES``, CMake selects a sensible default. If you know
which devices you wish to target, it is advisable to set this variable
explicitly.
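
As an illustration of setting the architectures explicitly, the sketch below
provides a project default that still yields to a value given on the command
line. The architecture names ``gfx90a`` and ``gfx1030`` are only examples, and
``Main.hip`` is the same placeholder source used above.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21

    # Only provide a default when the user has not passed
    # -D CMAKE_HIP_ARCHITECTURES=... on the command line. Set it before
    # project() so it is in place when the HIP language is enabled.
    if(NOT DEFINED CMAKE_HIP_ARCHITECTURES)
      set(CMAKE_HIP_ARCHITECTURES "gfx90a;gfx1030") # example values
    endif()

    project(MyProj LANGUAGES HIP)

    add_executable(MyApp Main.hip)

    # CMAKE_HIP_ARCHITECTURES initializes the HIP_ARCHITECTURES property of
    # each target; it can be overridden per target if needed.
    set_target_properties(MyApp PROPERTIES HIP_ARCHITECTURES "gfx90a")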
Consuming ROCm C/C++ libraries
------------------------------
Libraries such as rocBLAS, rocFFT, MIOpen, etc. behave as C/C++ libraries.
Illustrated in the example below is a C++ application using MIOpen from CMake.
It calls ``find_package(miopen)``, which provides the ``MIOpen`` imported
target. This can be linked with ``target_link_libraries``.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.5) # find_package(miopen) requires 3.5
cmake_policy(VERSION 3.5...3.27)
project(MyProj LANGUAGES CXX)
find_package(miopen)
add_library(MyLib ...)
target_link_libraries(MyLib PUBLIC MIOpen)
.. note::
Most libraries are designed as host-only API, so using a GPU device
compiler is not necessary for downstream projects unless they use GPU device
code.
Consuming the HIP API in C++ code
---------------------------------
Consuming the HIP API without compiling single-source GPU device code can be
done using any C++ compiler. Calling ``find_package(hip)`` provides the
``hip::host`` imported target to use HIP in this scenario.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.5) # find_package(hip) requires 3.5
cmake_policy(VERSION 3.5...3.27)
project(MyProj LANGUAGES CXX)
find_package(hip REQUIRED)
add_executable(MyApp ...)
target_link_libraries(MyApp PRIVATE hip::host)
When mixing such ``CXX`` sources with ``HIP`` sources holding device code, link
only to ``hip::host``. If HIP sources don't have ``.hip`` as their extension, use
``set_source_files_properties(<hip_sources>... PROPERTIES LANGUAGE HIP)`` on them.
Linking to ``hip::host`` sets all the necessary flags for the ``CXX`` sources,
while ``HIP`` sources inherit all flags from the built-in language support.
Having HIP sources in a target turns the |LINK_LANG|_ into ``HIP``.
.. |LINK_LANG| replace:: ``LINKER_LANGUAGE``
.. _LINK_LANG: https://cmake.org/cmake/help/latest/prop_tgt/LINKER_LANGUAGE.html
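
The mixed-language setup described above could look like the following minimal
sketch. The file names ``Main.cpp``, ``Kernels.hip`` and ``MoreKernels.cxx`` are
hypothetical and only serve to show one host-only source, one HIP source with
the usual extension, and one HIP source with a different extension.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21
    project(MyProj LANGUAGES CXX HIP)

    find_package(hip REQUIRED)

    add_executable(MyApp
      Main.cpp        # host-only code calling the HIP API, compiled as CXX
      Kernels.hip     # device code, picked up as HIP by its extension
      MoreKernels.cxx # device code without the .hip extension
    )

    # Tell CMake to treat this file as HIP despite its extension.
    set_source_files_properties(MoreKernels.cxx PROPERTIES LANGUAGE HIP)

    # Provides include paths and flags for the CXX sources; the HIP sources
    # get their flags from the built-in language support.
    target_link_libraries(MyApp PRIVATE hip::host)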
Compiling device code in C++ language mode
------------------------------------------
.. attention::
The workflow detailed here is considered legacy and is shown for
understanding's sake. It pre-dates the existence of HIP language support in
CMake. If source code has HIP device code in it, it is a HIP source file
and should be compiled as such. Only resort to the method below if your
HIP-enabled CMake code path can't mandate CMake version 3.21.
If code uses the HIP API and compiles GPU device code, it requires using a
device compiler. The compiler for CMake can be set using either the
``CMAKE_C_COMPILER`` and ``CMAKE_CXX_COMPILER`` variables or the ``CC``
and ``CXX`` environment variables. The compiler can be set when configuring
CMake or in a CMake toolchain file. The device compiler must be set to a
compiler that supports AMD GPU targets, which is usually Clang.
Calling ``find_package(hip)`` provides the ``hip::device`` imported target, which
adds all the flags necessary for device compilation.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.8) # cxx_std_11 requires 3.8
cmake_policy(VERSION 3.8...3.27)
project(MyProj LANGUAGES CXX)
find_package(hip REQUIRED)
add_library(MyLib ...)
target_link_libraries(MyLib PRIVATE hip::device)
target_compile_features(MyLib PRIVATE cxx_std_11)
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with the following CMake commands.
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
- Linux: ``cmake -D CMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++``
These commands use the device compilers provided by the binary packages of the
`ROCm HIP SDK <https://www.amd.com/en/developer/rocm-hub.html>`_ and
`repo.radeon.com <https://repo.radeon.com>`_, respectively.
When using the ``CXX`` language support to compile HIP device code, select the
target GPU architectures by setting the ``GPU_TARGETS`` variable;
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, ``GPU_TARGETS`` is set to a subset of the architectures currently
supported by AMD ROCm. It can be set using the CMake option
``-D GPU_TARGETS="gfx1032;gfx1035"``.
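
A project may also want to provide its own default for ``GPU_TARGETS``. The
sketch below rests on an assumption that is not stated in this guide: that the
HIP package config leaves an already existing ``GPU_TARGETS`` cache entry
untouched, so seeding the cache before ``find_package(hip)`` takes effect. The
source file name ``MyLib.cpp`` is hypothetical. Verify the behavior against your
ROCm version before relying on it.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.8) # cxx_std_11 requires 3.8
    project(MyProj LANGUAGES CXX)

    # Assumption: a cache entry created here is respected by the HIP package
    # config. A value passed as -D GPU_TARGETS=... still wins, because set()
    # without FORCE does not overwrite an existing cache entry.
    set(GPU_TARGETS "gfx1032;gfx1035" CACHE STRING "GPU architectures to compile for")

    find_package(hip REQUIRED)

    add_library(MyLib MyLib.cpp) # hypothetical source with device code
    target_link_libraries(MyLib PRIVATE hip::device)
    target_compile_features(MyLib PRIVATE cxx_std_11)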
ROCm CMake packages
-------------------
+-----------+----------+--------------------------------------------------------+
| Component | Package | Targets |
+===========+==========+========================================================+
| HIP | hip | ``hip::host``, ``hip::device`` |
+-----------+----------+--------------------------------------------------------+
| rocPRIM | rocprim | ``roc::rocprim`` |
+-----------+----------+--------------------------------------------------------+
| rocThrust | rocthrust| ``roc::rocthrust`` |
+-----------+----------+--------------------------------------------------------+
| hipCUB | hipcub | ``hip::hipcub`` |
+-----------+----------+--------------------------------------------------------+
| rocRAND | rocrand | ``roc::rocrand`` |
+-----------+----------+--------------------------------------------------------+
| rocBLAS | rocblas | ``roc::rocblas`` |
+-----------+----------+--------------------------------------------------------+
| rocSOLVER | rocsolver| ``roc::rocsolver`` |
+-----------+----------+--------------------------------------------------------+
| hipBLAS | hipblas | ``roc::hipblas`` |
+-----------+----------+--------------------------------------------------------+
| rocFFT | rocfft | ``roc::rocfft`` |
+-----------+----------+--------------------------------------------------------+
| hipFFT | hipfft | ``hip::hipfft`` |
+-----------+----------+--------------------------------------------------------+
| rocSPARSE | rocsparse| ``roc::rocsparse`` |
+-----------+----------+--------------------------------------------------------+
| hipSPARSE | hipsparse| ``roc::hipsparse`` |
+-----------+----------+--------------------------------------------------------+
| rocALUTION|rocalution| ``roc::rocalution`` |
+-----------+----------+--------------------------------------------------------+
| RCCL | rccl | ``rccl`` |
+-----------+----------+--------------------------------------------------------+
| MIOpen | miopen | ``MIOpen`` |
+-----------+----------+--------------------------------------------------------+
| MIGraphX | migraphx | ``migraphx::migraphx``, ``migraphx::migraphx_c``, |
| | | ``migraphx::migraphx_cpu``, ``migraphx::migraphx_gpu``,|
| | | ``migraphx::migraphx_onnx``, ``migraphx::migraphx_tf`` |
+-----------+----------+--------------------------------------------------------+
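
As a usage sketch for the table above, the package name goes into
``find_package`` and the target name into ``target_link_libraries``. The example
picks rocBLAS and rocRAND arbitrarily; ``main.cpp`` is a hypothetical source
file.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.5)
    project(MyProj LANGUAGES CXX)

    # Package names come from the "Package" column.
    find_package(rocblas REQUIRED)
    find_package(rocrand REQUIRED)

    add_executable(MyApp main.cpp) # hypothetical source file

    # Target names come from the "Targets" column.
    target_link_libraries(MyApp PRIVATE roc::rocblas roc::rocrand)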
Using CMake presets
===================
CMake command lines can grow to unwieldy lengths, depending on how specific
users like to be when compiling code. This is the primary reason why projects
tend to bake script snippets into their build definitions that control compiler
warning levels, change CMake defaults (``CMAKE_BUILD_TYPE`` or
``BUILD_SHARED_LIBS``, to name a few), and introduce all sorts of anti-patterns,
all in the name of convenience.
Load on the command-line interface (CLI) starts immediately with selecting a
toolchain, the set of utilities used to compile programs. To ease some of the
toolchain-related pains, CMake consults the ``CC`` and ``CXX`` environment
variables when setting defaults for ``CMAKE_C_COMPILER`` and
``CMAKE_CXX_COMPILER``, respectively, but that is just the tip of the iceberg.
A fair number of variables relate to the toolchain alone (typically supplied
using
`toolchain files <https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html>`_),
and that is before considering user preferences or project-specific options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purposes in
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
convenient, but such configurations aren't portable, nor can they be reused in
Continuous Integration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_.
There are two types of preset files: one supplied by the project, called
``CMakePresets.json``, which is meant to be committed to version control and is
typically used to drive CI; and one meant for the user to provide, called
``CMakeUserPresets.json``, typically used to house user preferences and adapt
the build to the user's environment. These JSON files are allowed to include
other JSON files, and the user presets always implicitly include the non-user
variant.
Using HIP with presets
----------------------
Following is an example ``CMakeUserPresets.json`` file which actually compiles
the `amd/rocm-examples <https://github.com/amd/rocm-examples>`_ suite of sample
applications on a typical ROCm installation:
.. code-block:: json
{
"version": 3,
"cmakeMinimumRequired": {
"major": 3,
"minor": 21,
"patch": 0
},
"configurePresets": [
{
"name": "layout",
"hidden": true,
"binaryDir": "${sourceDir}/build/${presetName}",
"installDir": "${sourceDir}/install/${presetName}"
},
{
"name": "generator-ninja-multi-config",
"hidden": true,
"generator": "Ninja Multi-Config"
},
{
"name": "toolchain-makefiles-c/c++-amdclang",
"hidden": true,
"cacheVariables": {
"CMAKE_C_COMPILER": "/opt/rocm/bin/amdclang",
"CMAKE_CXX_COMPILER": "/opt/rocm/bin/amdclang++",
"CMAKE_HIP_COMPILER": "/opt/rocm/bin/amdclang++"
}
},
{
"name": "clang-strict-iso-high-warn",
"hidden": true,
"cacheVariables": {
"CMAKE_C_FLAGS": "-Wall -Wextra -pedantic",
"CMAKE_CXX_FLAGS": "-Wall -Wextra -pedantic",
"CMAKE_HIP_FLAGS": "-Wall -Wextra -pedantic"
}
},
{
"name": "ninja-mc-rocm",
"displayName": "Ninja Multi-Config ROCm",
"inherits": [
"layout",
"generator-ninja-multi-config",
"toolchain-makefiles-c/c++-amdclang",
"clang-strict-iso-high-warn"
]
}
],
"buildPresets": [
{
"name": "ninja-mc-rocm-debug",
"displayName": "Debug",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm"
},
{
"name": "ninja-mc-rocm-release",
"displayName": "Release",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm"
},
{
"name": "ninja-mc-rocm-debug-verbose",
"displayName": "Debug (verbose)",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm",
"verbose": true
},
{
"name": "ninja-mc-rocm-release-verbose",
"displayName": "Release (verbose)",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm",
"verbose": true
}
],
"testPresets": [
{
"name": "ninja-mc-rocm-debug",
"displayName": "Debug",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm",
"execution": {
"jobs": 0
}
},
{
"name": "ninja-mc-rocm-release",
"displayName": "Release",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm",
"execution": {
"jobs": 0
}
}
]
}

With this file in place, ``cmake --preset ninja-mc-rocm`` configures the
project, ``cmake --build --preset ninja-mc-rocm-debug`` builds the ``Debug``
configuration, and ``ctest --preset ninja-mc-rocm-debug`` runs the tests
registered for that configuration.

.. note::
Getting presets to work reliably on Windows requires some CMake improvements
and/or support from compiler vendors. (Refer to
`Add support to the Visual Studio generators <https://gitlab.kitware.com/cmake/cmake/-/issues/24245>`_
and `Sourcing environment scripts <https://gitlab.kitware.com/cmake/cmake/-/issues/21619>`_.)

@@ -0,0 +1,21 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm compilers disambiguation">
<meta name="keywords" content="compilers, compiler naming, AMD, ROCm">
</head>
# ROCm compilers disambiguation
ROCm ships multiple compilers of varying origins and purposes. This article
disambiguates compiler naming used throughout the documentation.
## Compiler terms
| Term | Description |
| - | - |
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. For details, see <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a>. |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |

@@ -0,0 +1,172 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm Linux Filesystem Hierarchy Standard reorganization">
<meta name="keywords" content="FHS, Linux Filesystem Hierarchy Standard, directory structure,
AMD, ROCm">
</head>
# ROCm Linux Filesystem Hierarchy Standard reorganization
## Introduction
The ROCm Software has adopted the Linux Filesystem Hierarchy Standard (FHS) [https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html) to ensure ROCm is consistent with standard open source conventions. The following sections specify how current and future releases of ROCm adhere to FHS, how the previous ROCm file system is supported, and how improved versioning specifications are applied to ROCm.
## Adopting the FHS
To standardize the ROCm directory structure and directory content layout, ROCm has adopted the [FHS](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html), adhering to open source conventions for Linux-based distributions. FHS ensures internal consistency within the ROCm stack, as well as external consistency with other systems and distributions. The proposed ROCm file structure is outlined below:
```none
/opt/rocm-<ver>
| -- bin
| -- all public binaries
| -- lib
| -- lib<soname>.so->lib<soname>.so.major->lib<soname>.so.major.minor.patch
(public libraries to link with applications)
| -- <component>
| -- architecture dependent libraries and binaries used internally by components
| -- cmake
| -- <component>
| --<component>-config.cmake
| -- libexec
| -- <component>
| -- non ISA/architecture independent executables used internally by components
| -- include
| -- <component>
| -- public header files
| -- share
| -- html
| -- <component>
| -- html documentation
| -- info
| -- <component>
| -- info files
| -- man
| -- <component>
| -- man pages
| -- doc
| -- <component>
| -- license files
| -- <component>
| -- samples
| -- architecture independent misc files
```
## Changes from earlier ROCm versions
The following table provides a brief overview of the new ROCm FHS layout, compared to the layout of earlier ROCm versions. Note that /opt/ is used to denote the default rocm-installation-path and should be replaced in case of a non-standard installation location of the ROCm distribution.
```none
______________________________________________________
| New ROCm Layout | Previous ROCm Layout |
|_____________________________|________________________|
| /opt/rocm-<ver> | /opt/rocm-<ver> |
| | -- bin | | -- bin |
| | -- lib | | -- lib |
| | -- cmake | | -- include |
| | -- libexec | | -- <component_1> |
| | -- include | | -- bin |
| | -- <component_1> | | -- cmake |
| | -- share | | -- doc |
| | -- html | | -- lib |
| | -- info | | -- include |
| | -- man | | -- samples |
| | -- doc | | -- <component_n> |
| | -- <component_1> | | -- bin |
| | -- samples | | -- cmake |
| | -- .. | | -- doc |
| | -- <component_n> | | -- lib |
| | -- samples | | -- include |
| | -- .. | | -- samples |
|______________________________________________________|
```
## ROCm FHS reorganization: backward compatibility
The FHS file organization for ROCm was first introduced in the release of ROCm 5.2. Backward compatibility was implemented to make sure users could still run their ROCm applications while transitioning to the new FHS. ROCm has moved header files and libraries to their new locations as indicated in the above structure, and included symbolic links and wrapper header files in their old locations for backward compatibility. The following sections detail the ROCm backward compatibility implementation for wrapper header files, executable files, library files, and CMake config files.
### Wrapper header files
Wrapper header files are placed in the old location (
`/opt/rocm-<ver>/<component>/include`) with a warning message to include files
from the new location (`/opt/rocm-<ver>/include`) as shown in the example below.
```cpp
#pragma message "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip."
#include <hip/hip_runtime.h>
```
* Starting with the ROCm 5.2 release, the backward compatibility wrapper header files signal deprecation through a `#pragma` message announcing a `#warning`.
* Starting with ROCm 6.0 (tentatively), backward compatibility for wrapper header files will be removed, and the `#pragma` message will announce an `#error`.
### Executable files
Executable files are available in the `/opt/rocm-<ver>/bin` folder. For backward
compatibility, the old binary location (`/opt/rocm-<ver>/<component>/bin`) has a
soft link to the executable at the new location. Soft links will be removed in a
future release, tentatively ROCm v6.0.
```bash
$ ls -l /opt/rocm/hip/bin/
lrwxrwxrwx 1 root root 24 Jan 1 23:32 hipcc -> ../../bin/hipcc
```
### Library files
Library files are available in the `/opt/rocm-<ver>/lib` folder. For backward
compatibility, the old library location (`/opt/rocm-<ver>/<component>/lib`) has a
soft link to the library at the new location. Soft links will be removed in a
future release, tentatively ROCm v6.0.
```shell
$ ls -l /opt/rocm/hip/lib/
drwxr-xr-x 4 root root 4096 Jan 1 10:45 cmake
lrwxrwxrwx 1 root root 24 Jan 1 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
```
### CMake config files
All CMake configuration files are available in the
`/opt/rocm-<ver>/lib/cmake/<component>` folder. For backward compatibility, the
old CMake locations (`/opt/rocm-<ver>/<component>/lib/cmake`) contain a soft
link to the new CMake config. Soft links will be removed in a future release,
tentatively ROCm v6.0.
```shell
$ ls -l /opt/rocm/hip/lib/cmake/hip/
lrwxrwxrwx 1 root root 42 Jan 1 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
```
## Changes required in applications using ROCm
Applications using ROCm are advised to use the new file paths, as the old files
will be deprecated in a future release. Applications have to make sure to include
the correct header files and use the correct search paths.
1. `#include<header_file.h>` needs to be changed to
`#include <component/header_file.h>`
For example: `#include <hip.h>` needs to change
to `#include <hip/hip.h>`
2. Any variable in CMake or Makefiles pointing to a component folder needs to be
changed.
For example: `VAR1=/opt/rocm/hip` needs to be changed to `VAR1=/opt/rocm`, and
`VAR2=/opt/rocm/hsa` needs to be changed to `VAR2=/opt/rocm`.
3. Any reference to `/opt/rocm/<component>/bin` or `/opt/rocm/<component>/lib`
needs to be changed to `/opt/rocm/bin` and `/opt/rocm/lib/`, respectively.
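
For a CMake-based application, the second point might translate into something
like the sketch below. It assumes the default `/opt/rocm` install location;
`MYAPP_ROCM_ROOT` and `main.cpp` are hypothetical names used only for this
illustration.

```cmake
cmake_minimum_required(VERSION 3.5)
project(MyApp LANGUAGES CXX)

# Old layout: the variable pointed at a component folder, for example
#   set(MYAPP_ROCM_ROOT "/opt/rocm/hip")
# New layout: point at the installation root instead. Config files are then
# found under <root>/lib/cmake/<component>/.
set(MYAPP_ROCM_ROOT "/opt/rocm" CACHE PATH "Root of the ROCm installation")
list(APPEND CMAKE_PREFIX_PATH "${MYAPP_ROCM_ROOT}")

find_package(hip REQUIRED)

add_executable(MyApp main.cpp) # hypothetical source file
target_link_libraries(MyApp PRIVATE hip::host)
```

Because include paths come from the imported target, sources only need the
component-prefixed form, such as `#include <hip/hip_runtime.h>`.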
## Changes in versioning specifications
To better manage the specification of ROCm dependencies and allow smoother releases of ROCm while avoiding dependency conflicts, ROCm software shall adhere to the following scheme when numbering and incrementing ROCm file versions:
rocm-\<ver\>, where \<ver\> = \<x.y.z\>
x.y.z denote: MAJOR.MINOR.PATCH
z: PATCH - increment z when implementing backward compatible bug fixes.
y: MINOR - increment y when implementing minor changes that add functionality but are still backward compatible.
x: MAJOR - increment x when implementing major changes that are not backward compatible.
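
The practical consequence of this scheme is that a newer release with the same
MAJOR number is expected to remain backward compatible. The following script is
only a sketch of that rule: both version strings are made up, and the file name
`check_rocm_version.cmake` is hypothetical.

```cmake
# Run with: cmake -P check_rocm_version.cmake
cmake_minimum_required(VERSION 3.5)

set(required_version  "5.2.0") # version the application was built against (made up)
set(installed_version "5.7.1") # version present on the system (made up)

# Extract the MAJOR component, that is, the leading run of digits.
string(REGEX MATCH "^[0-9]+" required_major  "${required_version}")
string(REGEX MATCH "^[0-9]+" installed_major "${installed_version}")

# Same MAJOR and not older than required: MINOR and PATCH increments are
# backward compatible under the scheme above.
if(installed_major EQUAL required_major AND
   NOT installed_version VERSION_LESS required_version)
  message(STATUS "rocm-${installed_version} is expected to be compatible with rocm-${required_version}")
else()
  message(STATUS "rocm-${installed_version} may not be compatible with rocm-${required_version}")
endif()
```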

@@ -0,0 +1,58 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="GPU architecture">
<meta name="keywords" content="GPU architecture, architecture support, MI200, MI250, RDNA,
MI100, AMD Instinct">
</head>
# GPU architecture documentation
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card}
**AMD Instinct MI200 series**
Review hardware aspects of the AMD Instinct™ MI200 series of GPU
accelerators and the CDNA™ 2 architecture.
* [AMD Instinct™ MI250 microarchitecture](./gpu-arch/mi250.md)
* [AMD Instinct MI200/CDNA2 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
* [Performance counters](./gpu-arch/mi200-performance-counters.md)
:::
:::{grid-item-card}
**AMD Instinct MI100**
Review hardware aspects of the AMD Instinct™ MI100
accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs.
* [AMD Instinct™ MI100 microarchitecture](./gpu-arch/mi100.md)
* [AMD Instinct MI100/CDNA1 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
* [White paper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
:::
:::{grid-item-card}
**RDNA**
* [AMD RDNA3 ISA](https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf)
* [AMD RDNA2 ISA](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
* [AMD RDNA ISA](https://www.amd.com/system/files/TechDocs/rdna-shader-instruction-set-architecture.pdf)
* [AMD RDNA Architecture White Paper](https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
:::
:::{grid-item-card}
**Older architectures**
* [AMD Instinct MI50/Vega 7nm ISA](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
* [AMD Instinct MI25/Vega ISA](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
* [AMD GCN3 ISA](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
* [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)
:::
:::::

@@ -0,0 +1,94 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD Instinct MI100 microarchitecture">
<meta name="keywords" content="Instinct, MI100, microarchitecture, AMD, ROCm">
</head>
# AMD Instinct™ MI100 microarchitecture
The following image shows the node-level architecture of a system that
comprises two AMD EPYC™ processors and (up to) eight AMD Instinct™ accelerators.
The two EPYC processors are connected to each other with the AMD Infinity™
fabric, which provides high-bandwidth (up to 18 GT/sec), coherent links such
that each processor can access the available node memory as a single
shared-memory domain in a non-uniform memory architecture (NUMA) fashion. In a
2P, or dual-socket, configuration, three AMD Infinity™ fabric links are
available to connect the processors, and one PCIe Gen 4 x16 link per processor
can attach additional I/O devices such as the host adapters for the network
fabric.
![Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators](../../data/conceptual/gpu-arch/image004.png "Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.")
In a typical node configuration, each processor can host up to four AMD
Instinct™ accelerators that are attached using PCIe Gen 4 links at 16 GT/sec,
which corresponds to a peak bidirectional link bandwidth of 32 GB/sec. Each hive
of four accelerators can participate in a fully connected, coherent AMD
Instinct™ fabric that connects the four accelerators using 23 GT/sec AMD
Infinity fabric links that run at a higher frequency than the inter-processor
links. This inter-GPU link can be established in certified server systems if the
GPUs are mounted in neighboring PCIe slots by installing the AMD Infinity
Fabric™ bridge for the AMD Instinct™ accelerators.
## Microarchitecture
The microarchitecture of the AMD Instinct accelerators is based on the AMD CDNA
architecture, which targets compute applications such as high-performance
computing (HPC) and AI & machine learning (ML) that run on everything from
individual servers to the world's largest exascale supercomputers. The overall
system architecture is designed for extreme scalability and compute performance.
![Structure of the AMD Instinct accelerator (MI100 generation)](../../data/conceptual/gpu-arch/image005.png "Structure of the AMD Instinct accelerator (MI100 generation)")
The above image shows the AMD Instinct accelerator with its PCIe Gen 4 x16
link (16 GT/sec, at the bottom) that connects the GPU to (one of) the host
processor(s). It also shows the three AMD Infinity Fabric ports that provide
high-speed links (23 GT/sec, also at the bottom) to the other GPUs of the local
hive.
On the left and right of the floor plan, the High Bandwidth Memory (HBM)
attaches via the GPU memory controller. The MI100 generation of the AMD
Instinct accelerator offers four stacks of HBM generation 2 (HBM2) for a total
of 32 GB with a 4,096-bit-wide memory interface. The peak memory bandwidth of the
attached HBM2 is 1.228 TB/sec at a memory clock frequency of 1.2 GHz.
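
The bandwidth figure can be reproduced from the interface width and the memory
clock. The factor of two in the worked calculation below is an assumption not
stated in the text above: HBM2 transfers data on both clock edges (double data
rate).

```{math}
\frac{4096\ \text{bit}}{8\ \text{bit/byte}} \times 1.2\ \text{GHz} \times 2
  = 512\ \text{B} \times 2.4 \times 10^{9}\ \text{s}^{-1}
  = 1228.8\ \text{GB/sec} \approx 1.228\ \text{TB/sec}
```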
The execution units of the GPU are depicted in the above image as Compute
Units (CU). There are a total of 120 compute units that are physically organized
into eight Shader Engines (SE) with fifteen compute units per shader engine.
Each compute unit is further sub-divided into four SIMD units that process SIMD
instructions of 16 data elements per instruction. This enables the CU to process
64 data elements (a so-called 'wavefront') at a peak clock frequency of 1.5 GHz.
Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS
(`4 [SIMD units] x 16 [elements per instruction] x 120 [CU] x 1.5 [GHz]`).
![Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture](../../data/conceptual/gpu-arch/image006.png "An MI100 compute unit with detailed SIMD view of the AMD CDNA architecture")
The preceding image shows the block diagram of a single CU of an AMD Instinct™
MI100 accelerator and summarizes how instructions flow through the execution
engines. The CU fetches the instructions via a 32 KB instruction cache and moves
them forward to execution via a dispatcher. The CU can handle up to ten
wavefronts at a time and feed their instructions into the execution unit. The
execution unit contains 256 vector general-purpose registers (VGPR) and 800
scalar general-purpose registers (SGPR). The VGPR and SGPR are dynamically
allocated to the executing wavefronts. A wavefront can access a maximum of 102
scalar registers. Excess scalar-register usage will cause register spilling and
thus may affect execution performance.
A wavefront can occupy any number of VGPRs from 0 to 256, directly affecting
occupancy; that is, the number of concurrently active wavefronts in the CU. For
instance, with 119 VGPRs used, only two wavefronts can be active in the CU at
the same time. With the instruction latency of four cycles per SIMD instruction,
the occupancy should be as high as possible such that the compute unit can
improve execution efficiency by scheduling instructions from multiple
wavefronts.
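
The two-wavefront example follows from dividing the register budget by the
per-wavefront usage. The calculation below is a simplified sketch that ignores
any register allocation granularity. The second case, with 85 VGPRs, is an
added illustration and not a figure from the hardware documentation.

```{math}
\left\lfloor \frac{256}{119} \right\rfloor = 2,
\qquad
\left\lfloor \frac{256}{85} \right\rfloor = 3
```

Under this model, reducing VGPR usage from 119 to 85 would allow a third
wavefront to be active at the same time.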
:::{table} Peak-performance capabilities of MI100 for different data types.
:name: mi100-perf
| Computation and Data Type | FLOPS/CLOCK/CU | Peak TFLOPS |
| :------------------------ | :------------: | ----------: |
| Vector FP64 | 64 | 11.5 |
| Matrix FP32 | 256 | 46.1 |
| Vector FP32 | 128 | 23.1 |
| Matrix FP16 | 1024 | 184.6 |
| Matrix BF16 | 512 | 92.3 |
:::
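
Each peak value in the table is the per-CU rate multiplied by the number of
compute units and the peak clock frequency. The worked example below uses the
Matrix FP16 row. The 1.502 GHz clock is an inference from the table values, not
a figure stated above: with the rounded 1.5 GHz the result would be 184.3
TFLOPS rather than 184.6 TFLOPS.

```{math}
\text{Peak TFLOPS}
  = \frac{\text{FLOPS/clock/CU} \times 120\ \text{CU} \times f\ [\text{GHz}]}{1000},
\qquad
\frac{1024 \times 120 \times 1.502}{1000} \approx 184.6
```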

@@ -0,0 +1,578 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="MI200 performance counters and metrics">
<meta name="keywords" content="MI200, performance counters, counters, GRBM counters, GRBM,
CPF counters, CPF, CPC counters, CPC, command processor counters, SPI counters, SPI, AMD, ROCm">
</head>
# MI200 performance counters and metrics
<!-- markdownlint-disable no-duplicate-header -->
This document lists and describes the hardware performance counters and derived metrics available on the AMD Instinct™ MI200 GPU. All the basic hardware counters and derived metrics are accessible via the {doc}`ROCProfiler tool <rocprofiler:rocprofv1>`.
## MI200 performance counters list
See the category-wise listing of MI200 performance counters in the following tables.
:::{note}
Preliminary validation of all MI200 performance counters is in progress. Those with “*” appended to the names require further evaluation.
:::
### Graphics Register Bus Management (GRBM) counters
| Hardware Counter | Unit | Definition |
|:--------------------|:--------|:--------------------------------------------------------------------------|
| `GRBM_COUNT` | Cycles | Number of free-running GPU cycles |
| `GRBM_GUI_ACTIVE` | Cycles | Number of GPU active cycles |
| `GRBM_CP_BUSY` | Cycles | Number of cycles any of the Command Processor (CP) blocks are busy |
| `GRBM_SPI_BUSY`     | Cycles  | Number of cycles any of the Shader Processor Input (SPI) blocks are busy in the shader engine(s) |
| `GRBM_TA_BUSY`      | Cycles  | Number of cycles any of the Texture Addressing Unit (TA) blocks are busy in the shader engine(s) |
| `GRBM_TC_BUSY` | Cycles | Number of cycles any of the Texture Cache Blocks (TCP/TCI/TCA/TCC) are busy |
| `GRBM_CPC_BUSY` | Cycles | Number of cycles the Command Processor - Compute (CPC) is busy |
| `GRBM_CPF_BUSY` | Cycles | Number of cycles the Command Processor - Fetcher (CPF) is busy |
| `GRBM_UTCL2_BUSY` | Cycles | Number of cycles the Unified Translation Cache - Level 2 (UTCL2) block is busy |
| `GRBM_EA_BUSY` | Cycles | Number of cycles the Efficiency Arbiter (EA) block is busy |
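As an illustration of how these counters can be combined, the sketch below estimates overall GPU utilization as the ratio of `GRBM_GUI_ACTIVE` to `GRBM_COUNT`. The ratio itself, the helper name `gpu_busy_percent`, and the sample values are assumptions for illustration, not a definition taken from the profiler.

```python
def gpu_busy_percent(counters):
    """Percentage of free-running cycles during which the GPU was active."""
    total = counters["GRBM_COUNT"]
    if total == 0:
        return 0.0  # no cycles recorded, so report zero utilization
    return 100.0 * counters["GRBM_GUI_ACTIVE"] / total


# Hypothetical counter values collected for a single kernel dispatch.
sample = {"GRBM_COUNT": 2_000_000, "GRBM_GUI_ACTIVE": 1_500_000}
print(f"GPU busy: {gpu_busy_percent(sample):.1f}%")
```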
### Command Processor (CP) counters
The CP counters are further classified into CP-Fetcher (CPF) and CP-Compute (CPC).
#### CPF counters
| Hardware Counter | Unit | Definition |
|:--------------------------------------|:--------|:-------------------------------------------------------------|
| `CPF_CMP_UTCL1_STALL_ON_TRANSLATION` | Cycles | Number of cycles one of the Compute UTCL1s is stalled waiting on translation |
| `CPF_CPF_STAT_BUSY` | Cycles | Number of cycles CPF is busy |
| `CPF_CPF_STAT_IDLE*` | Cycles | Number of cycles CPF is idle |
| `CPF_CPF_STAT_STALL` | Cycles | Number of cycles CPF is stalled |
| `CPF_CPF_TCIU_BUSY` | Cycles | Number of cycles CPF Texture Cache Interface Unit (TCIU) interface is busy |
| `CPF_CPF_TCIU_IDLE` | Cycles | Number of cycles CPF TCIU interface is idle |
| `CPF_CPF_TCIU_STALL*` | Cycles | Number of cycles CPF TCIU interface is stalled waiting on free tags |
#### CPC counters
| Hardware Counter | Unit | Definition |
|:---------------------------------|:-------|:---------------------------------------------------|
| `CPC_ME1_BUSY_FOR_PACKET_DECODE` | Cycles | Number of cycles CPC Micro Engine (ME1) is busy decoding packets |
| `CPC_UTCL1_STALL_ON_TRANSLATION` | Cycles | Number of cycles one of the UTCL1s is stalled waiting on translation |
| `CPC_CPC_STAT_BUSY` | Cycles | Number of cycles CPC is busy |
| `CPC_CPC_STAT_IDLE` | Cycles | Number of cycles CPC is idle |
| `CPC_CPC_STAT_STALL` | Cycles | Number of cycles CPC is stalled |
| `CPC_CPC_TCIU_BUSY` | Cycles | Number of cycles CPC TCIU interface is busy |
| `CPC_CPC_TCIU_IDLE` | Cycles | Number of cycles CPC TCIU interface is idle |
| `CPC_CPC_UTCL2IU_BUSY` | Cycles | Number of cycles CPC UTCL2 interface is busy |
| `CPC_CPC_UTCL2IU_IDLE` | Cycles | Number of cycles CPC UTCL2 interface is idle |
| `CPC_CPC_UTCL2IU_STALL` | Cycles | Number of cycles CPC UTCL2 interface is stalled |
| `CPC_ME1_DC0_SPI_BUSY` | Cycles | Number of cycles CPC ME1 Processor is busy |
### Shader Processor Input (SPI) counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:-----------|:-----------------------------------------------------------|
| `SPI_CSN_BUSY` | Cycles | Number of cycles with outstanding waves |
| `SPI_CSN_WINDOW_VALID` | Cycles | Number of cycles enabled by `perfcounter_start` event |
| `SPI_CSN_NUM_THREADGROUPS` | Workgroups | Number of dispatched workgroups |
| `SPI_CSN_WAVE` | Wavefronts | Number of dispatched wavefronts |
| `SPI_RA_REQ_NO_ALLOC` | Cycles | Number of Arb cycles with requests but no allocation |
| `SPI_RA_REQ_NO_ALLOC_CSN` | Cycles | Number of Arb cycles with Compute Shader, n-th pipe (CSn) requests but no CSn allocation |
| `SPI_RA_RES_STALL_CSN` | Cycles | Number of Arb stall cycles due to shortage of CSn pipeline slots |
| `SPI_RA_TMP_STALL_CSN*` | Cycles | Number of stall cycles due to shortage of temp space |
| `SPI_RA_WAVE_SIMD_FULL_CSN` | SIMD-cycles | Accumulated number of Single Instruction, Multiple Data units (SIMDs) per cycle affected by shortage of wave slots for CSn wave dispatch |
| `SPI_RA_VGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of VGPR slots for CSn wave dispatch |
| `SPI_RA_SGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of SGPR slots for CSn wave dispatch |
| `SPI_RA_LDS_CU_FULL_CSN` | CUs | Number of Compute Units (CUs) affected by shortage of LDS space for CSn wave dispatch |
| `SPI_RA_BAR_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting at a BARRIER |
| `SPI_RA_BULKY_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting for BULKY resource |
| `SPI_RA_TGLIM_CU_FULL_CSN*` | Cycles | Number of CSn wave stall cycles due to restriction of `tg_limit` for thread group size |
| `SPI_RA_WVLIM_STALL_CSN*` | Cycles | Number of cycles CSn is stalled due to WAVE_LIMIT |
| `SPI_VWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Vector General Purpose Registers (VGPRs) when launching waves |
| `SPI_SWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Scalar General Purpose Registers (SGPRs) when launching waves |
### Compute Unit (CU) counters
The CU counters are further classified into instruction mix, Matrix Fused Multiply Add (MFMA) operation counters, level counters, wavefront counters, wavefront cycle counters, Local Data Share (LDS) counters, and miscellaneous counters.
#### Instruction mix
| Hardware Counter | Unit | Definition |
|:-----------------------|:-----|:-----------------------------------------------------------------------|
| `SQ_INSTS` | Instr | Number of instructions issued. |
| `SQ_INSTS_VALU` | Instr | Number of Vector Arithmetic Logic Unit (VALU) instructions including MFMA issued. |
| `SQ_INSTS_VALU_ADD_F16` | Instr | Number of VALU Half Precision Floating Point (F16) ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F16` | Instr | Number of VALU F16 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F16` | Instr | Number of VALU F16 Fused Multiply Add (FMA)/ Multiply Add (MAD) instructions issued. |
| `SQ_INSTS_VALU_TRANS_F16` | Instr | Number of VALU F16 Transcendental instructions issued. |
| `SQ_INSTS_VALU_ADD_F32` | Instr | Number of VALU Full Precision Floating Point (F32) ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F32` | Instr | Number of VALU F32 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F32` | Instr | Number of VALU F32 FMA/MAD instructions issued. |
| `SQ_INSTS_VALU_TRANS_F32` | Instr | Number of VALU F32 Transcendental instructions issued. |
| `SQ_INSTS_VALU_ADD_F64` | Instr | Number of VALU F64 ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F64` | Instr | Number of VALU F64 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F64` | Instr | Number of VALU F64 FMA/MAD instructions issued. |
| `SQ_INSTS_VALU_TRANS_F64` | Instr | Number of VALU F64 Transcendental instructions issued. |
| `SQ_INSTS_VALU_INT32` | Instr | Number of VALU 32-bit integer instructions (signed or unsigned) issued. |
| `SQ_INSTS_VALU_INT64` | Instr | Number of VALU 64-bit integer instructions (signed or unsigned) issued. |
| `SQ_INSTS_VALU_CVT` | Instr | Number of VALU Conversion instructions issued. |
| `SQ_INSTS_VALU_MFMA_I8` | Instr | Number of 8-bit Integer MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F16` | Instr | Number of F16 MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_BF16` | Instr | Number of Brain Floating Point - 16 (BF16) MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F32` | Instr | Number of F32 MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F64` | Instr | Number of F64 MFMA instructions issued. |
| `SQ_INSTS_MFMA` | Instr | Number of MFMA instructions issued. |
| `SQ_INSTS_VMEM_WR` | Instr | Number of Vector Memory (VMEM) Write instructions (including FLAT) issued. |
| `SQ_INSTS_VMEM_RD` | Instr | Number of VMEM Read instructions (including FLAT) issued. |
| `SQ_INSTS_VMEM` | Instr | Number of VMEM instructions issued, including both FLAT and Buffer instructions. |
| `SQ_INSTS_SALU` | Instr | Number of SALU instructions issued. |
| `SQ_INSTS_SMEM` | Instr | Number of Scalar Memory (SMEM) instructions issued. |
| `SQ_INSTS_SMEM_NORM` | Instr | Number of SMEM instructions normalized to match `smem_level` issued. |
| `SQ_INSTS_FLAT` | Instr | Number of FLAT instructions issued. |
| `SQ_INSTS_FLAT_LDS_ONLY` | Instr | Number of FLAT instructions that read/write only from/to LDS issued. Works only if `EARLY_TA_DONE` is enabled. |
| `SQ_INSTS_LDS` | Instr | Number of Local Data Share (LDS) instructions issued (including FLAT). |
| `SQ_INSTS_GDS` | Instr | Number of Global Data Share (GDS) instructions issued. |
| `SQ_INSTS_EXP_GDS` | Instr | Number of EXP and GDS instructions excluding skipped export instructions issued. |
| `SQ_INSTS_BRANCH` | Instr | Number of Branch instructions issued. |
| `SQ_INSTS_SENDMSG` | Instr | Number of `SENDMSG` instructions including `s_endpgm` issued. |
| `SQ_INSTS_VSKIPPED*` | Instr | Number of vector instructions skipped. |
#### MFMA operation counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:-----|:----------------------------------------------|
| `SQ_INSTS_VALU_MFMA_MOPS_I8` | IOP | Number of 8-bit integer MFMA ops in units of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F16` | FLOP | Number of F16 floating-point MFMA ops in units of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_BF16` | FLOP | Number of BF16 floating-point MFMA ops in units of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F32` | FLOP | Number of F32 floating-point MFMA ops in units of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F64` | FLOP | Number of F64 floating-point MFMA ops in units of 512 |
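Because these counters are reported in units of 512 operations, a raw value has to be scaled before it can be compared with a peak rate. The sketch below reads the unit as "one counter increment equals 512 operations"; that reading, the helper names, and the sample numbers are assumptions for illustration.

```python
MFMA_OPS_PER_COUNT = 512  # assumed: each counter increment stands for 512 ops


def mfma_ops(counter_value):
    """Convert a raw SQ_INSTS_VALU_MFMA_MOPS_* value to an operation count."""
    return counter_value * MFMA_OPS_PER_COUNT


def mfma_tflops(counter_value, elapsed_seconds):
    """Achieved MFMA throughput in tera-operations per second."""
    return mfma_ops(counter_value) / elapsed_seconds / 1e12


# Hypothetical SQ_INSTS_VALU_MFMA_MOPS_F16 reading and kernel duration.
raw_f16 = 4_000_000
duration_s = 0.001
print(f"F16 MFMA ops: {mfma_ops(raw_f16)}")
print(f"Throughput: {mfma_tflops(raw_f16, duration_s):.3f} TFLOPS")
```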
#### Level counters
:::{note}
All level counters must be followed by the `SQ_ACCUM_PREV_HIRES` counter to measure average latency.
:::
| Hardware Counter | Unit | Definition |
|:-------------------|:-----|:-------------------------------------|
| `SQ_ACCUM_PREV` | Count | Accumulated counter sample value where accumulation takes place once every four cycles. |
| `SQ_ACCUM_PREV_HIRES` | Count | Accumulated counter sample value where accumulation takes place once every cycle. |
| `SQ_LEVEL_WAVES` | Waves | Number of inflight waves. To calculate the wave latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_WAVES`. |
| `SQ_INST_LEVEL_VMEM` | Instr | Number of inflight VMEM (including FLAT) instructions. To calculate the VMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_VMEM`. |
| `SQ_INST_LEVEL_SMEM` | Instr | Number of inflight SMEM instructions. To calculate the SMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_SMEM_NORM`. |
| `SQ_INST_LEVEL_LDS` | Instr | Number of inflight LDS (including FLAT) instructions. To calculate the LDS latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_LDS`. |
| `SQ_IFETCH_LEVEL` | Instr | Number of inflight instruction fetch requests from the cache. To calculate the instruction fetch latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_IFETCH`. |
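Each latency calculation described in the table is the same division: the accumulated level sample divided by the matching event count. A minimal sketch, assuming the counter values have already been collected into plain integers (the helper name and the sample values are hypothetical):

```python
def average_latency(accum_prev_hires, event_count):
    """Average latency in cycles, or None when no events were counted."""
    if event_count == 0:
        return None
    return accum_prev_hires / event_count


# Hypothetical readings from a run that collected SQ_INST_LEVEL_VMEM
# followed by SQ_ACCUM_PREV_HIRES, plus SQ_INSTS_VMEM.
sq_accum_prev_hires = 1_250_000
sq_insts_vmem = 5_000

latency = average_latency(sq_accum_prev_hires, sq_insts_vmem)
print(f"Average VMEM latency: {latency:.1f} cycles")
```

Returning `None` for a zero event count keeps an idle kernel from raising a division error.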
#### Wavefront counters
| Hardware Counter | Unit | Definition |
|:--------------------|:-----|:----------------------------------------------------------------|
| `SQ_WAVES` | Waves | Number of wavefronts dispatched to Sequencers (SQs), including both new and restored wavefronts |
| `SQ_WAVES_SAVED*` | Waves | Number of context-saved waves |
| `SQ_WAVES_RESTORED*` | Waves | Number of context-restored waves sent to SQs |
| `SQ_WAVES_EQ_64` | Waves | Number of wavefronts with exactly 64 active threads sent to SQs |
| `SQ_WAVES_LT_64` | Waves | Number of wavefronts with fewer than 64 active threads sent to SQs |
| `SQ_WAVES_LT_48` | Waves | Number of wavefronts with fewer than 48 active threads sent to SQs |
| `SQ_WAVES_LT_32` | Waves | Number of wavefronts with fewer than 32 active threads sent to SQs |
| `SQ_WAVES_LT_16` | Waves | Number of wavefronts with fewer than 16 active threads sent to SQs |
#### Wavefront cycle counters
| Hardware Counter | Unit | Definition |
|:------------------------|:-------|:--------------------------------------------------------------------|
| `SQ_CYCLES` | Cycles | Clock cycles. |
| `SQ_BUSY_CYCLES` | Cycles | Number of cycles while SQ reports it to be busy. |
| `SQ_BUSY_CU_CYCLES` | Qcycles | Number of quad-cycles each CU is busy. |
| `SQ_VALU_MFMA_BUSY_CYCLES` | Cycles | Number of cycles the MFMA ALU is busy. |
| `SQ_WAVE_CYCLES` | Qcycles | Number of quad-cycles spent by waves in the CUs. |
| `SQ_WAIT_ANY` | Qcycles | Number of quad-cycles spent waiting for anything. |
| `SQ_WAIT_INST_ANY` | Qcycles | Number of quad-cycles spent waiting for any instruction to be issued. |
| `SQ_ACTIVE_INST_ANY` | Qcycles | Number of quad-cycles spent by each wave to work on an instruction. |
| `SQ_ACTIVE_INST_VMEM` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VMEM instruction. |
| `SQ_ACTIVE_INST_LDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an LDS instruction. |
| `SQ_ACTIVE_INST_VALU` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VALU instruction. |
| `SQ_ACTIVE_INST_SCA` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a SALU or SMEM instruction. |
| `SQ_ACTIVE_INST_EXP_GDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an EXPORT or GDS instruction. |
| `SQ_ACTIVE_INST_MISC` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a BRANCH or `SENDMSG` instruction. |
| `SQ_ACTIVE_INST_FLAT` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a FLAT instruction. |
| `SQ_INST_CYCLES_VMEM_WR` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Write instructions. |
| `SQ_INST_CYCLES_VMEM_RD` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Read instructions. |
| `SQ_INST_CYCLES_SMEM` | Qcycles | Number of quad-cycles spent to execute scalar memory reads. |
| `SQ_INST_CYCLES_SALU` | Qcycles | Number of quad-cycles spent to execute non-memory read scalar operations. |
| `SQ_THREAD_CYCLES_VALU` | Cycles | Number of thread-cycles spent to execute VALU operations. This is similar to `INST_CYCLES_VALU` but multiplied by the number of active threads. |
| `SQ_WAIT_INST_LDS` | Qcycles | Number of quad-cycles spent waiting for LDS instruction to be issued. |
#### LDS counters
| Hardware Counter | Unit | Definition |
|:--------------------------|:------|:--------------------------------------------------------|
| `SQ_LDS_ATOMIC_RETURN` | Cycles | Number of atomic return cycles in LDS |
| `SQ_LDS_BANK_CONFLICT` | Cycles | Number of cycles LDS is stalled by bank conflicts |
| `SQ_LDS_ADDR_CONFLICT*` | Cycles | Number of cycles LDS is stalled by address conflicts |
| `SQ_LDS_UNALIGNED_STALL*` | Cycles | Number of cycles LDS is stalled processing flat unaligned load/store ops |
| `SQ_LDS_MEM_VIOLATIONS*` | Count | Number of threads that have a memory violation in the LDS |
| `SQ_LDS_IDX_ACTIVE` | Cycles | Number of cycles LDS is used for indexed operations |
#### Miscellaneous counters
| Hardware Counter | Unit | Definition |
|:--------------------------|:------|:--------------------------------------------------------|
| `SQ_IFETCH` | Count | Number of instruction fetch requests from `L1I` cache, in 32-byte width |
| `SQ_ITEMS` | Threads | Number of valid items per wave |
### L1I and sL1D cache counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:------|:----------------------------------------------------------------|
| `SQC_ICACHE_REQ` | Req | Number of `L1I` cache requests |
| `SQC_ICACHE_HITS` | Count | Number of `L1I` cache hits |
| `SQC_ICACHE_MISSES` | Count | Number of non-duplicate `L1I` cache misses including uncached requests |
| `SQC_ICACHE_MISSES_DUPLICATE` | Count | Number of duplicate `L1I` cache misses whose previous lookup miss on the same cache line is not fulfilled yet |
| `SQC_DCACHE_REQ` | Req | Number of `sL1D` cache requests |
| `SQC_DCACHE_INPUT_VALID_READYB` | Cycles | Number of cycles while SQ input is valid but sL1D cache is not ready |
| `SQC_DCACHE_HITS` | Count | Number of `sL1D` cache hits |
| `SQC_DCACHE_MISSES` | Count | Number of non-duplicate `sL1D` cache misses including uncached requests |
| `SQC_DCACHE_MISSES_DUPLICATE` | Count | Number of duplicate `sL1D` cache misses |
| `SQC_DCACHE_REQ_READ_1` | Req | Number of constant cache read requests in a single DW |
| `SQC_DCACHE_REQ_READ_2` | Req | Number of constant cache read requests in two DW |
| `SQC_DCACHE_REQ_READ_4` | Req | Number of constant cache read requests in four DW |
| `SQC_DCACHE_REQ_READ_8` | Req | Number of constant cache read requests in eight DW |
| `SQC_DCACHE_REQ_READ_16` | Req | Number of constant cache read requests in 16 DW |
| `SQC_DCACHE_ATOMIC*` | Req | Number of atomic requests |
| `SQC_TC_REQ` | Req | Number of TC requests that were issued by instruction and constant caches |
| `SQC_TC_INST_REQ` | Req | Number of instruction requests to the L2 cache |
| `SQC_TC_DATA_READ_REQ` | Req | Number of data Read requests to the L2 cache |
| `SQC_TC_DATA_WRITE_REQ*` | Req | Number of data write requests to the L2 cache |
| `SQC_TC_DATA_ATOMIC_REQ*` | Req | Number of data atomic requests to the L2 cache |
| `SQC_TC_STALL*` | Cycles | Number of cycles while the valid requests to the L2 cache are stalled |
### Vector L1 cache subsystem
The vector L1 cache subsystem counters are further classified into Texture Addressing Unit (TA), Texture Data Unit (TD), vector L1D cache or Texture Cache per Pipe (TCP), and Texture Cache Arbiter (TCA) counters.
#### TA counters
| Hardware Counter | Unit | Definition |
|:--------------------------------|:------|:------------------------------------------------|
| `TA_TA_BUSY[n]` | Cycles | TA busy cycles. Value range for n: [0-15]. |
| `TA_TOTAL_WAVEFRONTS[n]` | Instr | Number of wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_WAVEFRONTS[n]` | Instr | Number of buffer wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_READ_WAVEFRONTS[n]` | Instr | Number of buffer read wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_WRITE_WAVEFRONTS[n]` | Instr | Number of buffer write wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_ATOMIC_WAVEFRONTS[n]` | Instr | Number of buffer atomic wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_TOTAL_CYCLES[n]` | Cycles | Number of buffer cycles (including read and write) issued to TC. Value range for n: [0-15]. |
| `TA_BUFFER_COALESCED_READ_CYCLES[n]` | Cycles | Number of coalesced buffer read cycles issued to TC. Value range for n: [0-15]. |
| `TA_BUFFER_COALESCED_WRITE_CYCLES[n]` | Cycles | Number of coalesced buffer write cycles issued to TC. Value range for n: [0-15]. |
| `TA_ADDR_STALLED_BY_TC_CYCLES[n]` | Cycles | Number of cycles TA address path is stalled by TC. Value range for n: [0-15]. |
| `TA_DATA_STALLED_BY_TC_CYCLES[n]` | Cycles | Number of cycles TA data path is stalled by TC. Value range for n: [0-15]. |
| `TA_ADDR_STALLED_BY_TD_CYCLES[n]` | Cycles | Number of cycles TA address path is stalled by TD. Value range for n: [0-15]. |
| `TA_FLAT_WAVEFRONTS[n]` | Instr | Number of flat opcode wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_READ_WAVEFRONTS[n]` | Instr | Number of flat opcode read wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_WRITE_WAVEFRONTS[n]` | Instr | Number of flat opcode write wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_ATOMIC_WAVEFRONTS[n]` | Instr | Number of flat opcode atomic wavefronts processed by TA. Value range for n: [0-15]. |
#### TD counters
| Hardware Counter | Unit | Definition |
|:------------------------|:-----|:---------------------------------------------------|
| `TD_TD_BUSY[n]` | Cycles | TD busy cycles while it is processing or waiting for data. Value range for n: [0-15]. |
| `TD_TC_STALL[n]` | Cycles | Number of cycles TD is stalled waiting for TC data. Value range for n: [0-15]. |
| `TD_SPI_STALL[n]` | Cycles | Number of cycles TD is stalled by SPI. Value range for n: [0-15]. |
| `TD_LOAD_WAVEFRONT[n]` | Instr | Number of wavefront instructions (read/write/atomic). Value range for n: [0-15]. |
| `TD_STORE_WAVEFRONT[n]` | Instr | Number of write wavefront instructions. Value range for n: [0-15]. |
| `TD_ATOMIC_WAVEFRONT[n]` | Instr | Number of atomic wavefront instructions. Value range for n: [0-15]. |
| `TD_COALESCABLE_WAVEFRONT[n]` | Instr | Number of coalescable wavefronts according to TA. Value range for n: [0-15]. |
#### TCP counters
| Hardware Counter | Unit | Definition |
|:-----------------------------------|:------|:----------------------------------------------------------|
| `TCP_GATE_EN1[n]` | Cycles | Number of cycles vL1D interface clocks are turned on. Value range for n: [0-15]. |
| `TCP_GATE_EN2[n]` | Cycles | Number of cycles vL1D core clocks are turned on. Value range for n: [0-15]. |
| `TCP_TD_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TD stalls vL1D. Value range for n: [0-15]. |
| `TCP_TCR_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TCR stalls vL1D. Value range for n: [0-15]. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a read. Value range for n: [0-15]. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a write. Value range for n: [0-15]. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on an atomic. Value range for n: [0-15]. |
| `TCP_PENDING_STALL_CYCLES[n]` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache. Value range for n: [0-15]. |
| `TCP_TCP_TA_DATA_STALL_CYCLES` | Cycles | Number of cycles TCP stalls TA data interface. |
| `TCP_TA_TCP_STATE_READ[n]` | Req | Number of state reads. Value range for n: [0-15]. |
| `TCP_VOLATILE[n]` | Req | Number of L1 volatile pixels/buffers from TA. Value range for n: [0-15]. |
| `TCP_TOTAL_ACCESSES[n]` | Req | Number of vL1D accesses. Equals `TCP_PERF_SEL_TOTAL_READ` + `TCP_PERF_SEL_TOTAL_NONREAD`. Value range for n: [0-15]. |
| `TCP_TOTAL_READ[n]` | Req | Number of vL1D read accesses. Equals `TCP_PERF_SEL_TOTAL_HIT_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_EVICT_READ`. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITE[n]` | Req | Number of vL1D write accesses. Equals `TCP_PERF_SEL_TOTAL_MISS_LRU_WRITE` + `TCP_PERF_SEL_TOTAL_MISS_EVICT_WRITE`. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITH_RET[n]` | Req | Number of vL1D atomic requests with return. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET[n]` | Req | Number of vL1D atomic without return. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES[n]` | Count | Total number of vL1D writebacks and invalidates. Equals `TCP_PERF_SEL_TOTAL_WBINVL1` + `TCP_PERF_SEL_TOTAL_WBINVL1_VOL` + `TCP_PERF_SEL_CP_TCP_INVALIDATE` + `TCP_PERF_SEL_SQ_TCP_INVALIDATE_VOL`. Value range for n: [0-15]. |
| `TCP_UTCL1_REQUEST[n]` | Req | Number of address translation requests to UTCL1. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_HIT[n]` | Req | Number of UTCL1 translation hits. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_MISS[n]` | Req | Number of UTCL1 translation misses. Value range for n: [0-15]. |
| `TCP_UTCL1_PERMISSION_MISS[n]` | Req | Number of UTCL1 permission misses. Value range for n: [0-15]. |
| `TCP_TOTAL_CACHE_ACCESSES[n]` | Req | Number of vL1D cache accesses including hits and misses. Value range for n: [0-15]. |
| `TCP_TCP_LATENCY[n]` | Cycles | Accumulated wave access latency to vL1D over all wavefronts. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ[n]` | Req | Number of read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ[n]` | Req | Number of write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ[n]` | Req | Number of atomic requests to L2 cache with return. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]` | Req | Number of atomic requests to L2 cache without return. Value range for n: [0-15]. |
| `TCP_TCC_NC_READ_REQ[n]` | Req | Number of NC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_READ_REQ[n]` | Req | Number of UC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_READ_REQ[n]` | Req | Number of CC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_READ_REQ[n]` | Req | Number of RW read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_WRITE_REQ[n]` | Req | Number of NC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_WRITE_REQ[n]` | Req | Number of UC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_WRITE_REQ[n]` | Req | Number of CC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_WRITE_REQ[n]` | Req | Number of RW write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_ATOMIC_REQ[n]` | Req | Number of NC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_ATOMIC_REQ[n]` | Req | Number of UC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_ATOMIC_REQ[n]` | Req | Number of CC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_ATOMIC_REQ[n]` | Req | Number of RW atomic requests to L2 cache. Value range for n: [0-15]. |
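Counters with an `[n]` suffix are reported once per instance, so a device-wide figure usually means summing over the instance index. The sketch below assumes the readings are held in a flat dictionary keyed by names such as `TCP_TOTAL_READ[0]`; that layout and the helper `sum_instances` are illustrative assumptions, not a profiler output format.

```python
import re


def sum_instances(counters, base_name):
    """Sum every `base_name[n]` entry in a flat counter dictionary."""
    pattern = re.compile(rf"^{re.escape(base_name)}\[(\d+)\]$")
    return sum(value for key, value in counters.items() if pattern.match(key))


# Hypothetical per-instance readings (only two instances shown).
readings = {
    "TCP_TOTAL_READ[0]": 100,
    "TCP_TOTAL_READ[1]": 150,
    "TCP_TOTAL_WRITE[0]": 40,
    "TCP_TOTAL_WRITE[1]": 60,
}

total_reads = sum_instances(readings, "TCP_TOTAL_READ")
total_writes = sum_instances(readings, "TCP_TOTAL_WRITE")
print(f"vL1D reads: {total_reads}, writes: {total_writes}")
```

Anchoring the pattern on the full name keeps `TCP_TOTAL_READ` from also matching longer counter names that share the same prefix.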
#### TCA counters
| Hardware Counter | Unit | Definition |
|:----------------|:------|:------------------------------------------|
| `TCA_CYCLE[n]` | Cycles | Number of TCA cycles. Value range for n: [0-31]. |
| `TCA_BUSY[n]` | Cycles | Number of cycles TCA has a pending request. Value range for n: [0-31]. |
### L2 cache access counters
The L2 cache is also known as Texture Cache per Channel (TCC).
| Hardware Counter | Unit | Definition |
|:--------------------------------|:------|:-------------------------------------------------------------|
| `TCC_CYCLE[n]` | Cycles | Number of L2 cache free-running clocks. Value range for n: [0-31]. |
| `TCC_BUSY[n]` | Cycles | Number of L2 cache busy cycles. Value range for n: [0-31]. |
| `TCC_REQ[n]` |Req | Number of L2 cache requests of all types. This is measured at the tag block. This may be more than the number of requests arriving at the TCC, but it is a good indication of the total amount of work that needs to be performed. Value range for n: [0-31]. |
| `TCC_STREAMING_REQ[n]` |Req | Number of L2 cache streaming requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_NC_REQ[n]` |Req | Number of NC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_UC_REQ[n]` |Req | Number of UC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_CC_REQ[n]` |Req | Number of CC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_RW_REQ[n]` |Req | Number of RW requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_PROBE[n]` |Req | Number of probe requests. Value range for n: [0-31]. |
| `TCC_PROBE_ALL[n]` | Req | Number of external probe requests with `EA_TCC_preq_all` == 1. Value range for n: [0-31]. |
| `TCC_READ[n]` |Req | Number of L2 cache read requests. This includes compressed reads but not metadata reads. Value range for n: [0-31]. |
| `TCC_WRITE[n]` |Req | Number of L2 cache write requests. Value range for n: [0-31]. |
| `TCC_ATOMIC[n]` |Req | Number of L2 cache atomic requests of all types. Value range for n: [0-31]. |
| `TCC_HIT[n]` |Req | Number of L2 cache hits. Value range for n: [0-31]. |
| `TCC_MISS[n]` |Req | Number of L2 cache misses. Value range for n: [0-31]. |
| `TCC_WRITEBACK[n]` |Req | Number of lines written back to the main memory, including writebacks of dirty lines and uncached write/atomic requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ[n]` |Req | Number of 32-byte and 64-byte transactions going over the `TC_EA_wrreq` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_64B[n]` |Req | Total number of 64-byte transactions (write or `CMPSWAP`) going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_WR_UNCACHED_32B[n]` |Req | Number of 32-byte write/atomic going over the `TC_EA_wrreq` interface due to uncached traffic. Note that CC mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2. Value range for n: [0-31].|
| `TCC_EA_WRREQ_STALL[n]` | Cycles | Number of cycles a write request is stalled. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of IO credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of GMI credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of DRAM credits. Value range for n: [0-31]. |
| `TCC_TOO_MANY_EA_WRREQS_STALL[n]` | Cycles | Number of cycles the L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_LEVEL[n]` | Req | The accumulated number of EA write requests in flight. This is primarily intended to measure average EA write latency. Average write latency = `TCC_PERF_SEL_EA_WRREQ_LEVEL`/`TCC_PERF_SEL_EA_WRREQ`. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC[n]` | Req | Number of 32-byte or 64-byte atomic requests going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC_LEVEL[n]` | Req | The accumulated number of EA atomic requests in flight. This is primarily intended to measure average EA atomic latency. Average atomic latency = `TCC_PERF_SEL_EA_WRREQ_ATOMIC_LEVEL`/`TCC_PERF_SEL_EA_WRREQ_ATOMIC`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ[n]` | Req | Number of 32-byte or 64-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_32B[n]` | Req | Number of 32-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RD_UNCACHED_32B[n]` | Req | Number of 32-byte EA reads due to uncached traffic. A 64-byte request is counted as 2. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of IO credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of GMI credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of DRAM credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_LEVEL[n]` | Req | The accumulated number of EA read requests in flight. This is primarily intended to measure average EA read latency. Average read latency = `TCC_PERF_SEL_EA_RDREQ_LEVEL`/`TCC_PERF_SEL_EA_RDREQ`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA read requests to High Bandwidth Memory (HBM). Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA write requests to HBM. Value range for n: [0-31]. |
| `TCC_TAG_STALL[n]` | Cycles | Number of cycles the normal request pipeline in the tag is stalled for any reason. Normally, stalls of this nature are measured at exactly one point in the pipeline. For this counter, however, probes can stall the pipeline at a variety of places, and no single point can reasonably measure the total stalls accurately. Value range for n: [0-31]. |
| `TCC_NORMAL_WRITEBACK[n]` | Req | Number of writebacks due to requests that are not writeback requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_WB_WRITEBACK[n]` | Req | Number of writebacks due to all `TC_OP` writeback requests. Value range for n: [0-31]. |
| `TCC_NORMAL_EVICT[n]` | Req | Number of evictions due to requests that are not invalidate or probe requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_INV_EVICT[n]` | Req | Number of evictions due to all `TC_OP` invalidate requests. Value range for n: [0-31]. |
## MI200 derived metrics list
| Derived Metric | Description |
|:----------------|:-------------------------------------------------------------------------------------|
| `ALUStalledByLDS` | Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready. Reduce this by reducing the LDS bank conflicts or the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
| `FetchSize` | Total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `FlatLDSInsts` | Average number of FLAT instructions that read from or write to LDS, executed per work item (affected by flow control). |
| `FlatVMemInsts` | Average number of FLAT instructions that read from or write to the video memory, executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch. |
| `GDSInsts` | Average number of GDS read/write instructions executed per work item (affected by flow control). |
| `GPUBusy` | Percentage of time GPU is busy. |
| `L2CacheHit` | Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal). |
| `LDSBankConflict` | Percentage of GPU time LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| `LDSInsts` | Average number of LDS read/write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS. |
| `MemUnitBusy` | Percentage of GPU time the memory unit is active. The result includes the stall time (`MemUnitStalled`). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| `MemUnitStalled` | Percentage of GPU time the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| `MemWrites32B` | Total number of effective 32B write transactions to the memory. |
| `SALUBusy` | Percentage of GPU time scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `SALUInsts` | Average number of scalar ALU instructions executed per work item (affected by flow control). |
| `SFetchInsts` | Average number of scalar fetch instructions from the video memory executed per work item (affected by flow control). |
| `TA_ADDR_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA address path is stalled by TC, over all TA instances. |
| `TA_ADDR_STALLED_BY_TD_CYCLES_sum` | Total number of cycles TA address path is stalled by TD, over all TA instances. |
| `TA_BUFFER_WAVEFRONTS_sum` | Total number of buffer wavefronts processed by all TA instances. |
| `TA_BUFFER_READ_WAVEFRONTS_sum` | Total number of buffer read wavefronts processed by all TA instances. |
| `TA_BUFFER_WRITE_WAVEFRONTS_sum` | Total number of buffer write wavefronts processed by all TA instances. |
| `TA_BUFFER_ATOMIC_WAVEFRONTS_sum` | Total number of buffer atomic wavefronts processed by all TA instances. |
| `TA_BUFFER_TOTAL_CYCLES_sum` | Total number of buffer cycles (including read and write) issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_READ_CYCLES_sum` | Total number of coalesced buffer read cycles issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_WRITE_CYCLES_sum` | Total number of coalesced buffer write cycles issued to TC by all TA instances. |
| `TA_BUSY_avr` | Average number of busy cycles over all TA instances. |
| `TA_BUSY_max` | Maximum number of TA busy cycles over all TA instances. |
| `TA_BUSY_min` | Minimum number of TA busy cycles over all TA instances. |
| `TA_DATA_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA data path is stalled by TC, over all TA instances. |
| `TA_FLAT_READ_WAVEFRONTS_sum` | Sum of flat opcode reads processed by all TA instances. |
| `TA_FLAT_WRITE_WAVEFRONTS_sum` | Sum of flat opcode writes processed by all TA instances. |
| `TA_FLAT_WAVEFRONTS_sum` | Total number of flat opcode wavefronts processed by all TA instances. |
| `TA_FLAT_READ_WAVEFRONTS_sum` | Total number of flat opcode read wavefronts processed by all TA instances. |
| `TA_FLAT_ATOMIC_WAVEFRONTS_sum` | Total number of flat opcode atomic wavefronts processed by all TA instances. |
| `TA_TA_BUSY_sum` | Total number of TA busy cycles over all TA instances. |
| `TA_TOTAL_WAVEFRONTS_sum` | Total number of wavefronts processed by all TA instances. |
| `TCA_BUSY_sum` | Total number of cycles TCA has a pending request, over all TCA instances. |
| `TCA_CYCLE_sum` | Total number of cycles over all TCA instances. |
| `TCC_ALL_TC_OP_WB_WRITEBACK_sum` | Total number of writebacks due to all TC_OP writeback requests, over all TCC instances. |
| `TCC_ALL_TC_OP_INV_EVICT_sum` | Total number of evictions due to all TC_OP invalidate requests, over all TCC instances. |
| `TCC_ATOMIC_sum` | Total number of L2 cache atomic requests of all types, over all TCC instances. |
| `TCC_BUSY_avr` | Average number of L2 cache busy cycles, over all TCC instances. |
| `TCC_BUSY_sum` | Total number of L2 cache busy cycles, over all TCC instances. |
| `TCC_CC_REQ_sum` | Total number of CC requests over all TCC instances. |
| `TCC_CYCLE_sum` | Total number of L2 cache free running clocks, over all TCC instances. |
| `TCC_EA_WRREQ_sum` | Total number of 32-byte and 64-byte transactions going over the TC_EA_wrreq interface, over all TCC instances. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. |
| `TCC_EA_WRREQ_64B_sum` | Total number of 64-byte transactions (write or `CMPSWAP`) going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_WR_UNCACHED_32B_sum` | Total number of 32-byte write/atomic requests going over the TC_EA_wrreq interface due to uncached traffic, over all TCC instances. Note that CC mtypes can produce uncached requests, and those are included in this count. A 64-byte request is counted as 2. |
| `TCC_EA_WRREQ_STALL_sum` | Total number of cycles a write request is stalled, over all instances. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of IO credits, over all instances. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of GMI credits, over all instances. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of DRAM credits, over all instances. |
| `TCC_EA_WRREQ_LEVEL_sum` | Total number of EA write requests in flight over all TCC instances. |
| `TCC_EA_RDREQ_LEVEL_sum` | Total number of EA read requests in flight over all TCC instances. |
| `TCC_EA_ATOMIC_sum` | Total number of 32-byte or 64-byte atomic requests going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_ATOMIC_LEVEL_sum` | Total number of EA atomic requests in flight, over all TCC instances. |
| `TCC_EA_RDREQ_sum` | Total number of 32-byte or 64-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RDREQ_32B_sum` | Total number of 32-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RD_UNCACHED_32B_sum` | Total number of 32-byte EA reads due to uncached traffic, over all TCC instances. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of IO credits, over all TCC instances. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of GMI credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of DRAM credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA read requests to HBM, over all TCC instances. |
| `TCC_EA_WRREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA write requests to HBM, over all TCC instances. |
| `TCC_HIT_sum` | Total number of L2 cache hits over all TCC instances. |
| `TCC_MISS_sum` | Total number of L2 cache misses over all TCC instances. |
| `TCC_NC_REQ_sum` | Total number of NC requests over all TCC instances. |
| `TCC_NORMAL_WRITEBACK_sum` | Total number of writebacks due to requests that are not writeback requests, over all TCC instances. |
| `TCC_NORMAL_EVICT_sum` | Total number of evictions due to requests that are not invalidate or probe requests, over all TCC instances. |
| `TCC_PROBE_sum` | Total number of probe requests over all TCC instances. |
| `TCC_PROBE_ALL_sum` | Total number of external probe requests with `EA_TCC_preq_all == 1`, over all TCC instances. |
| `TCC_READ_sum` | Total number of L2 cache read requests (including compressed reads but not metadata reads) over all TCC instances. |
| `TCC_REQ_sum` | Total number of all types of L2 cache requests over all TCC instances. |
| `TCC_RW_REQ_sum` | Total number of RW requests over all TCC instances. |
| `TCC_STREAMING_REQ_sum` | Total number of L2 cache streaming requests over all TCC instances. |
| `TCC_TAG_STALL_sum` | Total number of cycles the normal request pipeline in the tag is stalled for any reason, over all TCC instances. |
| `TCC_TOO_MANY_EA_WRREQS_STALL_sum` | Total number of cycles L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests, over all TCC instances. |
| `TCC_UC_REQ_sum` | Total number of UC requests over all TCC instances. |
| `TCC_WRITE_sum` | Total number of L2 cache write requests over all TCC instances. |
| `TCC_WRITEBACK_sum` | Total number of lines written back to the main memory including writebacks of dirty lines and uncached write/atomic requests, over all TCC instances. |
| `TCC_WRREQ_STALL_max` | Maximum number of cycles a write request is stalled, over all TCC instances. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on an atomic, over all TCP instances. |
| `TCP_GATE_EN1_sum` | Total number of cycles vL1D interface clocks are turned on, over all TCP instances. |
| `TCP_GATE_EN2_sum` | Total number of cycles vL1D core clocks are turned on, over all TCP instances. |
| `TCP_PENDING_STALL_CYCLES_sum` | Total number of cycles vL1D cache is stalled due to data pending from L2 Cache, over all TCP instances. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a read, over all TCP instances. |
| `TCP_TA_TCP_STATE_READ_sum` | Total number of state reads by all TCP instances. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ_sum` | Total number of atomic requests to L2 cache with return, over all TCP instances. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum` | Total number of atomic requests to L2 cache without return, over all TCP instances. |
| `TCP_TCC_CC_READ_REQ_sum` | Total number of CC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_WRITE_REQ_sum` | Total number of CC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_ATOMIC_REQ_sum` | Total number of CC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_READ_REQ_sum` | Total number of NC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_WRITE_REQ_sum` | Total number of NC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_ATOMIC_REQ_sum` | Total number of NC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_READ_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return for all TCP instances. |
| `TCP_TCC_READ_REQ_sum` | Total number of read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_READ_REQ_sum` | Total number of RW read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_WRITE_REQ_sum` | Total number of RW write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_ATOMIC_REQ_sum` | Total number of RW atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_READ_REQ_sum` | Total number of UC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_WRITE_REQ_sum` | Total number of UC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_ATOMIC_REQ_sum` | Total number of UC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_WRITE_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return for all TCP instances. |
| `TCP_TCC_WRITE_REQ_sum` | Total number of write requests to L2 cache, over all TCP instances. |
| `TCP_TCP_LATENCY_sum` | Total wave access latency to vL1D over all wavefronts for all TCP instances. |
| `TCP_TCR_TCP_STALL_CYCLES_sum` | Total number of cycles TCR stalls vL1D, over all TCP instances. |
| `TCP_TD_TCP_STALL_CYCLES_sum` | Total number of cycles TD stalls vL1D, over all TCP instances. |
| `TCP_TOTAL_ACCESSES_sum` | Total number of vL1D accesses, over all TCP instances. |
| `TCP_TOTAL_READ_sum` | Total number of vL1D read accesses, over all TCP instances. |
| `TCP_TOTAL_WRITE_sum` | Total number of vL1D write accesses, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITH_RET_sum` | Total number of vL1D atomic requests with return, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET_sum` | Total number of vL1D atomic requests without return, over all TCP instances. |
| `TCP_TOTAL_CACHE_ACCESSES_sum` | Total number of vL1D cache accesses (including hits and misses) by all TCP instances. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES_sum` | Total number of vL1D writebacks and invalidates, over all TCP instances. |
| `TCP_UTCL1_PERMISSION_MISS_sum` | Total number of UTCL1 permission misses by all TCP instances. |
| `TCP_UTCL1_REQUEST_sum` | Total number of address translation requests to UTCL1 by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_MISS_sum` | Total number of UTCL1 translation misses by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_HIT_sum` | Total number of UTCL1 translation hits by all TCP instances. |
| `TCP_VOLATILE_sum` | Total number of L1 volatile pixels/buffers from TA, over all TCP instances. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a write, over all TCP instances. |
| `TD_ATOMIC_WAVEFRONT_sum` | Total number of atomic wavefront instructions, over all TD instances. |
| `TD_COALESCABLE_WAVEFRONT_sum` | Total number of coalescable wavefronts according to TA, over all TD instances. |
| `TD_LOAD_WAVEFRONT_sum` | Total number of wavefront instructions (read/write/atomic), over all TD instances. |
| `TD_SPI_STALL_sum` | Total number of cycles TD is stalled by SPI, over all TD instances. |
| `TD_STORE_WAVEFRONT_sum` | Total number of write wavefront instructions, over all TD instances. |
| `TD_TC_STALL_sum` | Total number of cycles TD is stalled waiting for TC data, over all TD instances. |
| `TD_TD_BUSY_sum` | Total number of TD busy cycles while it is processing or waiting for data, over all TD instances. |
| `VALUBusy` | Percentage of GPU time vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `VALUInsts` | Average number of vector ALU instructions executed per work item (affected by flow control). |
| `VALUUtilization` | Percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad) to 100% (ideal, no thread divergence). |
| `VFetchInsts` | Average number of vector fetch instructions from the video memory executed per work item (affected by flow control). Excludes FLAT instructions that fetch from video memory. |
| `VWriteInsts` | Average number of vector write instructions to the video memory executed per work item (affected by flow control). Excludes FLAT instructions that write to video memory. |
| `Wavefronts` | Total wavefronts. |
| `WRITE_REQ_32B` | Total number of 32-byte effective memory writes. |
| `WriteSize` | Total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `WriteUnitStalled` | Percentage of GPU time the write unit is stalled. Value range: 0% to 100% (bad). |
## Abbreviations
| Abbreviation | Meaning |
|:------------|:--------------------------------------------------------------------------------|
| `ALU` | Arithmetic Logic Unit |
| `Arb` | Arbiter |
| `BF16` | Brain Floating Point - 16 bits |
| `CC` | Coherently Cached |
| `CP` | Command Processor |
| `CPC` | Command Processor - Compute |
| `CPF` | Command Processor - Fetcher |
| `CS` | Compute Shader |
| `CSC` | Compute Shader Controller |
| `CSn` | Compute Shader, the n-th pipe |
| `CU` | Compute Unit |
| `DW` | 32-bit Data Word, DWORD |
| `EA` | Efficiency Arbiter |
| `F16` | Half Precision Floating Point |
| `F32` | Full Precision Floating Point |
| `FLAT` | FLAT instructions allow read/write/atomic access to a generic memory address pointer, which can resolve to any of the following physical memories:<br>. Global Memory<br>. Scratch ("private")<br>. LDS ("shared")<br>. Invalid - MEM_VIOL TrapStatus |
| `FMA` | Fused Multiply Add |
| `GDS` | Global Data Share |
| `GRBM` | Graphics Register Bus Manager |
| `HBM` | High Bandwidth Memory |
| `Instr` | Instructions |
| `IOP` | Integer Operation |
| `L2` | Level-2 Cache |
| `LDS` | Local Data Share |
| `ME1` | Micro Engine, running packet processing firmware on CPC |
| `MFMA` | Matrix Fused Multiply Add |
| `NC` | Noncoherently Cached |
| `RW` | Coherently Cached with Write |
| `SALU` | Scalar ALU |
| `SGPR` | Scalar General Purpose Register |
| `SIMD` | Single Instruction Multiple Data |
| `sL1D` | Scalar Level-1 Data Cache |
| `SMEM` | Scalar Memory |
| `SPI` | Shader Processor Input |
| `SQ` | Sequencer |
| `TA` | Texture Addressing Unit |
| `TC` | Texture Cache |
| `TCA` | Texture Cache Arbiter |
| `TCC` | Texture Cache per Channel, known as L2 Cache |
| `TCIU` | Texture Cache Interface Unit (interface between CP and the memory system) |
| `TCP` | Texture Cache per Pipe, known as vector L1 Cache |
| `TCR` | Texture Cache Router |
| `TD` | Texture Data Unit |
| `UC` | Uncached |
| `UTCL1` | Unified Translation Cache - Level 1 |
| `UTCL2` | Unified Translation Cache - Level 2 |
| `VALU` | Vector ALU |
| `VGPR` | Vector General Purpose Register |
| `vL1D` | Vector Level-1 Data Cache |
| `VMEM` | Vector Memory |

@@ -0,0 +1,133 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD Instinct MI250 microarchitecture">
<meta name="keywords" content="Instinct, MI250, microarchitecture, AMD, ROCm">
</head>
# AMD Instinct™ MI250 microarchitecture
The microarchitecture of the AMD Instinct MI250 accelerators is based on the
AMD CDNA 2 architecture, which targets compute applications such as HPC,
artificial intelligence (AI), and machine learning (ML) that run on
everything from individual servers to the world's largest exascale
supercomputers. The overall system architecture is designed for extreme
scalability and compute performance.
The following image shows the components of a single Graphics Compute Die (GCD) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™
interfaces and their physical links that are used to connect the GPU die to the
other system-level components of the node (see also Section 2.2). Both
interfaces can drive four AMD Infinity Fabric links. One of the AMD Infinity
Fabric links of the controller at the bottom can be configured as a PCIe link.
Each of the AMD Infinity Fabric links between GPUs can run at up to 25 GT/sec,
which corresponds to a peak transfer bandwidth of 50 GB/sec for a 16-wide link
(two bytes per transaction). Section 2.2 has more details on the number of AMD
Infinity Fabric links and the resulting transfer rates between the system-level
components.
To the left and the right are memory controllers that attach the High Bandwidth
Memory (HBM) modules to the GCD. AMD Instinct MI250 GPUs use HBM2e, which offers
a peak memory bandwidth of 1.6 TB/sec per GCD.
The execution units of the GPU are depicted in the following image as Compute
Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
subdivided into four SIMD units that process SIMD instructions of 16 data
elements per instruction (for the FP64 data type). This enables the CU to
process 64 work items (a so-called “wavefront”) at a peak clock frequency of 1.7
GHz. Therefore, the theoretical maximum FP64 peak performance for vector
instructions is 22.6 TFLOPS per GCD, or 45.3 TFLOPS per OAM (two GCDs). The
MI250 compute units also provide specialized execution units (also called matrix
cores), which are geared toward executing matrix operations like matrix-matrix
multiplications. For FP64, the peak performance of these units amounts to 90.5
TFLOPS per OAM.
![Structure of a single GCD in the AMD Instinct MI250 accelerator.](../../data/conceptual/gpu-arch/image001.png "Structure of a single GCD in the AMD Instinct MI250 accelerator.")
```{list-table} Peak-performance capabilities of the MI250 OAM for different data types.
:header-rows: 1
:name: mi250-perf-table
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
*
- Matrix FP64
- 256
- 90.5
*
- Vector FP64
- 128
- 45.3
*
- Matrix FP32
- 256
- 90.5
*
- Packed FP32
- 256
- 90.5
*
- Vector FP32
- 128
- 45.3
*
- Matrix FP16
- 1024
- 362.1
*
- Matrix BF16
- 1024
- 362.1
*
- Matrix INT8
- 1024
- 362.1
```
The above table summarizes the aggregated peak performance of the AMD
Instinct MI250 OCP Open Accelerator Module (OAM; OCP is short for Open Compute
Project) and its two GCDs for different data types and execution units. The
middle column lists the peak performance (number of data elements processed in a
single instruction) of a single compute unit if a SIMD (or matrix) instruction
is being retired in each clock cycle. The third column lists the theoretical
peak performance of the OAM module. The theoretical aggregated peak memory
bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD).
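The peak TFLOPS column can be reproduced from the figures quoted on this page. The following is a minimal sketch, assuming 104 active CUs per GCD, two GCDs per OAM, and a 1.7 GHz peak clock; the function name `peak_tflops` is hypothetical and only serves the illustration.

```shell
# Sketch: peak TFLOPS = FLOPS/clock/CU x CUs per GCD x GCDs per OAM x clock (Hz) / 1e12
# All constants are taken from the text and table above.
CUS_PER_GCD=104
GCDS_PER_OAM=2
CLOCK_HZ=1.7e9

peak_tflops() {
    # $1 = FLOPS per clock per CU (middle column of the table)
    awk -v f="$1" -v c="$CUS_PER_GCD" -v g="$GCDS_PER_OAM" -v hz="$CLOCK_HZ" \
        'BEGIN { printf "%.1f\n", f * c * g * hz / 1e12 }'
}

peak_tflops 128    # vector FP64 / vector FP32
peak_tflops 256    # matrix FP64 / matrix FP32 / packed FP32
peak_tflops 1024   # matrix FP16 / BF16 / INT8
```

The three calls correspond to the 45.3, 90.5, and 362.1 TFLOPS entries in the table.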
![Dual-GCD architecture of the AMD Instinct MI250 accelerators](../../data/conceptual/gpu-arch/image002.png "Dual-GCD architecture of the AMD Instinct MI250 accelerators")
The preceding image shows the block diagram of an OAM package that consists
of two GCDs, each of which constitutes one GPU device in the system. The two
GCDs in the package are connected via four AMD Infinity Fabric links running at
a theoretical peak rate of 25 GT/sec, giving 200 GB/sec peak transfer bandwidth
between the two GCDs of an OAM, or a bidirectional peak transfer bandwidth of
400 GB/sec for the same.
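The link bandwidth figures follow from simple arithmetic. This is a small sketch, assuming 25 GT/sec per link, two bytes per transfer on a 16-wide link, and four links between the two GCDs, as stated above; the helper name `link_gbps` is hypothetical.

```shell
# Sketch: bandwidth (GB/s) = transfer rate (GT/s) x bytes per transfer x number of links
link_gbps() {
    # $1 = transfer rate in GT/s, $2 = bytes per transfer, $3 = number of links (default 1)
    echo $(( $1 * $2 * ${3:-1} ))
}

link_gbps 25 2        # one 16-wide link, one direction
link_gbps 25 2 4      # four links between the two GCDs, one direction
echo $(( 2 * $(link_gbps 25 2 4) ))   # same four links, both directions
```

The three results match the 50 GB/sec, 200 GB/sec, and 400 GB/sec figures quoted in the text.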
## Node-level architecture
The following image shows the node-level architecture of a system that is
based on the AMD Instinct MI250 accelerator. The MI250 OAMs attach to the host
system via PCIe Gen 4 x16 links (yellow lines). Each GCD maintains its own PCIe
x16 link to the host part of the system. Depending on the server platform, the
GCD can attach to the AMD EPYC processor directly or via an optional PCIe
switch. Note that some platforms may offer an x8 interface to the GCDs, which reduces
the available host-to-GPU bandwidth.
![Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor](../../data/conceptual/gpu-arch/image003.png "Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor")
The preceding image shows the node-level architecture of a system with AMD
EPYC processors in a dual-socket configuration and four AMD Instinct MI250
accelerators. The MI250 OAMs attach to the host processor system via PCIe Gen 4
x16 links (yellow lines). Depending on the system design, a PCIe switch may
exist to make more PCIe lanes available for additional components like network
interfaces and/or storage devices. Each GCD maintains its own PCIe x16 link to
the host part of the system or to the PCIe switch. Note that some platforms
may offer an x8 interface to the GCDs, which reduces the available
host-to-GPU bandwidth.
Between the OAMs and their respective GCDs, a peer-to-peer (P2P) network allows
for direct data exchange between the GPU dies via AMD Infinity Fabric links
(black, green, and red lines). Each of these 16-wide links connects to one of the
two GPU dies in the MI250 OAM and operates at 25 GT/sec, which corresponds to a
theoretical peak transfer rate of 50 GB/sec per link (or 100 GB/sec
bidirectional peak transfer bandwidth). The GCD pairs 2 and 6 as well as GCDs 0
and 4 connect via two XGMI links, which is indicated by the thicker red line in
the preceding image.

@@ -0,0 +1,116 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="GPU isolation techniques">
<meta name="keywords" content="GPU isolation techniques, UUID, universally unique identifier,
environment variables, virtual machines, AMD, ROCm">
</head>
# GPU isolation techniques
Restricting the access of applications to a subset of GPUs, also known as
isolating GPUs, allows users to hide GPU resources from programs. By default,
programs use only the "exposed" GPUs and ignore the other (hidden) GPUs in the system.
There are multiple ways to achieve isolation of GPUs in the ROCm software stack,
differing in which applications they apply to and the security they provide.
This page serves as an overview of the techniques.
## Environment variables
The runtimes in the ROCm software stack read these environment variables to
select the exposed or default device to present to applications using them.
Environment variables shouldn't be used for isolating untrusted applications,
as an application can reset them before initializing the runtime.
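The following sketch illustrates why environment variables offer no protection against untrusted code. The function `untrusted_app` is a hypothetical stand-in for an application; it only shows that a process can overwrite the variable before any runtime reads it.

```shell
# The launcher restricts the application to the first device.
export HIP_VISIBLE_DEVICES="0"

# Hypothetical stand-in for an untrusted application: it rewrites the
# variable before a runtime would be initialized.
untrusted_app() {
    HIP_VISIBLE_DEVICES="0,1,2,3"
    echo "runtime would see: $HIP_VISIBLE_DEVICES"
}

# Run in a subshell to mimic a separate process.
( untrusted_app )
```

The restriction set by the launcher is gone by the time the runtime would start, which is why the techniques later on this page are preferable for untrusted workloads.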
### `ROCR_VISIBLE_DEVICES`
A list of device indices or {abbr}`UUID (universally unique identifier)`s
that will be exposed to applications.
Runtime
: ROCm Software Runtime. Applies to all applications using the user mode ROCm
software stack.
```{code-block} shell
:caption: Example to expose the first device and a device based on UUID.
export ROCR_VISIBLE_DEVICES="0,GPU-DEADBEEFDEADBEEF"
```
### `GPU_DEVICE_ORDINAL`
Device indices exposed to OpenCL and HIP applications.
Runtime
: ROCm Common Language Runtime (`ROCclr`). Applies to applications and runtimes
using the `ROCclr` abstraction layer including HIP and OpenCL applications.
```{code-block} shell
:caption: Example to expose the first and third devices in the system.
export GPU_DEVICE_ORDINAL="0,2"
```
(hip_visible_devices)=
### `HIP_VISIBLE_DEVICES`
Device indices exposed to HIP applications.
Runtime
: HIP runtime. Applies only to applications using HIP on the AMD platform.
```{code-block} shell
:caption: Example to expose the first and third devices in the system.
export HIP_VISIBLE_DEVICES="0,2"
```
### `CUDA_VISIBLE_DEVICES`
Provided for CUDA compatibility. Has the same effect as `HIP_VISIBLE_DEVICES`
on the AMD platform.
Runtime
: HIP or CUDA Runtime. Applies to HIP applications on the AMD or NVIDIA platform
and CUDA applications.
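Because the variable has the same effect as `HIP_VISIBLE_DEVICES` on the AMD platform, it can be set in the same way. A minimal sketch, assuming a system with at least three devices:

```shell
# Expose the first and third devices in the system.
export CUDA_VISIBLE_DEVICES="0,2"
```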
### `OMP_DEFAULT_DEVICE`
Default device used for OpenMP target offloading.
Runtime
: OpenMP Runtime. Applies only to applications using OpenMP offloading.
```{code-block} shell
:caption: Example of setting the default device to the third device.
export OMP_DEFAULT_DEVICE="2"
```
## Docker
Docker uses Linux kernel namespaces to provide isolated environments for
applications. This isolation applies to most devices by default, including
GPUs. To access them in containers, explicit access must be granted; see
{ref}`docker-access-gpus-in-container` for details.
Specifically, refer to {ref}`docker-restrict-gpus` for exposing just a subset
of all GPUs.
Docker isolation is more secure than environment variables, and applies
to all programs that use the `amdgpu` kernel module interfaces.
Even programs that don't use the ROCm runtime, like graphics applications
using OpenGL or Vulkan, can only access the GPUs exposed to the container.
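As a rough sketch of how a subset of GPUs might be granted to a container, the helper below builds `--device` arguments for `docker run`. It assumes the GPUs are exposed through `/dev/kfd` and per-GPU render nodes under `/dev/dri`; the render node number (128), the image name, and the helper name `gpu_device_args` are assumptions for illustration and depend on the system. The command is printed rather than executed.

```shell
# Hypothetical helper: build the --device arguments that expose only the
# listed render nodes (plus the compute interface /dev/kfd).
gpu_device_args() {
    args="--device /dev/kfd"
    for node in "$@"; do
        args="$args --device /dev/dri/renderD$node"
    done
    echo "$args"
}

# Assumed image name; substitute the image you actually use.
IMAGE="rocm/rocm-terminal"

# Print the command that would expose a single GPU to the container.
echo docker run -it $(gpu_device_args 128) "$IMAGE"
```

The linked pages remain the authoritative description of the supported options.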
## GPU passthrough to virtual machines
Virtual machines achieve the highest level of isolation, because even the kernel
of the virtual machine is isolated from the host. Devices physically installed
in the host system can be passed to the virtual machine using PCIe passthrough.
This allows for using the GPU with a different operating system, such as a
Windows guest on a Linux host.
Setting up PCIe passthrough is specific to the hypervisor used. ROCm officially
supports [VMware ESXi](https://www.vmware.com/products/esxi-and-esx.html)
for select GPUs.
<!--
TODO: This should link to a page about virtualization that explains
pass-through and SR-IOV and how-tos for maybe `libvirt` and `VMWare`
-->

@@ -0,0 +1,241 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="GPU memory">
<meta name="keywords" content="GPU memory, VRAM, video random access memory, pageable
memory, pinned memory, managed memory, AMD, ROCm">
</head>
# GPU memory
For the HIP reference documentation, see:
* {doc}`hip:doxygen/html/group___memory`
* {doc}`hip:doxygen/html/group___memory_m`
Host memory exists on the host (e.g. CPU) of the machine in random access memory (RAM).
Device memory exists on the device (e.g. GPU) of the machine in video random access memory (VRAM).
Recent architectures use graphics double data rate (GDDR) synchronous dynamic random-access memory (SDRAM) such as GDDR6, or high-bandwidth memory (HBM) such as HBM2e.
## Memory allocation
Memory can be allocated in two ways: as pageable memory or as pinned memory.
The following API calls will result in these allocations:
| API | Data location | Allocation |
|--------------------|---------------|------------|
| System allocated | Host | Pageable |
| `hipMallocManaged` | Host | Managed |
| `hipHostMalloc` | Host | Pinned |
| `hipMalloc` | Device | Pinned |
:::{tip}
`hipMalloc` and `hipFree` are blocking calls; however, HIP recently added non-blocking versions, `hipMallocAsync` and `hipFreeAsync`, which take a stream as an additional argument.
:::
### Pageable memory
Pageable memory is usually obtained by calling `malloc` or `new` in a C++ application.
It is unique in that it exists on "pages" (blocks of memory), which can be migrated to other memory storage.
Examples include migrating memory between CPU sockets on a motherboard, or a system that runs out of space in RAM and starts moving pages of RAM into the swap partition of the hard drive.
### Pinned memory
Pinned memory (or page-locked memory, or non-pageable memory) is host memory that is mapped into the address space of all GPUs, meaning that the pointer can be used on both host and device.
Accessing host-resident pinned memory in device kernels is generally not recommended for performance, as it can force the data to traverse the host-device interconnect (e.g. PCIe), which is much slower than the on-device bandwidth (>40x on MI200).
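To put the bandwidth gap in perspective, the ratio can be estimated with a quick calculation. The sketch below assumes a peak HBM2e bandwidth of 1.6 TB/sec per MI250 GCD and a nominal 32 GB/sec per direction for a PCIe Gen 4 x16 link; both figures are assumptions for illustration, and the helper name `bandwidth_ratio` is hypothetical.

```shell
# Sketch: ratio of on-device bandwidth to host-device interconnect bandwidth.
bandwidth_ratio() {
    # $1 = on-device bandwidth in GB/s, $2 = interconnect bandwidth in GB/s
    awk -v d="$1" -v h="$2" 'BEGIN { printf "%.0f\n", d / h }'
}

bandwidth_ratio 1600 32   # HBM2e per GCD vs. PCIe Gen 4 x16 (assumed nominal values)
```

With these assumed values the ratio comes out at 50, consistent with the ">40x" figure quoted above.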
Pinned host memory can be allocated with one of two types of coherence support:
:::{note}
In HIP, pinned memory allocations are coherent by default (`hipHostMallocDefault`).
There are additional pinned memory flags (e.g. `hipHostMallocMapped` and `hipHostMallocPortable`).
On MI200 these options do not impact performance.
<!-- TODO: link to programming_manual#memory-allocation-flags -->
For more information, see the section *memory allocation flags* in the HIP Programming Guide: {doc}`hip:user_guide/programming_manual`.
:::
Much like how a process can be locked to a CPU core by setting affinity, a pinned memory allocator does this with the memory storage system.
On multi-socket systems it is important to ensure that pinned memory is located on the same socket as the owning process, or else each cache line will be moved through the CPU-CPU interconnect, thereby increasing latency and potentially decreasing bandwidth.
In practice, pinned memory is used to improve transfer times between host and device.
For transfer operations, such as `hipMemcpy` or `hipMemcpyAsync`, using pinned memory instead of pageable memory on host can lead to a ~3x improvement in bandwidth.
:::{tip}
If the application needs to move data back and forth between device and host (separate allocations), use pinned memory on the host side.
:::
### Managed memory
Managed memory refers to universally addressable, or unified memory available on the MI200 series of GPUs.
Much like pinned memory, managed memory shares a pointer between host and device and (by default) supports fine-grained coherence; however, managed memory can also automatically migrate pages between host and device.
The allocation will be managed by the AMD GPU driver using the Linux HMM (Heterogeneous Memory Management) mechanism.
If heterogeneous memory management (HMM) is not available, then `hipMallocManaged` will fall back to using system memory and will act like pinned host memory.
Other managed memory API calls will have undefined behavior.
It is therefore recommended to check for managed memory capability by calling `hipDeviceGetAttribute` with `hipDeviceAttributeManagedMemory`.
HIP supports additional calls that work with page migration:
* `hipMemAdvise`
* `hipMemPrefetchAsync`
:::{tip}
If the application needs to use data on both host and device regularly, does not want to deal with separate allocations, and is not worried about maxing out the VRAM on MI200 GPUs (64 GB per GCD), use managed memory.
:::
:::{tip}
If managed memory performance is poor, check to see if managed memory is supported on your system and if page migration (XNACK) is enabled.
:::
## Access behavior
Memory allocations for GPUs behave as follows:
| API | Data location | Host access | Device access |
|--------------------|---------------|--------------|----------------------|
| System allocated | Host | Local access | Unhandled page fault |
| `hipMallocManaged` | Host | Local access | Zero-copy |
| `hipHostMalloc` | Host | Local access | Zero-copy* |
| `hipMalloc` | Device | Zero-copy | Local access |
Zero-copy accesses happen over the Infinity Fabric interconnect or PCIe lanes on discrete GPUs.
:::{note}
While `hipHostMalloc` allocated memory is accessible by a device, the host pointer must be converted to a device pointer with `hipHostGetDevicePointer`.
Memory allocated through standard system allocators, such as `malloc`, can be accessed by a device after registering the memory via `hipHostRegister`.
The device pointer to be used in kernels can be retrieved with `hipHostGetDevicePointer`.
Registered memory is treated like `hipHostMalloc` and will have similar performance.
On devices that support and have [](#xnack) enabled, such as the MI250X, `hipHostRegister` is not required as memory accesses are handled via automatic page migration.
:::
### XNACK
Normally, host and device memory are separate and data has to be transferred manually via `hipMemcpy`.
On a subset of GPUs, such as the MI200, there is an option to automatically migrate pages of memory between host and device.
This is important for managed memory, where the locality of the data is important for performance.
Depending on the system, page migration may be disabled by default in which case managed memory will act like pinned host memory and suffer degraded performance.
*XNACK* describes the GPU's ability to retry memory accesses that failed due to a page fault (which normally would lead to a memory access error), and instead retrieve the missing page.
This also affects memory allocated by the system as indicated by the following table:
| API | Data location | Host after device access | Device after host access |
|--------------------|---------------|--------------------------|--------------------------|
| System allocated | Host | Migrate page to host | Migrate page to device |
| `hipMallocManaged` | Host | Migrate page to host | Migrate page to device |
| `hipHostMalloc` | Host | Local access | Zero-copy |
| `hipMalloc` | Device | Zero-copy | Local access |
To check if page migration is available on a platform, use `rocminfo`:
```sh
$ rocminfo | grep xnack
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
```
Here, `xnack-` means that XNACK is available but is disabled by default.
Setting the environment variable `HSA_XNACK=1` turns on XNACK and gives the expected result, `xnack+`:
```sh
$ HSA_XNACK=1 rocminfo | grep xnack
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack+
```
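The check above can also be scripted. The following is a minimal sketch in Python, assuming the ISA name lines look like the `rocminfo` output shown above; `xnack_state` is a hypothetical helper name, not part of any ROCm tool.

```python
import re

# Sample lines copied from the rocminfo output shown above.
SAMPLE_DISABLED = "Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-"
SAMPLE_ENABLED = "Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack+"


def xnack_state(rocminfo_text):
    """Return True for xnack+, False for xnack-, None if no XNACK feature is listed."""
    match = re.search(r"xnack([+-])", rocminfo_text)
    if match is None:
        return None  # no XNACK-capable ISA name was reported
    return match.group(1) == "+"


print(xnack_state(SAMPLE_DISABLED))  # False
print(xnack_state(SAMPLE_ENABLED))   # True
```

In practice the text would come from running `rocminfo` (with or without `HSA_XNACK=1` in the environment) and capturing its standard output.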
`hipcc` by default generates code that runs correctly with XNACK either enabled or disabled.
Adding `xnack+` or `xnack-` to the `--offload-arch=` option forces code to run only with XNACK enabled or disabled, respectively.
```sh
# Compiled kernels will run regardless of whether XNACK is enabled or disabled.
hipcc --offload-arch=gfx90a
# Compiled kernels will only be run if XNACK is enabled with HSA_XNACK=1.
hipcc --offload-arch=gfx90a:xnack+
# Compiled kernels will only be run if XNACK is disabled with HSA_XNACK=0.
hipcc --offload-arch=gfx90a:xnack-
```
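The compatibility rule described above can be summarized as a small decision function. This is only an illustrative sketch of the rule as stated in this section; `can_run` is a hypothetical name and does not correspond to any HIP or ROCm API.

```python
def can_run(offload_arch, xnack_enabled):
    """Apply the rule described above to one --offload-arch value."""
    features = offload_arch.split(":")[1:]  # e.g. ["xnack+"] for "gfx90a:xnack+"
    if "xnack+" in features:
        return xnack_enabled        # only runs with XNACK enabled
    if "xnack-" in features:
        return not xnack_enabled    # only runs with XNACK disabled
    return True                     # no XNACK setting: runs in either mode


for arch in ("gfx90a", "gfx90a:xnack+", "gfx90a:xnack-"):
    print(arch, can_run(arch, xnack_enabled=True), can_run(arch, xnack_enabled=False))
```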
:::{tip}
If you want to make use of page migration, use managed memory. While pageable memory will migrate correctly, it is not a portable solution and can have performance issues if the accessed data isn't page aligned.
:::
### Coherence
* *Coarse-grained coherence* means that memory is only considered up to date at kernel boundaries, which can be enforced through `hipDeviceSynchronize`, `hipStreamSynchronize`, or any blocking operation that acts on the null stream (e.g. `hipMemcpy`).
For example, cacheable memory is a type of coarse-grained memory where an up-to-date copy of the data can be stored elsewhere (e.g. in an L2 cache).
* *Fine-grained coherence* means the coherence is supported while a CPU/GPU kernel is running.
This can be useful if both host and device are operating on the same dataspace using system-scope atomic operations (e.g. updating an error code or flag to a buffer).
Fine-grained memory implies that up-to-date data may be made visible to others regardless of kernel boundaries as discussed above.
| API | Flag | Coherence |
|-------------------------|------------------------------|----------------|
| `hipHostMalloc` | `hipHostMallocDefault` | Fine-grained |
| `hipHostMalloc` | `hipHostMallocNonCoherent` | Coarse-grained |
| API | Flag | Coherence |
|-------------------------|------------------------------|----------------|
| `hipExtMallocWithFlags` | `hipDeviceMallocDefault` | Coarse-grained |
| `hipExtMallocWithFlags` | `hipDeviceMallocFinegrained` | Fine-grained |
| API | `hipMemAdvise` argument | Coherence |
|-------------------------|------------------------------|----------------|
| `hipMallocManaged` | | Fine-grained |
| `hipMallocManaged` | `hipMemAdviseSetCoarseGrain` | Coarse-grained |
| `malloc` | | Fine-grained |
| `malloc` | `hipMemAdviseSetCoarseGrain` | Coarse-grained |
:::{tip}
Try to design your algorithms to avoid host-device memory coherence (e.g. system scope atomics). While it can be a useful feature in very specific cases, it is not supported on all systems, and can negatively impact performance by introducing the host-device interconnect bottleneck.
:::
The availability of fine- and coarse-grained memory pools can be checked with `rocminfo`:
```sh
$ rocminfo
...
*******
Agent 1
*******
Name: AMD EPYC 7742 64-Core Processor
...
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
...
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
...
*******
Agent 9
*******
Name: gfx90a
...
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
...
```
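When many agents are present, it can help to extract the pool flags programmatically. The sketch below assumes the `rocminfo` layout shown above (an `Agent N` header, a `Name:` line, then `Segment: GLOBAL; FLAGS: ...` lines); `pool_flags_by_agent` is a hypothetical helper, and real output contains many more fields than the abridged sample.

```python
import re

# Abridged sample modeled on the rocminfo output shown above.
SAMPLE = """\
*******
Agent 1
*******
Name: AMD EPYC 7742 64-Core Processor
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
*******
Agent 9
*******
Name: gfx90a
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
"""


def pool_flags_by_agent(rocminfo_text):
    """Map each agent number to its name and the FLAGS of its GLOBAL pools."""
    agents = {}
    current = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Agent (\d+)", line)
        if header:
            current = {"name": None, "flags": []}
            agents[int(header.group(1))] = current
            continue
        if current is None:
            continue  # still before the first agent header
        if current["name"] is None and line.startswith("Name:"):
            # Only the first Name: line after the header names the agent.
            current["name"] = line.split(":", 1)[1].strip()
            continue
        segment = re.match(r"Segment:\s*GLOBAL;\s*FLAGS:\s*(.*)", line)
        if segment:
            current["flags"].append(segment.group(1).strip())
    return agents


for number, info in pool_flags_by_agent(SAMPLE).items():
    print(number, info["name"], info["flags"])
```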
## System direct memory access
In most cases, the default behavior for HIP in transferring data from a pinned host allocation to device will run at the limit of the interconnect.
However, there are certain cases where the interconnect is not the bottleneck.
The primary way to transfer data onto and off of a GPU, such as the MI200, is to use the onboard System Direct Memory Access engine, which is used to feed blocks of memory to the off-device interconnect (either GPU-CPU or GPU-GPU).
Each GCD has a separate SDMA engine for host-to-device and device-to-host memory transfers.
Importantly, SDMA engines are separate from the computing infrastructure, meaning that memory transfers to and from a device will not impact kernel compute performance, though they do impact memory bandwidth to a limited extent.
The SDMA engines are mainly tuned for PCIe-4.0 x16, which means they are designed to operate at bandwidths up to 32 GB/s.
:::{note}
An important feature of the MI250X platform is the Infinity Fabric™ interconnect between host and device.
The Infinity Fabric interconnect supports improved performance over standard PCIe-4.0 (usually ~50% more bandwidth); however, since the SDMA engine does not run at this speed, it will not max out the bandwidth of the faster interconnect.
:::
The bandwidth limitation can be countered by bypassing the SDMA engine and replacing it with a type of copy kernel known as a "blit" kernel.
Blit kernels will use the compute units on the GPU, thereby consuming compute resources, which may not always be beneficial.
The easiest way to enable blit kernels is to set an environment variable `HSA_ENABLE_SDMA=0`, which will disable the SDMA engine.
On systems where the GPU uses a PCIe interconnect instead of an Infinity Fabric interconnect, blit kernels will not impact bandwidth, but will still consume compute resources.
The use of SDMA vs blit kernels also applies to MPI data transfers and GPU-GPU transfers.
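To get a feel for the difference, the figures quoted in this section can be turned into a rough estimate. The sketch below is an idealized calculation only: it assumes transfers sustain the peak rate, takes the 32 GB/s SDMA design point from the text, and treats "~50% more bandwidth" as a factor of 1.5. Measured bandwidth on a real system will differ.

```python
SDMA_PEAK_GB_S = 32.0                      # SDMA design point stated above
FABRIC_PEAK_GB_S = SDMA_PEAK_GB_S * 1.5    # "~50% more bandwidth", a rough assumption


def transfer_seconds(size_gb, bandwidth_gb_s):
    """Idealized transfer time: size divided by sustained bandwidth."""
    return size_gb / bandwidth_gb_s


size_gb = 64.0  # the per-GCD VRAM size of MI200 mentioned earlier
print(transfer_seconds(size_gb, SDMA_PEAK_GB_S))               # 2.0
print(round(transfer_seconds(size_gb, FABRIC_PEAK_GB_S), 2))   # 1.33
```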


@@ -0,0 +1,250 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Using the LLVM ASan on a GPU">
<meta name="keywords" content="LLVM, ASan, address sanitizer, AddressSanitizer, instrumented
libraries, instrumented applications, AMD, ROCm">
</head>
# Using the LLVM ASan on a GPU (beta release)
The LLVM AddressSanitizer (ASan) provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications exactly like pure CPU applications. However, this simplicity has not been achieved yet.
This document describes how to use ROCm ASan.
For information about LLVM ASan, see the [LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
:::{note}
The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
:::
## Compiling for ASan
The ASan process begins by compiling the application of interest with the ASan instrumentation.
Recommendations for doing this are:
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
* Add the following options to the existing compiler and linker options:
* `-fsanitize=address` - enables instrumentation
* `-shared-libsan` - use shared version of runtime
* `-g` - add debug info for improved reporting
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
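As an illustration of the recommendations above, the following sketch assembles a compile command in Python. The helper name `asan_compile_command` and the source file `vector_add.hip` are hypothetical; the options themselves are the ones listed in this section, and `hipcc` is assumed to be the compiler driver in use.

```python
def asan_compile_command(sources, output="a.out", offload_arch="gfx90a:xnack+"):
    """Build an argv list with the ASan options recommended above."""
    return [
        "hipcc",
        "-fsanitize=address",   # enable instrumentation
        "-shared-libsan",       # use the shared version of the runtime
        "-g",                   # debug info for improved reporting
        f"--offload-arch={offload_arch}",
        *sources,
        "-o",
        output,
    ]


# vector_add.hip is a placeholder source file name.
print(" ".join(asan_compile_command(["vector_add.hip"])))
```

The resulting list can be passed to `subprocess.run` in a build script, or the printed line can be used directly in a shell.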
### About compilation time
When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.
There are a few options if the compile time becomes unacceptable:
* Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the ASan process.
* Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
* Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
## Installing ROCm GPU ASan packages
For a complete ROCm GPU Sanitizer installation, including packages, instrumented HSA and HIP runtimes, tools, and math libraries, use the following command:
```bash
sudo apt-get install rocm-ml-sdk-asan
```
## Using AMD-supplied ASan instrumented libraries
ROCm releases have optional packages that contain additional ASan instrumented builds of the ROCm libraries (usually found in `/opt/rocm-<version>/lib`). The instrumented libraries have identical names to the regular uninstrumented libraries, and are located in `/opt/rocm-<version>/lib/asan`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
These additional libraries avoid additional developer effort to locate repositories, identify the correct branch, check out the correct tags, and other efforts needed to build the libraries from the source. And they extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.
## Running ASan instrumented applications
### Preparing to run an instrumented application
Here are a few recommendations to consider before running an ASan instrumented heterogeneous application.
* Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
* Ensure XNACK is enabled
* For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
* For `gfx906` (MI-50) or `gfx908` (MI-100), set the environment variable `HSA_XNACK=1` and also ensure the amdgpu kernel module is loaded with the module argument `noretry=0`.
This requirement is due to the fact that the XNACK setting for these GPUs is system-wide.
* Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
* Ensure that the application depends on the ASan runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
If it does not appear, when executed the application will quickly output an ASan error that looks like:
```bash
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
* Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
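The dependency check on the ASan runtime described above can be automated. The sketch below scans the text produced by `readelf -d <application name>` for a `NEEDED` entry naming the runtime; `depends_on_asan_runtime` is a hypothetical helper, and the sample lines are illustrative rather than taken from a real application.

```python
ASAN_RUNTIME = "libclang_rt.asan-x86_64.so"

# Illustrative `readelf -d` lines; real output contains more entries.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libclang_rt.asan-x86_64.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""


def depends_on_asan_runtime(readelf_dynamic_output):
    """Return True if a NEEDED entry names the ASan runtime."""
    return any(
        "(NEEDED)" in line and ASAN_RUNTIME in line
        for line in readelf_dynamic_output.splitlines()
    )


print(depends_on_asan_runtime(SAMPLE))  # True
```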
There is an environment variable, `ASAN_OPTIONS`, that can be used to adjust the runtime behavior of the ASAN runtime itself. There are more than a hundred "flags" that can be adjusted (see an old list at [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)) but the default settings are correct and should be used in most cases. It must be noted that these options only affect the host ASAN runtime. The device runtime only currently supports the default settings for the few relevant options.
There are two `ASAN_OPTIONS` flags of particular note.
* `halt_on_error=0/1 default 1`.
This tells the ASAN runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional ASan instrumented libraries are not compiled with this option and if an error is detected within one of them, but halt_on_error is set to 0, more undefined behavior will occur.
* `detect_leaks=0/1 default 1`.
This option directs the ASan runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
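The two flags can be combined in a single `ASAN_OPTIONS` value, with individual options separated by colons. The following sketch builds the environment variables discussed above; `sanitizer_env` is a hypothetical helper, and the suppression file path is the same placeholder used in the text.

```python
def sanitizer_env(halt_on_error=True, detect_leaks=True, suppressions=None):
    """Build ASAN_OPTIONS (and optionally LSAN_OPTIONS) from the flags above."""
    env = {
        "ASAN_OPTIONS": ":".join([
            f"halt_on_error={int(halt_on_error)}",
            f"detect_leaks={int(detect_leaks)}",
        ])
    }
    if suppressions is not None:
        # Use a suppression file and silence the suppression report.
        env["LSAN_OPTIONS"] = f"suppressions={suppressions}:print_suppressions=0"
    return env


print(sanitizer_env(detect_leaks=False))
print(sanitizer_env(suppressions="/path/to/suppression/file"))
```

The returned dictionary can be merged into `os.environ` before launching the instrumented application from a script.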
## Runtime overhead
Running an ASan instrumented application incurs
overheads which may result in unacceptably long runtimes
or failure to run at all.
### Higher execution time
ASan detection works by checking each address at runtime
before the address is actually accessed by a load, store, or atomic
instruction.
This checking involves an additional load to "shadow" memory which
records whether the address is "poisoned" or not, and additional logic
that decides whether to produce a detection report or not.
This extra runtime work can cause the application to slow down by
a factor of three or more, depending on how many memory accesses are
executed.
For heterogeneous applications, the shadow memory must be accessible by all devices
and this can mean that shadow accesses from some devices may be more costly
than non-shadow accesses.
### Higher memory use
The address checking described above relies on the compiler to surround
each program variable with a red zone and on ASan
runtime to surround each runtime memory allocation with a red zone and
fill the shadow corresponding to each red zone with poison.
The added memory for the red zones is additional overhead on top
of the 13% overhead for the shadow memory itself.
Applications which consume most of one or more available memory pools when
run normally are likely to encounter allocation failures when run with
instrumentation.
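The shadow overhead can be estimated with simple arithmetic. The sketch below assumes the usual LLVM ASan mapping of eight application bytes to one shadow byte, which gives 12.5% and is consistent with the roughly 13% figure above; red zone overhead depends on the allocation pattern and is not modeled here.

```python
SHADOW_SCALE = 3  # assumed: 2**3 = 8 application bytes per shadow byte


def shadow_bytes(app_bytes):
    """Shadow memory needed to describe app_bytes of application memory."""
    return app_bytes >> SHADOW_SCALE


GIB = 1 << 30
print(shadow_bytes(64 * GIB) // GIB)  # 8 (GiB of shadow for 64 GiB of memory)
print(shadow_bytes(GIB) / GIB)        # 0.125
```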
## Runtime reporting
It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the ASan runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.
An invalid address detection report for the CPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on address <memory address> at pc <pc> bp <bp> sp <sp> <access> of size <N> at <memory address> thread T0
```
and continues with a stack trace for the access, a stack trace for the allocation and deallocation, if relevant, and a dump of the shadow near the `<memory address>`.
In contrast, an invalid address detection report for the GPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on amdgpu device <device> at pc <pc> <access> of size <n> in workgroup id (<X>,<Y>,<Z>)
```
Above, `<device>` is the integer device ID, and `(<X>, <Y>, <Z>)` is the ID of the workgroup or block where the invalid address was detected.
While the CPU report includes a call stack for the thread attempting the invalid access, the GPU report is currently limited to a call stack of size one, i.e. the (symbolized) address of the invalid access, e.g.
```bash
#0 <pc> in <function signature> at /path/to/file.hip:<line>:<column>
```
This short call stack is followed by a GPU unique section that looks like
```bash
Thread ids and accessed addresses:
<lid0> <maddr 0> : <lid1> <maddr1> : ...
```
where each `<lid j> <maddr j>` indicates the lane ID and the invalid memory address held by lane `j` of the wavefront attempting the invalid access.
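For log processing, the GPU report header can be picked apart with regular expressions. The sketch below is based only on the format described above and on report fragments quoted in this document; it assumes the access type is `READ` or `WRITE`, and `parse_gpu_report` is a hypothetical helper rather than part of the ASan tooling.

```python
import re

# Sample assembled from report fragments quoted in this document.
SAMPLE = """\
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
WRITE of size 4 in workgroup id (10,0,0)
"""

HEADER = re.compile(
    r"==(?P<pid>\d+)==ERROR: AddressSanitizer: (?P<problem>\S+) "
    r"on amdgpu device (?P<device>\d+) at pc (?P<pc>0x[0-9a-fA-F]+)"
)
ACCESS = re.compile(
    r"(?P<access>READ|WRITE) of size (?P<size>\d+) "
    r"in workgroup id \((?P<x>\d+),(?P<y>\d+),(?P<z>\d+)\)"
)


def parse_gpu_report(text):
    """Extract the fields of a GPU report header, or None if it does not match."""
    header = HEADER.search(text)
    access = ACCESS.search(text)
    if header is None or access is None:
        return None
    return {
        "pid": int(header["pid"]),
        "problem": header["problem"],
        "device": int(header["device"]),
        "pc": int(header["pc"], 16),
        "access": access["access"],
        "size": int(access["size"]),
        "workgroup": (int(access["x"]), int(access["y"]), int(access["z"])),
    }


print(parse_gpu_report(SAMPLE))
```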
Additionally, reports for invalid GPU accesses to memory allocated by GPU code via `malloc` or `new` starting with, for example,
```bash
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
```
or
```bash
==5678==ERROR: AddressSanitizer: heap-use-after-free on amdgpu device 3 at pc 0x7f4c10062d74
```
currently may include one or two surprising CPU side tracebacks mentioning "`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code and these call stacks can be ignored.
### Running with `rocgdb`
`rocgdb` can be used to further investigate ASan detected errors, with some preparation.
Currently, the ASan runtime complains when starting `rocgdb` without preparation.
```bash
$ rocgdb my_app
==1122==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
This is solved by setting environment variable `LD_PRELOAD` to the path to the ASan runtime, whose path can be obtained using the command
```bash
amdclang++ -print-file-name=libclang_rt.asan-x86_64.so
```
It is also recommended to set the environment variable `HIP_ENABLE_DEFERRED_LOADING=0` before debugging HIP applications.
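The two environment settings above can be prepared in a launcher script. This is a minimal sketch; `rocgdb_env` is a hypothetical helper, and the runtime path passed to it is expected to be the output of the `amdclang++ -print-file-name=libclang_rt.asan-x86_64.so` command shown above (the path in the usage line is a placeholder).

```python
import os


def rocgdb_env(asan_runtime_path, base_env=None):
    """Return a copy of the environment prepared for debugging under rocgdb."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = asan_runtime_path       # make the ASan runtime load first
    env["HIP_ENABLE_DEFERRED_LOADING"] = "0"    # recommended above for HIP applications
    return env


# Placeholder path; obtain the real one from the amdclang++ command above.
env = rocgdb_env("/path/to/libclang_rt.asan-x86_64.so", base_env={})
print(env)
```

The dictionary can then be passed as the `env` argument of `subprocess.run(["rocgdb", "my_app"], env=env)`. Note that this sketch overwrites any existing `LD_PRELOAD` value.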
After starting `rocgdb` breakpoints can be set on the ASan runtime error reporting entry points of interest. For example, if an ASan error report includes
```bash
WRITE of size 4 in workgroup id (10,0,0)
```
the `rocgdb` command needed to stop the program before the report is printed is
```bash
(gdb) break __asan_report_store4
```
Similarly, the appropriate command for a report including
```bash
READ of size <N> in workgroup id (1,2,3)
```
is
```bash
(gdb) break __asan_report_load<N>
```
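The mapping from a report line to the breakpoint symbol follows a simple pattern, sketched below. `asan_report_breakpoint` is a hypothetical helper that only covers the `READ` and `WRITE` cases shown above; it does not check whether the runtime actually provides an entry point for the given size.

```python
import re


def asan_report_breakpoint(report_line):
    """Map an access line from a report to the matching ASan report function name."""
    match = re.search(r"\b(READ|WRITE) of size (\d+)\b", report_line)
    if match is None:
        return None
    kind = "load" if match.group(1) == "READ" else "store"
    return f"__asan_report_{kind}{match.group(2)}"


print(asan_report_breakpoint("WRITE of size 4 in workgroup id (10,0,0)"))
# __asan_report_store4
print(asan_report_breakpoint("READ of size 8 in workgroup id (1,2,3)"))
# __asan_report_load8
```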
It is possible to set breakpoints on all ASan report functions using these commands:
```bash
$ rocgdb <path to application>
(gdb) start <command line arguments>
(gdb) rbreak ^__asan_report
(gdb) c
```
### Using ASan with a short HIP application
Refer to the following example to use ASan with a short HIP application:
https://github.com/Rmalavally/rocm-examples/blob/Rmalavally-patch-1/LLVM_ASAN/Using-Address-Sanitizer-with-a-Short-HIP-Application.md
### Known issues with using GPU sanitizer
* Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
* Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
* It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.

103
docs/conf.py Normal file

@@ -0,0 +1,103 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import shutil
import jinja2
import os
# Environment to process Jinja templates.
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
# Jinja templates to render out.
templates = []
# Render templates and output files without the last extension.
# For example: 'install.md.jinja' becomes 'install.md'.
for template in templates:
    rendered = jinja_env.get_template(template).render()
    with open(os.path.splitext(template)[0], 'w') as file:
        file.write(rendered)
shutil.copy2('../CONTRIBUTING.md','./contribute/index.md')
shutil.copy2('../RELEASE.md','./about/release-notes.md')
# Keep capitalization due to similar linking on GitHub's markdown preview.
shutil.copy2('../CHANGELOG.md','./about/CHANGELOG.md')
latex_engine = "xelatex"
latex_elements = {
"fontpkg": r"""
\usepackage{tgtermes}
\usepackage{tgheros}
\renewcommand\ttdefault{txtt}
"""
}
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.0.1"
release = "6.0.1"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{
"file":"release",
"os":["linux", "windows"],
"date":"2024-01-09"
},
{"file":"install/windows/install-quick", "os":["windows"]},
{"file":"install/linux/install-quick", "os":["linux"]},
{"file":"install/linux/install", "os":["linux"]},
{"file":"install/linux/install-options", "os":["linux"]},
{"file":"install/linux/prerequisites", "os":["linux"]},
{"file":"install/docker", "os":["linux"]},
{"file":"install/magma-install", "os":["linux"]},
{"file":"install/pytorch-install", "os":["linux"]},
{"file":"install/tensorflow-install", "os":["linux"]},
{"file":"install/windows/install", "os":["windows"]},
{"file":"install/windows/prerequisites", "os":["windows"]},
{"file":"install/windows/cli/index", "os":["windows"]},
{"file":"install/windows/gui/index", "os":["windows"]},
{"file":"about/compatibility/docker-image-support-matrix", "os":["linux"]},
{"file":"about/compatibility/user-kernel-space-compat-matrix", "os":["linux"]},
{"file":"reference/library-index", "os":["linux"]},
{"file":"how-to/deep-learning-rocm", "os":["linux"]},
{"file":"how-to/gpu-enabled-mpi", "os":["linux"]},
{"file":"how-to/system-debugging", "os":["linux"]},
{"file":"how-to/tuning-guides", "os":["linux", "windows"]},
{"file":"rocm-a-z", "os":["linux", "windows"]},
{"file":"about/release-notes", "os":["linux"]},
]
exclude_patterns = ['temp']
external_toc_path = "./sphinx/_toc.yml"
extensions = ["rocm_docs"]
external_projects_current_project = "rocm"
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}
html_title = "ROCm Documentation"
html_theme_options = {
"link_main_doc": False
}

155
docs/contribute/building.md Normal file

@@ -0,0 +1,155 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Building ROCm documentation">
<meta name="keywords" content="documentation, Visual Studio Code, GitHub, command line,
AMD, ROCm">
</head>
# Building documentation
You can build our documentation via GitHub (in a pull request) or locally (using the command line or
Visual Studio (VS) Code).
## GitHub
If you open a pull request on the `develop` branch of a ROCm repository and scroll to the bottom of
the page, there is a summary panel. Next to the line
`docs/readthedocs.com:advanced-micro-devices-demo`, there is a `Details` link. If you click this, it takes
you to the Read the Docs build for your pull request.
![Screenshot of the GitHub documentation build link](../data/contribute/github-docs-build.png)
If you don't see this line, click `Show all checks` to get an itemized view.
## Command line
You can build our documentation via the command line using Python. We use Python 3.8; other
versions may not support the build.
Use the Python Virtual Environment (`venv`) and run the following commands from the project root:
```sh
python3 -mvenv .venv
# Windows
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
# Linux
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
```
Navigate to `_build/html/index.html` and open this file in a web browser.
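If you prefer to drive the build from a script, the platform-specific commands above differ only in the interpreter path. The following sketch assembles the Sphinx invocation in Python; `sphinx_build_command` is a hypothetical helper and is not part of the repository.

```python
def sphinx_build_command(windows=False):
    """Return the Sphinx build argv shown above for the chosen platform."""
    python = ".venv/Scripts/python" if windows else ".venv/bin/python"
    return [
        python, "-m", "sphinx",
        "-T", "-E",
        "-b", "html",
        "-d", "_build/doctrees",
        "-D", "language=en",
        "docs", "_build/html",
    ]


print(" ".join(sphinx_build_command()))
print(" ".join(sphinx_build_command(windows=True)))
```

The list can be passed to `subprocess.run(..., check=True)` from the project root once the virtual environment and requirements are installed.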
## Visual Studio Code
With the help of a few extensions, you can create a productive environment to author and test
documentation locally using Visual Studio (VS) Code. Follow these steps to configure VS Code:
1. Install the required extensions:
* Python: `(ms-python.python)`
* Live Server: `(ritwickdey.LiveServer)`
2. Add the following entries to `.vscode/settings.json`.
```json
{
  "liveServer.settings.root": "/.vscode/build/html",
  "liveServer.settings.wait": 1000,
  "python.terminal.activateEnvInCurrentTerminal": true
}
```
* `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
alongside the `tasks.json` command.
* `liveServer.settings.wait`: Tells the live server to wait with the update in order to give Sphinx time to
regenerate the site contents and not refresh before the build is complete.
* `python.terminal.activateEnvInCurrentTerminal`: Activates the automatic virtual environment, so you
can build the site from the integrated terminal.
3. Add the following tasks to `.vscode/tasks.json`.
```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Build Docs",
      "type": "process",
      "windows": {
        "command": "${workspaceFolder}/.venv/Scripts/python.exe"
      },
      "command": "${workspaceFolder}/.venv/bin/python3",
      "args": [
        "-m",
        "sphinx",
        "-j",
        "auto",
        "-T",
        "-b",
        "html",
        "-d",
        "${workspaceFolder}/.vscode/build/doctrees",
        "-D",
        "language=en",
        "${workspaceFolder}/docs",
        "${workspaceFolder}/.vscode/build/html"
      ],
      "problemMatcher": [
        {
          "owner": "sphinx",
          "fileLocation": "absolute",
          "pattern": {
            "regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
            "file": 1,
            "line": 2,
            "severity": 3,
            "message": 4
          }
        },
        {
          "owner": "sphinx",
          "fileLocation": "absolute",
          "pattern": {
            "regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
            "file": 1,
            "severity": 2,
            "message": 3
          }
        }
      ],
      "group": {
        "kind": "build",
        "isDefault": true
      }
    }
  ]
}
```
> (Implementation detail: two problem matchers are defined because VS Code
> doesn't tolerate missing problem information. A single regex could match
> every type of message, but not all warning and error messages include a
> line number. If the `pattern` references a capture group that stays empty,
> VS Code discards the message completely.)
4. Configure the Python virtual environment (`venv`).
From the Command Palette, run `Python: Create Environment`. Select the `venv` environment type and
the `docs/sphinx/requirements.txt` file.
5. Build the docs.
Launch the default build task using one of the following options:
* A hotkey (the default is `Ctrl+Shift+B`)
* The `Tasks: Run Build Task` command from the Command Palette
6. Open the live preview.
Navigate to the site output within VS Code: right-click on `.vscode/build/html/index.html` and
select `Open with Live Server`. The contents should update on every rebuild without having to
refresh the browser.
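To see how the two problem matchers in `tasks.json` divide the work, the following sketch applies the same two regular expressions in Python. The sample output lines and the helper name `parse_problem` are hypothetical, and VS Code evaluates the patterns with its JavaScript regex engine, so treat this as an illustration rather than a reproduction of VS Code's behavior.

```python
import re

# Python spellings of the two `regexp` values from tasks.json
# (JSON string escaping removed; `\/` written as `/`).
WITH_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(/[^:]*|[a-zA-Z]:\\[^:]*):(\d+):\s+(WARNING|ERROR):\s+(.*)$"
)
WITHOUT_LINE = re.compile(
    r"^(?:.*\.{3}\s+)?(/[^:]*|[a-zA-Z]:\\[^:]*):{1,2}\s+(WARNING|ERROR):\s+(.*)$"
)


def parse_problem(output_line: str):
    """Return (file, line, severity, message), or None if nothing matches.

    `line` is None for messages that carry no line number.
    """
    m = WITH_LINE.match(output_line)
    if m:
        return m.group(1), int(m.group(2)), m.group(3), m.group(4)
    m = WITHOUT_LINE.match(output_line)
    if m:
        return m.group(1), None, m.group(2), m.group(3)
    return None


# Hypothetical sample lines, not captured from a real build.
print(parse_problem("/work/docs/index.md:12: WARNING: Title underline too short."))
print(parse_problem("/work/docs/old.md: WARNING: document isn't included in any toctree"))
```

Each pattern references only capture groups that it always fills, which is why a message without a line number still produces a result through the second pattern.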

@@ -0,0 +1,229 @@
# Contributing to ROCm documentation
AMD values and encourages contributions to our code and documentation. If you choose to
contribute, we encourage you to be polite and respectful. Improving documentation is a long-term
process, to which we are dedicated.
If you have issues when trying to contribute, refer to the
[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) page in our GitHub
repository.
## Folder structure and naming convention
Our documentation follows the Pitchfork folder structure. Most documentation files are stored in the
`/docs` folder. Some special files (such as release, contributing, and changelog) are stored in the root
(`/`) folder.
All images are stored in the `/docs/data` folder. An image's file path mirrors that of the documentation
file where it is used.
Our naming structure uses kebab case; for example, `my-file-name.rst`.
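As a quick illustration of the naming rule, the following sketch checks whether a file name is kebab case. The exact pattern (lowercase letters and digits separated by single hyphens) is an assumption, because this guide doesn't spell out whether digits are allowed or which files are exempt; the helper name `is_kebab_case` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed rule: lowercase letters and digits, with single hyphens between words.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_kebab_case(path: str) -> bool:
    """Check the file name, without its extension, against kebab case."""
    return KEBAB_CASE.fullmatch(PurePosixPath(path).stem) is not None


print(is_kebab_case("docs/my-file-name.rst"))
print(is_kebab_case("docs/my_file_name.rst"))
```

Only the final path component is checked, so folder names would need a separate pass.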
## Supported formats and syntax
Our documentation includes both Markdown and RST files. We are gradually transitioning existing
Markdown to RST in order to more effectively meet our documentation needs. When contributing,
RST is preferred; if you must use Markdown, use GitHub-flavored Markdown.
We use [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) syntax and compile
our API references using [Doxygen](https://www.doxygen.nl/).
The following table shows some common documentation components and the syntax convention we
use for each:
<table>
<tr>
<th>Component</th>
<th>RST syntax</th>
</tr>
<tr>
<td>Code blocks</td>
<td>
```rst
.. code-block:: language-name

   My code block.
```
</td>
</tr>
<tr>
<td>Cross-referencing internal files</td>
<td>
```rst
:doc:`Title <../path/to/file/filename>`
```
</td>
</tr>
<tr>
<td>External links</td>
<td>
```rst
`link name <URL>`_
```
</td>
</tr>
<tr>
<td>Headings</td>
<td>
```rst
******************
Chapter title (H1)
******************
Section title (H2)
==================
Subsection title (H3)
---------------------
Sub-subsection title (H4)
^^^^^^^^^^^^^^^^^^^^^^^^^
```
</td>
</tr>
<tr>
<td>Images</td>
<td>
```rst
.. image:: image1.png
```
</td>
</tr>
<tr>
<td>Internal links</td>
<td>
```rst
1. Add a tag to the section you want to reference:

.. _my-section-tag:

Section 1
==========

2. Link to your tag:

As shown in :ref:`my-section-tag`.
```
</td>
</tr>
<tr>
<td>Lists</td>
<td>
```rst
#. Ordered (numbered) list item
* Unordered (bulleted) list item
```
</td>
</tr>
<tr>
<td>Math (block)</td>
<td>
```rst
.. math::

   A = \begin{pmatrix}
           0.0 & 1.0 & 1.0 & 3.0 \\
           4.0 & 5.0 & 6.0 & 7.0 \\
       \end{pmatrix}
```
</td>
</tr>
<tr>
<td>Math (inline)</td>
<td>
```rst
:math:`2 \times 2`
```
</td>
</tr>
<tr>
<td>Notes</td>
<td>
```rst
.. note::

   My note here.
```
</td>
</tr>
<tr>
<td>Tables</td>
<td>
```rst
.. csv-table:: Optional title here
   :widths: 30, 70 #optional column widths
   :header: "entry1 header", "entry2 header"

   "entry1", "entry2"
```
</td>
</tr>
</table>
## Language and style
We use the
[Google developer documentation style guide](https://developers.google.com/style/highlights) to
guide our content.
Font size and type, page layout, white space control, and other formatting
details are controlled via
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). If you want to notify us
of any formatting issues, create a pull request in our
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) GitHub repository.
## Building our documentation
<!-- % TODO: Fix the link to be able to work at every files -->
To learn how to build our documentation, refer to
[Building documentation](./building.md).

@@ -0,0 +1,33 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Providing feedback for ROCm documentation">
<meta name="keywords" content="documentation, pull request, GitHub, AMD, ROCm">
</head>
# Providing feedback for ROCm documentation
There are four standard ways to provide feedback for this repository.
## Pull request
All contributions to ROCm documentation should arrive via the
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
targeting the develop branch of the repository. If you are unable to contribute
via the GitHub Flow, feel free to email us at [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
## GitHub discussions
To ask questions or view answers to frequently asked questions, refer to
[GitHub Discussions](https://github.com/RadeonOpenCompute/ROCm/discussions).
On GitHub Discussions, in addition to asking and answering questions,
members can share updates, have open-ended conversations,
and follow along via public announcements.
## GitHub issue
File issues about existing or missing documentation as
[GitHub Issues](https://github.com/RadeonOpenCompute/ROCm/issues).
## Email
Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).

@@ -0,0 +1,77 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm documentation toolchain">
<meta name="keywords" content="documentation, toolchain, Sphinx, Doxygen, MyST, AMD, ROCm">
</head>
# ROCm documentation toolchain
Our documentation relies on several open source toolchains and sites.
## `rocm-docs-core`
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD-maintained
project that applies customization for our documentation. This
project is the tool most ROCm repositories use as part of the documentation
build. It is also available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
originally used for Python. It is now widely used in the open source community.
Originally, Sphinx supported reStructuredText (RST) based documentation, but
Markdown support is now available.
ROCm documentation plans to default to Markdown for new projects.
Existing projects using RST are under no obligation to convert to Markdown. New
projects that believe Markdown is not suitable should contact the documentation
team prior to selecting RST.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds
and hosts the HTML documentation generated using Sphinx for our end users.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts
documentation from annotated comments in source code.
ROCm projects typically use Doxygen for public API documentation unless the
upstream project uses a different tool.
### Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin to integrate Doxygen
content.
### MyST
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
It is integrated into ROCm documentation by the Sphinx extension [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A cheat sheet that showcases MyST syntax is available in the
[Jupyter Book reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
### Sphinx External ToC
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
is a Sphinx extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
based on a YAML file that specifies the table of contents.
It was selected because the YAML format is flexible and lets scripts operate on
the table of contents. Use this file for your project's navigation. For an
example, see the `_toc.yml.in` file in the `docs/sphinx` folder of this
repository.
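As an illustration of a script operating on the table of contents, the following sketch renders a small YAML file in the `root`/`subtrees`/`entries` layout that Sphinx External ToC reads. The function name `render_toc`, the caption, and the page names are hypothetical, and values that need YAML quoting (for example, captions containing a colon) are not handled.

```python
def render_toc(root: str, caption: str, files: list[str]) -> str:
    """Render a one-subtree table of contents as YAML text."""
    lines = [
        f"root: {root}",
        "subtrees:",
        f"- caption: {caption}",
        "  entries:",
    ]
    # One `file` entry per document, given without its file extension.
    lines += [f"  - file: {name}" for name in files]
    return "\n".join(lines) + "\n"


print(render_toc("index", "How to", ["how-to/build-docs", "how-to/contribute"]))
```

Because the output is plain text, a build script can write it to the table of contents file before Sphinx runs.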
### Sphinx-book-theme
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme
that defines the base appearance for ROCm documentation.
ROCm documentation applies some customization on top of the Sphinx Book Theme,
such as a custom header and footer.
### Sphinx Design
[Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design
functionality.
ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.
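To show how these tools fit together, here is a minimal sketch of a plain Sphinx `conf.py` that names them directly. It is not the configuration ROCm projects use, since `rocm-docs-core` normally supplies these settings; the project name and the Doxygen path are hypothetical.

```python
# conf.py (sketch): a plain Sphinx configuration naming the tools above.
# ROCm projects normally let rocm-docs-core supply these settings instead.

project = "Example project"  # hypothetical project name

extensions = [
    "myst_parser",          # MyST Markdown support
    "sphinx_design",        # grids, cards, and tabs
    "sphinx_external_toc",  # navigation from a YAML table of contents
    "breathe",              # Doxygen XML integration
]

html_theme = "sphinx_book_theme"

# sphinx-external-toc: path of the YAML file, relative to the Sphinx source directory.
external_toc_path = "_toc.yml"

# Breathe: map a project name to a Doxygen XML output folder (hypothetical path).
breathe_projects = {"example": "doxygen/xml"}
breathe_default_project = "example"
```

Sphinx executes `conf.py` as ordinary Python, so each setting is a module-level variable.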

BIN
docs/data/amd-logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.3 KiB

Some files were not shown because too many files have changed in this diff.