Compare commits

...

177 Commits

Author SHA1 Message Date
Bence Parajdi
0a0ffa933c add new page to index.md 2024-04-30 12:45:13 +02:00
Bence Parajdi
85a6a18930 fix review comments 2024-04-24 15:58:49 +02:00
Bence Parajdi
2d002ff907 fix typos 2024-04-24 13:11:21 +02:00
Bence Parajdi
13a91044f6 add cu setting page 2024-04-24 11:05:01 +02:00
Sam Wu
84c12ac1ce Update changelog.jinja to exclude version number in header for lindividual libraries (#3058) 2024-04-23 16:08:38 -06:00
peter
42849e92a6 Improve readability of GPU architecture hardware specs (#3009)
* move units of measurement to table headers

* add glossary explaining table headers

* add missed units and update h1

* toc listing to say indicate Accelerators & GPUs

* fix typo

* update meta description and keywords

* Update title in toc to fit in sidebar

* update title, toc, and filename

* Fix broken link to HIP programming guide

* Revert "update title, toc, and filename"

This reverts commit 6b9e687805.

* Revert glossary; slight fixes

* Change 'Pro' to 'PRO' for consistency

* Add references to programming and hardware architecture guides

* Change 'warp' to 'wavefront'
2024-04-22 13:23:21 -04:00
Sam Wu
a5c7a9d01f Merge branch 'idevelop' into develop 2024-04-19 16:04:57 -06:00
Sam Wu
dc3124c1dd Add note for tag_script included_names 2024-04-19 16:04:38 -06:00
Sam Wu
7f9e31d6d9 Regenerate changelog with new 5.7.1 link fix 2024-04-19 16:04:38 -06:00
Roopa Malavally
8f3a3b88aa Update 5.7.1.md
Fixing the broken link for rocBLAS programmer's guide in 5.7.1 Changelog.
2024-04-19 16:04:38 -06:00
Sam Wu
cf76b40b79 Add extra line to topp of template for formatting changelog 2024-04-19 16:04:38 -06:00
Sam Wu
41dd38168a Add HIPIFY to template for 6.1.0 2024-04-19 16:04:38 -06:00
Sam Wu
24168eb2c7 Remove RVS template
Determine cause of missing later
2024-04-19 16:04:38 -06:00
Sam Wu
628cd37aa4 Remove rocprofiler from tag_script 2024-04-19 16:04:38 -06:00
Sam Wu
76185653cd Add HIPIFY to included names for tag script 2024-04-19 16:04:38 -06:00
Sam Wu
80bb3d6c6b Add ck template 2024-04-19 16:04:38 -06:00
Sam Wu
35937f7682 Test autotag script 2024-04-19 16:04:38 -06:00
Roopa Malavally
26afbaa469 Update 6.1.0.md
Added links to GitHub for known issues and ROCm Compiler fixed defect
2024-04-19 16:04:38 -06:00
Sam Wu
e73293381b docs(6.1.0.md): Add more changelog notes for 6.1.0 2024-04-19 16:04:38 -06:00
Sam Wu
90753fa29f Add MI300 RAS fixed defect to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
f7f09f0013 Add MI200 SR-IOV known issue to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
2d4a3037ef Add ROCProfiler to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
85bc06f697 Add ROCm SMI to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
0791f2cbec Add ROCgdb to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
6865f279b4 Add RDC to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
aa47a075b8 Add ROCm Compiler to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
f7a1915e45 Add AMD SMI to 6.1.0 template 2024-04-19 16:04:38 -06:00
Sam Wu
2dd253a54c Add 6.1.0.md template 2024-04-19 16:04:38 -06:00
dependabot[bot]
3ac6f3b2cc Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx (#3047)
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Set Ubuntu 22.04 and Python 3.10 in ReadtheDocs config

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-04-18 16:53:34 -06:00
Sam Wu
5aa5106b99 Update default.xml (#3038)
* Remove HIPCC from default.xml

HIPCC moved into llvm-project

* Remove ROCm-Device-Libs from default.xml

ROCm-Device-Libs was moved into llvm-project

* Remove ROCm-CompilerSupport from default.xml

ROCm-CompilerSupport was moved into llvm-project

* Add rocprofiler-register to default.xml

Added in 6.1 manifest

* Apply mathlibs group to projects in manifest
2024-04-18 16:42:33 -06:00
amitkumar-amd
b3867a44bc update the code-owner list (#3046) 2024-04-18 16:09:57 -05:00
Sam Wu
ccce331ad4 Sync develop branch 2024-04-18 15:01:30 -06:00
Sam Wu
209038da06 Add custom processor for rvs 2024-04-18 14:52:37 -06:00
Sam Wu
e9314a418c Add rpp custom processor 2024-04-18 14:52:37 -06:00
Sam Wu
ffc6c3349f Add hipfort changelog processor 2024-04-18 14:52:37 -06:00
Sam Wu
e977b783da Separate templates into a module; Fix MIVisionX template 2024-04-18 14:52:37 -06:00
Sam Wu
8873de5363 Update excluded and included projects 2024-04-18 14:52:37 -06:00
peter
e4055682fe Update compatibility/precision-support links (#3030)
* Change links to component data type support pages from absolute to relative

* Fix rocPRIM data type support links

* Empty commit to trigger demo rebuild.
2024-04-18 15:10:41 -04:00
peter
e84f95f96c Update links to rocAL component (#3033)
* Update links to rocAL component

* Change absolute rocm docs links to relative
2024-04-18 15:08:20 -04:00
peter
2db2dac10d Add 'JAX for ROCm' link to index.md (#3034)
* Add JAX for ROCm link to index.md

* Reorder third-party libraries installation guides in index
2024-04-18 15:02:56 -04:00
Young Hui - AMD
24cbb957d3 Update README.md (#3043)
* Update README.md

Fix rocSPARSE build link

* Update link to just general page, instead of anchor
2024-04-18 13:55:49 -04:00
Sam Wu
9dbb5d578a Use Ubuntu 22.04 and Python 3.10 in RTD config 2024-04-18 10:44:30 -06:00
dependabot[bot]
5fde4c2ff7 Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 10:44:30 -06:00
Sam Wu
e915b6a741 docs(tools/autotag/README.md): Add additional note to avoid duplicating data in changelog template (#3018) 2024-04-17 16:34:26 -06:00
Roopa Malavally
9241c40166 Update CHANGELOG.md
Updated ROCm Compiler with fixed issue
2024-04-17 13:16:08 -07:00
Sam Wu
10d29ca45b Update manifest for ROCm 6.1.0 (#3022)
* Reorganize default.xml by group and alphabetically

* Add rocDecode to default.xml

* Add rocDecode to included names in tag script

* update tag to 6.1.0

---------

Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
2024-04-17 12:52:19 -06:00
Roopa Malavally
678ccdddb9 Update CHANGELOG.md 2024-04-17 08:37:07 -07:00
Roopa Malavally
af1623a146 Update CHANGELOG.md
Added GitHub links to Changelog
2024-04-17 08:09:28 -07:00
Sam Wu
0c14b861d2 Add ROCm version 6.1.0 to version list (#3023) 2024-04-16 16:21:59 -06:00
Sam Wu
de6b23da83 Sync develop branches 2024-04-16 15:56:14 -06:00
Sam Wu
04a314180f Add rocDecode version 2024-04-16 15:55:29 -06:00
Roopa Malavally
46e34bef8d Update CHANGELOG.md 2024-04-16 15:55:29 -06:00
Sam Wu
6d7daee9af Remove duplicate entry for Tensile 2024-04-16 15:55:29 -06:00
Lisa Delaney
2ea7ac694e Manually update release notes and changelog
Added known issue for ROCm compiler

https://ontrack-internal.amd.com/browse/SWDEV-454778

Added known issue for RVS

Added known issue for MI200 SRIOV

Updated PEBB test known issue for RVS

Added expansion for PEBB

Added PBQT known issue

expanded P2P Benchmark and Qualification Tool

Edited RVS known issue description based on Leo's input

Added MI300A fixed defect

Removed PEBB and Babel Stream from RVS known issue

Updated RCCL

Added rocm-cmake

Added rocRAND

Added rocWMMA

Added Tensile

Alan's change 1

Alan change to HIPIFY

Alan's edit 3 for MIOpen

OpenMP 2nd bullet fix - Alan edit

Alan's edit - ROCm Compiler

ROCm Validation Suite edits

Alan's edit rocSOLVER

Alan's edit to ROCTracer

Updated hipSPARSELt

Added hipTensor 1.2.0

Added hipTensor

data type correction

updated the RCCL version

Added bullets to known issues for consistency

Changed RAS to Fixed defect
2024-04-16 15:55:29 -06:00
peter
3ffd2f78e9 Add rocDecode to API libraries (#3019) 2024-04-16 16:08:03 -04:00
peter
4b1574cbe2 Add rocDecode to What is ROCm? components list (#3016)
* Add rocDecode to What is ROCm? components list

* Fix typo -> 'Common Language Runtime'

* Change 'compute' to 'common'
2024-04-16 15:48:17 -04:00
Sam Wu
df6dcac677 Add best practice for updating changelog (#3013) 2024-04-15 14:11:59 -06:00
Lisa
a29b54a453 Update links (#2992)
* Update links

* table cleanup

* cross-refs

* wordlist update

* add temp hard links

* verbiage

* docs(index.md): Disable MD051 for Sphinx Markdown anchor point

In general this rule should be followed to avoid broken links

* revert gpu-arch table, remove dropdowns, quick start hyphen removedon index.md

* revise opening text as per PR comment

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
2024-04-12 15:36:23 -04:00
Sam Wu
5ea5d1d3f1 Update codeowners (#3008) 2024-04-12 14:54:09 -04:00
peter
da18980f63 Reorganize "What is ROCm?" page (#3006)
* add rocm software stack diagram to What is ROCm landing page

* restructure ROCm project list table

* clean up unnecessary hyphenation

* update What is ROCm stack diagram filename

* reorder rocm project list to reflect diagram

* update "What is ROCm?" image metadata

* change 'project list' to 'components'

* change 'project' to 'component'
2024-04-12 14:01:41 -04:00
dependabot[bot]
a6cffe5963 Bump idna from 3.4 to 3.7 in /docs/sphinx (#3007)
Bumps [idna](https://github.com/kjd/idna) from 3.4 to 3.7.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.4...v3.7)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-12 09:28:13 -06:00
dependabot[bot]
18c4cb3ab5 Bump rocm-docs-core from 0.38.0 to 0.38.1 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-11 09:37:08 -06:00
dependabot[bot]
01c91ac2ff Bump rocm-docs-core from 0.38.0 to 0.38.1 in /docs/sphinx (#3004)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-11 09:24:20 -06:00
Roopa Malavally
13ad427c8e Update using-gpu-sanitizer.md (#2991)
* Update using-gpu-sanitizer.md

Minor OpenMP update

* Update using-gpu-sanitizer.md

Updated note with additional information.

* Update using-gpu-sanitizer.md

* Update using-gpu-sanitizer.md

Moved the note to another section

* Update using-gpu-sanitizer.md
2024-04-09 11:10:55 -07:00
Edgar Gabriel
00907151a2 minor update to the gpu-mpi section (#2983)
provide the precise parameters required to run Open MPI with libfabric and rocm
support.
2024-04-04 17:44:17 -04:00
dependabot[bot]
75da6927fc Bump rocm-docs-core from 0.37.0 to 0.38.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-27 11:21:21 -06:00
dependabot[bot]
5bb25f62ed Bump rocm-docs-core from 0.37.0 to 0.38.0 in /docs/sphinx (#2986)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-26 16:35:30 -06:00
Lisa
645e7a26aa Update what-is-rocm.rst (#2984) 2024-03-26 16:04:00 -06:00
Sam Wu
2cc67e9a4c Sync ROCm 2024-03-26 10:35:27 -06:00
Sam Wu
6ee6dd32f5 Add check for empty string in prev lib ver; also fix typo in ROCm 2024-03-22 16:50:57 -06:00
Sam Wu
40c69baf30 Update autotag README 2024-03-22 16:50:57 -06:00
Roopa Malavally
f298d60976 Update using-gpu-sanitizer.md (#2970)
* Update using-gpu-sanitizer.md

added the link text

Added the example

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-03-22 15:33:50 -06:00
Sam Wu
1425bd269c Merge pull request #2968 from ROCm/dependabot/pip/docs/sphinx/rocm-docs-core-0.37.0
Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx
2024-03-21 09:17:09 -06:00
dependabot[bot]
22121a9511 Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-20 22:44:50 +00:00
Sam Wu
f39af205f0 Merge pull request #2966 from StreamHPC/redirect
redirect openmp links to cover the move
2024-03-20 14:37:22 -06:00
Bence Parajdi
e7865ebe89 add reredirect extension
add redirection for openmp documentation
2024-03-20 14:47:35 +01:00
MKKnorr
cac5df504c Add Radeon and Raden Pro specifications to the architecture reference (#2960)
* Expand architecture hardware specifications overview

Add supported Radeon and Radeon Pro GPUs

* Remove glossary from gpu architecture hardware specifications
2024-03-18 13:34:20 -04:00
Sam Wu
e6b4715b4f Merge pull request #2962 from samjwu/rocal-link
Update rocAL link
2024-03-13 14:14:52 -06:00
Sam Wu
a9e4678d8b Update rocAL link 2024-03-13 11:03:00 -06:00
Sam Wu
75baa9fd18 Merge pull request #2959 from ROCm/dependabot/pip/docs/sphinx/rocm-docs-core-0.36.0
Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx
2024-03-12 09:17:41 -06:00
dependabot[bot]
c84e22937f Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-11 22:53:06 +00:00
randyh62
47192a92ba added Getting ROCm Source Files (#2952)
* added Accessing ROCm Source Files

* changed per comments

* Update README.md

implement dgaliffi suggestions

Co-authored-by: David Galiffi <dgaliffi@amd.com>

* Update README.md

implement dgailifi suggestion

Co-authored-by: David Galiffi <dgaliffi@amd.com>

* Update README.md

implement dgailifi suggestion

Co-authored-by: David Galiffi <dgaliffi@amd.com>

* Update README.md

implement dgailifi suggestion

Co-authored-by: David Galiffi <dgaliffi@amd.com>

* add default.xml link

* update README

---------

Co-authored-by: David Galiffi <dgaliffi@amd.com>
2024-03-07 14:41:39 -08:00
Sam Wu
7b3253a42f Merge pull request #2951 from ROCm/dependabot/pip/docs/sphinx/rocm-docs-core-0.35.1
Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx
2024-03-06 16:10:42 -07:00
dependabot[bot]
5af462baed Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-06 22:11:39 +00:00
Istvan Kiss
47d06cb492 Precision support (#2815)
* Precision support page initial commit

Move to rst file

Fix details of Mi100

Update docs/about/compatibility/precission-support.md Co-authored-by: MKKnorr <MKKnorr@web.de>

* Update precission-support page

Co-authored-by: MKKnorr <MKKnorr@web.de>

* PR fix based on feedbackcs

* Rename precision-support.rst to data-type_support.rts

* Update rocThrust library data type support

* PR findings fixes

* Update data-type-support page

Co-authored-by: MKKnorr <MKKnorr@web.de>

* Update docs/about/compatibility/data-type-support.rst

Co-authored-by: MKKnorr <MKKnorr@web.de>

* lisa edits

---------

Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2024-03-05 09:48:14 -07:00
yhuiYH
2f810fa881 Merge pull request #2942 from LisaDelaney/link-updates2
fix links
2024-03-04 19:04:53 -05:00
Lisa Delaney
2e9a20ab12 fix links 2024-03-04 15:28:07 -07:00
Istvan Kiss
e4c0cf9044 Branch rebase fix (#2916) 2024-02-29 15:05:49 -07:00
MKKnorr
cd586348f5 Add instinct gpu architectures information (#2859)
* Add instinct gpu architectures information

* Improve gpu architecture table

Move table to "reference" instead of "conceptual"

* Add HIP terminology to GPU Arch glossary
2024-02-29 15:03:23 -07:00
Sam Wu
2666c51c54 Merge pull request #2919 from ROCm/dependabot/pip/docs/sphinx/cryptography-42.0.4
Bump cryptography from 42.0.2 to 42.0.4 in /docs/sphinx
2024-02-23 15:38:14 -07:00
Sam Wu
ed4eb3f292 Merge pull request #2924 from ROCm/dependabot/pip/docs/sphinx/rocm-docs-core-0.35.0
Bump rocm-docs-core from 0.34.2 to 0.35.0 in /docs/sphinx
2024-02-23 15:36:18 -07:00
dependabot[bot]
008962298a Bump rocm-docs-core from 0.34.2 to 0.35.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.2 to 0.35.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.2...v0.35.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-23 22:09:15 +00:00
Lisa
371b06f2c0 What is updates (#2923) 2024-02-23 12:13:26 -07:00
Sam Wu
89110a1662 Merge pull request #2910 from LisaDelaney/image-cleanup
add alt text
2024-02-22 17:26:17 -07:00
Lisa
0f7b28c048 license update (#2922) 2024-02-22 17:04:44 -07:00
randyh62
e5d86c4433 Update contributing.md (#2921)
reordered doc build commands
2024-02-22 09:44:04 -08:00
Lisa
d64d5e6750 Update build commands (#2917)
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-02-21 13:53:03 -07:00
dependabot[bot]
8b386cd841 Bump cryptography from 42.0.2 to 42.0.4 in /docs/sphinx
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-21 20:52:55 +00:00
parbenc
0d3e0e65dc Merge pull request #2909 from StreamHPC/repo_tool_links
add links to repo tool and manifest format to README.md
2024-02-21 11:50:11 +01:00
Istvan Kiss
67e3fc994b MI300 documentation (#2779)
---------

Co-authored-by: Nagy-Egri Máté Ferenc <mate@streamhpc.com>
Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
Co-authored-by: Davide Teixeira <77169625+daviteix@users.noreply.github.com>
2024-02-20 17:02:36 -07:00
Sam Wu
e30ec80742 Merge pull request #2900 from samjwu/versionlist
Add 6.0.2 to version list
2024-02-20 15:04:08 -07:00
Sam Wu
51aa39d0f5 Merge pull request #2902 from ROCm/dependabot/pip/docs/sphinx/cryptography-42.0.2
Bump cryptography from 42.0.0 to 42.0.2 in /docs/sphinx
2024-02-20 15:03:16 -07:00
Lisa Delaney
70ea915709 reduce image sizes 2024-02-20 10:34:04 -07:00
Lisa Delaney
40014ae3ad add alt text 2024-02-20 10:15:23 -07:00
Bence Parajdi
18d4f30d8c add links to repo tool and manifest format to README.md 2024-02-20 16:08:15 +01:00
conboy
2130375f27 Update link to contributing docs 2024-02-20 09:38:12 -05:00
dependabot[bot]
9a4d9c2828 Bump cryptography from 42.0.0 to 42.0.2 in /docs/sphinx
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-17 00:55:58 +00:00
Sam Wu
a7f389bfde Add 6.0.2 to version list 2024-02-16 16:43:14 -07:00
Sam Wu
258ef77e7c Merge pull request #2899 from ROCm/dependabot/pip/docs/sphinx/rocm-docs-core-0.34.2
Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx
2024-02-16 11:13:27 -07:00
dependabot[bot]
5400d1b231 Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-15 22:46:25 +00:00
Sam Wu
acca9920d3 Merge pull request #2894 from LisaDelaney/remove-links
remove broken links
2024-02-12 16:02:20 -07:00
Lisa
8dc8a0eb62 Update docs/conceptual/cmake-packages.rst
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-02-09 15:58:38 -07:00
David Galiffi
752d31133a Remove Unused Remotes from Manifest
Remove the definition of unused remotes from the manifest file. 
Everything ROCm is now in the ROCm organization.
2024-02-09 15:51:36 -05:00
Lisa Delaney
35acb3da5e update github links 2024-02-09 13:42:59 -07:00
Lisa Delaney
122705c1f4 remove broken links 2024-02-09 13:32:06 -07:00
Sam Wu
da427861e0 Merge pull request #2853 from ROCm/changelog_minor_fix
Warning and changelog link fixes
2024-02-09 11:43:48 -07:00
Istvan Kiss
02cc970a75 Update github links to ROCm organization 2024-02-09 17:03:40 +01:00
Istvan Kiss
3ecf0e70b1 PR findings and more broken links 2024-02-09 15:18:07 +01:00
Istvan Kiss
bc6da86022 Update tools/autotag/templates/rocm_changes/5.5.0.md
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-02-09 15:18:07 +01:00
Istvan Kiss
c644023cce Update CHANGELOG.md
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-02-09 15:18:07 +01:00
Istvan Kiss
4299baf978 hipDeviceMalloc.cpp has been removed 2024-02-09 15:18:07 +01:00
Istvan Kiss
c55dd93186 Fix more broken links 2024-02-09 15:18:07 +01:00
Istvan Kiss
378ef08233 Minor warning fixes 2024-02-09 15:18:07 +01:00
Istvan Kiss
8563784791 Changelog incorrect path remove 2024-02-09 15:17:21 +01:00
Lisa
a44f6d1efc link updates (#2861) 2024-02-08 17:24:12 -07:00
Sam Wu
82ac21fac5 docs(conf.py): Update article info date for release notes (#2862)
* docs(conf.py): Update article info date for release notes

* docs(conf.py): Update article info date for changelog
2024-02-08 16:56:56 -07:00
Lisa
801457ce6a update tm (#2887) 2024-02-08 16:15:33 -07:00
dependabot[bot]
3d3a5269a5 Bump rocm-docs-core from 0.33.2 to 0.34.0 in /docs/sphinx (#2891)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-08 16:05:53 -07:00
Lisa
3c94962813 new banner images (#2884) 2024-02-08 11:53:48 -07:00
Lisa
8bbd51376d update contributing section & update card images (#2865) 2024-02-07 09:31:45 -07:00
dependabot[bot]
4825afa951 Bump rocm-docs-core from 0.33.0 to 0.33.2 in /docs/sphinx (#2873)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-06 15:53:05 -07:00
dependabot[bot]
15192db6ef Bump cryptography from 41.0.3 to 42.0.0 in /docs/sphinx (#2871)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.3 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.3...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-06 15:49:48 -07:00
Sam Wu
83766203ff Update changelog announcement (#2857)
* Update changelog announcement

* Update phrasing
2024-01-31 16:04:14 -07:00
Sam Wu
336f88c7c2 Fix typo in changelog (#2856) 2024-01-31 15:03:31 -07:00
zhang2amd
78bd182403 Update default.xml to version 6.0.2 (#2855) 2024-01-31 14:33:45 -07:00
Lisa
ba9cc4f185 changelog updates (#2792)
* changelog updates

* updates

* changelog updates

* Update CHANGELOG.md

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

* Update RELEASE.md

* 6.0.1 -> 6.0.2

* 6.0.1 -> 6.0.2

* Update CONTRIBUTING.md (#2791)

* Update CONTRIBUTING.md

* Fixed link to licensing document

Also, changed to use relative links for internal files.

* Create issue_retrieval.yml

I am tasked with adding a GitHub action to process incoming GitHub issues. The AMD GitHub admin team asked me to try out one of their runners and to do so, I need to load in a workflow file.

* changed group to ROCM-Ubuntu

* Added a field to specify project number

This action receives an org name and project number and adds issues to it using this information

* Update issue_retrieval.yml

* Update issue_retrieval.yml

* Revert "Update CONTRIBUTING.md" (#2795)

* Text change to direct PRs into default branch, since not all repos have develop branch

* add keywords (#2799)

* Update issue_retrieval.yml

* ci(default.xml): Add hipBLASLt to manifest (#2796)

* Deleting issue_report.yml in favor of a global issue template placed in ROCm/.github (#2803)

* Delete .github/ISSUE_TEMPLATE/issue_report.yml

* Delete .github/ISSUE_TEMPLATE/config.yml

* Delete .github/ISSUE_TEMPLATE directory (#2805)

* docs(conf.py): Update article info for release page (#2806)

* docs(conf.py): Update article info for release page

* Update conf.py

* Fix typo (#2809)

* Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2807)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* corrections for Issue #2753 (#2819)

* docs(versions.md): Add 5.6.1 to versions list (#2816)

* Add codeowners for documentation (#2834)

Co-authored-by: samjwu <samjwu@users.noreply.github.com>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2835)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump gitpython from 3.1.30 to 3.1.41 in /docs/sphinx (#2836)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.30 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.30...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* changelog updates

* sync release file with changelog

* remove 6.0.0 duplicates

* update intro

* Update CHANGELOG.md

* Update RELEASE.md

* clean up duplicates

* caps

* minor update

* language update

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2024-01-31 13:26:27 -07:00
Lisa
df70d90d49 radeon updates (#2818)
* radeon updates

* update link

* update intro

* verbiage

* Update docs/index.md

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Update docs/what-is-rocm.md

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Use intersphinx link for radeon

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2024-01-30 13:20:28 -07:00
dependabot[bot]
95fa47e31a Bump rocm-docs-core from 0.31.0 to 0.33.0 in /docs/sphinx (#2851)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-29 17:20:35 -07:00
Spencer Hance
5afa1539ed Fix link to building.md in README (#2843)
Fix broken link to building.md in README.  It was missing `/docs/` in the path.
2024-01-29 17:04:10 -07:00
BrenAMD
0b5cfca1e4 Updated New ROCm meta package section (#2839) 2024-01-25 12:19:34 -07:00
dependabot[bot]
14979045a8 Bump gitpython from 3.1.30 to 3.1.41 in /docs/sphinx (#2836)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.30 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.30...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-23 09:44:58 -07:00
dependabot[bot]
65b5a383ec Bump jinja2 from 3.1.2 to 3.1.3 in /docs/sphinx (#2835)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-23 09:44:43 -07:00
Sam Wu
c679235a90 Add codeowners for documentation (#2834)
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2024-01-23 09:29:14 -07:00
Sam Wu
4833ecfa6a docs(versions.md): Add 5.6.1 to versions list (#2816) 2024-01-22 15:16:58 -07:00
randyh62
c9425c6d19 corrections for Issue #2753 (#2819) 2024-01-18 09:31:45 -07:00
dependabot[bot]
c4383d217a Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#2807)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-16 11:53:20 -07:00
Sam Wu
d509656c6b Fix typo (#2809) 2024-01-16 10:48:21 -07:00
Sam Wu
c2a3626026 docs(conf.py): Update article info for release page (#2806)
* docs(conf.py): Update article info for release page

* Update conf.py
2024-01-12 17:12:56 -07:00
abhimeda
51d5bf015c Delete .github/ISSUE_TEMPLATE directory (#2805) 2024-01-12 16:12:09 -07:00
abhimeda
c6facfb30f Deleting issue_report.yml in favor of a global issue template placed in ROCm/.github (#2803)
* Delete .github/ISSUE_TEMPLATE/issue_report.yml

* Delete .github/ISSUE_TEMPLATE/config.yml
2024-01-12 15:20:15 -07:00
Sam Wu
fce96340f4 ci(default.xml): Add hipBLASLt to manifest (#2796) 2024-01-12 15:19:22 -07:00
abhimeda
8d44e04483 Merge pull request #2800 from ROCm/abhimeda-added-env-variables-to-workflow-file
Added repository secrets to ROCm and pointing the workflow file to use them
2024-01-12 11:46:26 -05:00
abhimeda
dcce85a84a Update issue_retrieval.yml 2024-01-12 10:57:29 -05:00
Lisa
d399b13c88 add keywords (#2799) 2024-01-11 14:07:30 -07:00
yhuiYH
20005e0ef7 Merge pull request #2798 from ROCm/amd/dev/yhui/UpdateTextInContributing
Update Contributing.md to direct PRs to use repo's default branch
2024-01-11 15:08:37 -05:00
Young Hui
d05c1d529e Text change to direct PRs into default branch, since not all repos have develop branch 2024-01-11 14:02:17 -05:00
Lisa
163262643f Revert "Update CONTRIBUTING.md" (#2795) 2024-01-10 11:26:47 -07:00
abhimeda
318126b155 Merge pull request #2772 from ROCm/abhimeda-adding-workflow-file-to-test-github-runner
Abhimeda adding workflow file to create GitHub Action
2024-01-10 10:16:11 -05:00
David Galiffi
2be774fb19 Update CONTRIBUTING.md (#2791)
* Update CONTRIBUTING.md

* Fixed link to licensing document

Also, changed to use relative links for internal files.
2024-01-10 07:04:38 -07:00
Sam Wu
3faa2600eb Generate release notes for 6.0.1 from autotag script (#2790) 2024-01-09 13:39:19 -07:00
Sam Wu
7ffc622039 docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices (#2775)
* docs(gpu-enabled-mpi.rst): Fix links to 3rd party support matrices

* docs: Directly link for RST instead of using intersphinx
2024-01-08 16:34:45 -07:00
Istvan Kiss
054689be6a Mi300 info update (#2780) 2024-01-08 16:30:41 -07:00
abhimeda
40b5f85af9 Update issue_retrieval.yml 2024-01-04 15:40:05 -05:00
abhimeda
a1372d56f9 Update issue_retrieval.yml 2024-01-03 14:54:10 -05:00
abhimeda
717b09f7eb Added a field to specify project number
This action receives an org name and project number and adds issues to it using this information
2024-01-03 14:50:52 -05:00
abhimeda
1cd2b651c4 changed group to ROCM-Ubuntu 2024-01-01 21:55:28 -05:00
abhimeda
587f821194 Create issue_retrieval.yml
I am tasked with adding a GitHub action to process incoming GitHub issues. The AMD GitHub admin team asked me to try out one of their runners and to do so, I need to load in a workflow file.
2024-01-01 21:53:42 -05:00
Lisa
f94a8620eb Update CHANGELOG.md (#2762) 2023-12-20 13:40:35 -07:00
Lisa
5f9842db8f link fixes & consistency (#2761) 2023-12-20 12:42:15 -07:00
dependabot[bot]
6fae95aa02 Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#2759)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 17:13:46 -07:00
Sam Wu
538a44f4d7 docs: Update GPU and OS support for Linux page (#2757) 2023-12-19 15:53:52 -07:00
Sam Wu
6c90336e67 Merge docs/6.0.0 into develop (#2756)
* Marking TransferBench as beta (#2727)

* Known issues (#2731) (#2732)

* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

* docs(library-index.md): Add MIVisionX to library index (#2736)

* Delete docs/about/compatibility/linux-support.md (#2734)

* Delete docs/about/compatibility/linux-support.md

* Update _toc.yml.in

* Update _toc.yml.in

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>

* Corrected OS version (#2738)

* Corrected OS version 

There is no 22.04.5 exist.
It's 22.04.3 which has been tested and supported

* Update CHANGELOG.md

* Update _toc.yml.in (#2750)

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
Co-authored-by: pramenku <7664080+pramenku@users.noreply.github.com>
2023-12-19 15:43:04 -07:00
abhimeda
7f4922d2b2 Abhimeda updating issue template (#2749)
* added ROCm v6, MI300, and set a default component

* Delete .github/ISSUE_TEMPLATE/0_issue_report.yml
2023-12-18 15:06:35 -07:00
Mátyás Aradi
3e1a87a4f1 Remove virtualenv build from dependencies (#2699)
* Remove virtualenv build from dependencies

* Rename ROCM_BUILD_DOCS to BUILD_DOCS
2023-12-18 07:03:55 -07:00
yhuiYH
eeb96ebb18 Move documentation contributing.md and add Governance.md and Contributing.md (#2690)
* moved contributing.md to new location as it describes contributing to documentation

* Adding Governance.md and high-level Contributing.md

* fix linting errors (asterisk, whitespace and unused links)

* More linting fixes

* merge conflicts

* verbiage

* License link moved out of codeblock, and text fix there. Changed to full name of AMD. Update links to ROCm Org path

* whitespace linting fix

* Reverted back to ROCm is lead and managed by AMD.  Flows better to me.

---------

Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
2023-12-15 16:14:13 -07:00
Sam Wu
82d871c907 Merge Roc 6.0.x into develop (#2733)
* Marking TransferBench as beta (#2727)

* Known issues (#2731)

* rearranging

* edits

* update toc

* link update

* line break

* updates

* Update RELEASE.md

* edits

* Update conf.py

* file cleanup

* Update RELEASE.md

* Update conf.py

* addition

* verbiage

* Update CHANGELOG.md

* edits

* edits

* updates

* edits

* more edits

* Update RELEASE.md

Limited OS to start in 6.0

* Update RELEASE.md

* Update RELEASE.md

Table to reflect support.

* Update RELEASE.md

tweaked language

* Update RELEASE.md

Tweaking language

* edits

* edits

* link

* spelling

* add link

* new section

* Add files via upload (#2701)

* updates

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>

---------

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Ronnie Chatterjee <111161280+ronniec91@users.noreply.github.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
2023-12-15 15:06:03 -07:00
abhimeda
5676b16fce Add files via upload (#2701) 2023-12-15 14:42:13 -07:00
138 changed files with 7559 additions and 3121 deletions

6
.github/CODEOWNERS vendored Normal file → Executable file
View File

@@ -1 +1,5 @@
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus @LisaDelaney
* @amd-aakash @jlgreathouse @samjwu @ROCm/rocm-documentation
# Documentation files
docs/* @ROCm/rocm-documentation
*.md @ROCm/rocm-documentation
*.rst @ROCm/rocm-documentation

View File

@@ -1,76 +0,0 @@
name: Issue Report
description: File a report for something not working correctly.
title: "[Issue]: "
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to fill out this report!
On a Linux system, you can acquire your OS, CPU, GPU, and ROCm version (for filling out this report) with the following commands:
echo "OS:" && cat /etc/os-release | grep -E "^(NAME=|VERSION=)";
echo "CPU: " && cat /proc/cpuinfo | grep "model name" | sort --unique;
echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
echo "ROCm in /opt:" && ls -1 /opt | grep -E "rocm-";
- type: textarea
attributes:
label: Problem Description
description: Describe the issue you encountered.
placeholder: "The steps to reproduce can be included here, or in the dedicated section further below."
validations:
required: true
- type: input
attributes:
label: Operating System
description: What is the name and version number of the OS?
placeholder: "e.g. Ubuntu 22.04.3 LTS (Jammy Jellyfish)"
validations:
required: true
- type: input
attributes:
label: CPU
description: What CPU did you encounter the issue on?
placeholder: "e.g. AMD Ryzen 9 5900HX with Radeon Graphics"
validations:
required: true
- type: input
attributes:
label: GPU
description: What GPU(s) did you encounter the issue on?
placeholder: "e.g. MI200"
validations:
required: true
- type: input
attributes:
label: ROCm Version
description: What version(s) of ROCm did you encounter the issue on?
placeholder: "e.g. 5.7.0"
validations:
required: true
- type: input
attributes:
label: ROCm Component
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
placeholder: "e.g. rocBLAS"
- type: textarea
attributes:
label: Steps to Reproduce
description: (Optional) Detailed steps to reproduce the issue.
placeholder: Please also include what you expected to happen, and what actually did, at the failing step(s).
validations:
required: false
- type: textarea
attributes:
label: Output of /opt/rocm/bin/rocminfo --support
description: The output of rocminfo --support will help to better address the problem.
placeholder: |
ROCk module is loaded
=====================
HSA System Attributes
=====================
[...]
validations:
required: true

View File

@@ -1,32 +0,0 @@
name: Feature Suggestion
description: Suggest an additional functionality, or new way of handling an existing functionality.
title: "[Feature]: "
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to make a suggestion!
- type: textarea
attributes:
label: Suggestion Description
description: Describe your suggestion.
validations:
required: true
- type: input
attributes:
label: Operating System
description: (Optional) If this is for a specific OS, you can mention it here.
placeholder: "e.g. Ubuntu"
- type: input
attributes:
label: GPU
description: (Optional) If this is for a specific GPU or GPU family, you can mention it here.
placeholder: "e.g. MI200"
- type: input
attributes:
label: ROCm Component
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
placeholder: "e.g. rocBLAS"

View File

@@ -1,5 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: ROCm Community Discussions
url: https://github.com/RadeonOpenCompute/ROCm/discussions
about: Please ask and answer questions here for anything ROCm.

22
.github/workflows/issue_retrieval.yml vendored Normal file
View File

@@ -0,0 +1,22 @@
name: Issue retrieval
on:
issues:
types: [opened]
jobs:
auto-retrieve:
runs-on: ubuntu-latest
steps:
- name: Generate a token
id: generate_token
uses: actions/create-github-app-token@v1
with:
app_id: ${{ secrets.ACTION_APP_ID }}
private_key: ${{ secrets.ACTION_PEM }}
- name: 'Retrieve Issue'
uses: abhimeda/rocm_issue_management@main
with:
authentication-token: ${{ steps.generate_token.outputs.token }}
github-organization: 'ROCm'
project-num: '6'

View File

@@ -17,4 +17,4 @@ on:
jobs:
call-workflow-passing-data:
name: Documentation
uses: RadeonOpenCompute/rocm-docs-core/.github/workflows/linting.yml@develop
uses: ROCm/rocm-docs-core/.github/workflows/linting.yml@develop

View File

@@ -13,6 +13,5 @@ config:
MD051: false
ignores:
- CHANGELOG.md
- docs/CHANGELOG.md
- "{,docs/}{RELEASE,release}.md"
- tools/autotag/templates/**/*.md

View File

@@ -13,6 +13,6 @@ python:
- requirements: docs/sphinx/requirements.txt
build:
os: ubuntu-20.04
os: ubuntu-22.04
tools:
python: "3.8"
python: "3.10"

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -24,7 +24,7 @@ cmake_minimum_required(VERSION 3.18.0)
project(ROCm VERSION 5.7.1 LANGUAGES NONE)
option(ROCM_BUILD_DOCS "Build ROCm documentation" ON)
option(BUILD_DOCS "Build ROCm documentation" ON)
include(GNUInstallDirs)
@@ -35,6 +35,6 @@ list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules")
include(Dependencies)
# Build docs
if(ROCM_BUILD_DOCS)
if(BUILD_DOCS)
add_subdirectory(docs)
endif()

View File

@@ -1,229 +1,94 @@
# Contributing to ROCm documentation
AMD values and encourages contributions to our code and documentation. If you choose to
contribute, we encourage you to be polite and respectful. Improving documentation is a long-term
process, to which we are dedicated.
If you have issues when trying to contribute, refer to the
[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) page in our GitHub
repository.
## Folder structure and naming convention
Our documentation follows the Pitchfork folder structure. Most documentation files are stored in the
`/docs` folder. Some special files (such as release, contributing, and changelog) are stored in the root
(`/`) folder.
All images are stored in the `/docs/data` folder. An image's file path mirrors that of the documentation
file where it is used.
Our naming structure uses kebab case; for example, `my-file-name.rst`.
## Supported formats and syntax
Our documentation includes both Markdown and RST files. We are gradually transitioning existing
Markdown to RST in order to more effectively meet our documentation needs. When contributing,
RST is preferred; if you must use Markdown, use GitHub-flavored Markdown.
We use [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) syntax and compile
our API references using [Doxygen](https://www.doxygen.nl/).
The following table shows some common documentation components and the syntax convention we
use for each:
<table>
<tr>
<th>Component</th>
<th>RST syntax</th>
</tr>
<tr>
<td>Code blocks</td>
<td>
```rst
.. code-block:: language-name
My code block.
```
</td>
</tr>
<tr>
<td>Cross-referencing internal files</td>
<td>
```rst
:doc:`Title <../path/to/file/filename>`
```
</td>
</tr>
<tr>
<td>External links</td>
<td>
```rst
`link name <URL>`_
```
</td>
</tr>
<tr>
<tr>
<td>Headings</td>
<td>
```rst
******************
Chapter title (H1)
******************
Section title (H2)
===============
Subsection title (H3)
---------------------
Sub-subsection title (H4)
^^^^^^^^^^^^^^^^^^^^
```
</td>
</tr>
<tr>
<td>Images</td>
<td>
```rst
.. image:: image1.png
```
</td>
</tr>
<tr>
<td>Internal links</td>
<td>
```rst
1. Add a tag to the section you want to reference:
.. _my-section-tag: section-1
Section 1
==========
2. Link to your tag:
As shown in :ref:`section-1`.
```
</td>
</tr>
<tr>
<tr>
<td>Lists</td>
<td>
```rst
# Ordered (numbered) list item
* Unordered (bulleted) list item
```
</td>
</tr>
<tr>
<tr>
<td>Math (block)</td>
<td>
```rst
.. math::
A = \begin{pmatrix}
0.0 & 1.0 & 1.0 & 3.0 \\
4.0 & 5.0 & 6.0 & 7.0 \\
\end{pmatrix}
```
</td>
</tr>
<tr>
<td>Math (inline)</td>
<td>
```rst
:math:`2 \times 2 `
```
</td>
</tr>
<tr>
<td>Notes</td>
<td>
```rst
.. note::
My note here.
```
</td>
</tr>
<tr>
<td>Tables</td>
<td>
```rst
.. csv-table:: Optional title here
:widths: 30, 70 #optional column widths
:header: "entry1 header", "entry2 header"
"entry1", "entry2"
```
</td>
</tr>
</table>
## Language and style
We use the
[Google developer documentation style guide](https://developers.google.com/style/highlights) to
guide our content.
Font size and type, page layout, white space control, and other formatting
details are controlled via
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). If you want to notify us
of any formatting issues, create a pull request in our
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) GitHub repository.
## Building our documentation
<!-- % TODO: Fix the link to be able to work at every files -->
To learn how to build our documentation, refer to
[Building documentation](./building.md).
<head>
<meta charset="UTF-8">
<meta name="description" content="Contributing to ROCm">
<meta name="keywords" content="ROCm, contributing, contribute, maintainer, contributor">
</head>
# Contribute to ROCm
AMD values and encourages contributions to our code and documentation. If you want to contribute
to our ROCm repositories, first review the following guidance. For documentation-specific information,
see [Contributing to ROCm docs](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
ROCm is a software stack made up of a collection of drivers, development tools, and APIs that enable
GPU programming from low-level kernel to end-user applications. Because some of our components
are inherited from external projects (such as
[LLVM](https://github.com/ROCm/llvm-project) and
[Kernel driver](https://github.com/ROCm/ROCK-Kernel-Driver)), these use
project-specific contribution guidelines and workflow. Refer to their repositories for more information.
All other ROCm components follow the workflow described in the following sections.
## Development workflow
ROCm uses GitHub to host code, collaborate, and manage version control. We use pull requests (PRs)
for all changes within our repositories. We use
[GitHub issues](https://github.com/ROCm/ROCm/issues) to track known issues, such as
bugs.
### Issue tracking
Before filing a new issue, search the
[existing issues](https://github.com/ROCm/ROCm/issues) to make sure your issue isn't
already listed.
General issue guidelines:
* Use your best judgement for issue creation. If your issue is already listed, upvote the issue and
comment or post to provide additional details, such as how you reproduced this issue.
* If you're not sure if your issue is the same, err on the side of caution and file your issue.
You can add a comment to include the issue number (and link) for the similar issue. If we evaluate
your issue as being the same as the existing issue, we'll close the duplicate.
* If your issue doesn't exist, use the issue template to file a new issue.
* When filing an issue, be sure to provide as much information as possible, including script output so
we can collect information about your configuration. This helps reduce the time required to
reproduce your issue.
* Check your issue regularly, as we may require additional information to successfully reproduce the
issue.
### Pull requests
When you create a pull request, you should target the default branch. Our repositories typically use the **develop** branch as the default integration branch.
When creating a PR, use the following process. Note that each repository may include additional,
project-specific steps. Refer to each repository's PR process for any additional steps.
* Identify the issue you want to fix
* Target the default branch (usually the **develop** branch) for integration
* Ensure your code builds successfully
* Each component has a suite of test cases to run; include the log of the successful test run in your PR
* Do not break existing test cases
* New functionality is only merged with new unit tests
* If your PR includes a new feature, you must provide an application or test so we can ensure that the
feature works and continues to be valid in the future
* Tests must have good code coverage
* Submit your PR and work with the reviewer or maintainer to get your PR approved
* Once approved, the PR is brought onto internal CI systems and may be merged into the component
during our release cycle, as coordinated by the maintainer
* We'll inform you once your change is committed
:::{important}
By creating a PR, you agree to allow your contribution to be licensed under the
terms of the LICENSE.txt file in the corresponding repository. Different repositories may use different
licenses.
:::
You can look up each license on the [ROCm licensing](https://rocm.docs.amd.com/en/latest/about/license.html) page.
### New feature development
Use the [GitHub Discussion forum](https://github.com/ROCm/ROCm/discussions)
(Ideas category) to propose new features. Our maintainers are happy to provide direction and
feedback on feature development.
### Documentation
Submit ROCm documentation changes to our
[documentation repository](https://github.com/ROCm/ROCm). You must update
documentation related to any new feature or API contribution.
Note that each ROCm project uses its own repository for documentation.
## Future development workflow
The current ROCm development workflow is GitHub-based. If, in the future, we change this platform,
the tools and links may change. In this instance, we will update contribution guidelines accordingly.

60
GOVERNANCE.md Normal file
View File

@@ -0,0 +1,60 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm governance model">
<meta name="keywords" content="ROCm, governance">
</head>
# Governance model
ROCm is a software stack made up of a collection of drivers, development tools, and APIs that enable
GPU programming from the low-level kernel to end-user applications.
Components of ROCm that are inherited from external projects (such as
[LLVM](https://github.com/ROCm/llvm-project) and
[Kernel driver](https://github.com/ROCm/ROCK-Kernel-Driver)) follow their own
governance model and code of conduct. All other components of ROCm are governed by this
document.
## Governance
ROCm is led and managed by AMD.
We welcome contributions from the community. Our maintainers review all proposed changes to
ROCm.
## Roles
* **Maintainers** are responsible for their designated component and repositories.
* **Contributors** provide input and suggest changes to existing components.
### Maintainers
Maintainers are appointed by AMD. They are able to approve changes and can commit to our
repositories. They must use pull requests (PRs) for all changes.
You can find the list of maintainers in the CODEOWNERS file of each repository. Code owners differ
between repositories.
### Contributors
If you're not a maintainer, you're a contributor. We encourage the ROCm community to contribute in
several ways:
* Help other community members by posting questions or solutions on our
[GitHub discussion forums](https://github.com/ROCm/ROCm/discussions)
* Notify us of a bugs by filing an issue report on
[GitHub Issues](https://github.com/ROCm/ROCm/issues)
* Improve our documentation by submitting a PR to our
[repository](https://github.com/ROCm/ROCm/)
* Improve the code base (for smaller or contained changes) by submitting a PR to the component
* Suggest larger features by adding to the *Ideas* category in the
[GitHub discussion forum](https://github.com/ROCm/ROCm/discussions)
For more information, refer to our [contribution guidelines](CONTRIBUTING.md).
## Code of conduct
To engage with any AMD ROCm component that is hosted on GitHub, you must abide by the
[GitHub community guidelines](https://docs.github.com/en/site-policy/github-terms/github-community-guidelines)
and the
[GitHub community code of conduct](https://docs.github.com/en/site-policy/github-terms/github-community-code-of-conduct).

View File

@@ -10,7 +10,7 @@ ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance
artificial intelligence (AI), scientific computing, and computer aided design (CAD).
ROCm is powered by AMDs
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm/HIP),
an open-source software C++ GPU programming environment and its corresponding runtime. HIP
allows ROCm developers to create portable applications on different platforms by deploying code on a
range of platforms, from dedicated gaming GPUs to exascale HPC clusters.
@@ -19,28 +19,70 @@ ROCm supports programming models, such as OpenMP and OpenCL, and includes all ne
source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
(ML) frameworks, such as PyTorch and TensorFlow.
## Getting the ROCm Source Code
AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains a repo manifest file called [default.xml](./default.xml). You can use this manifest file to download the source code for ROCm software.
### Installing the repo tool
The repo tool from Google allows you to manage multiple git repositories simultaneously. Run the following commands to install the repo tool:
```bash
mkdir -p ~/bin/
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo
```
**Note:** The ```~/bin/``` folder is used as an example. You can specify a different folder to install the repo tool into if you desire.
### Installing git-lfs
Some ROCm projects use the Git Large File Storage (LFS) format that may require you to install git-lfs. Refer to [Git Large File Storage](https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md) for more information. For example, to install git-lfs for Ubuntu, use the following command:
```bash
sudo apt-get install git-lfs
```
### Downloading the ROCm source code
The following example shows how to use the repo tool to download the ROCm source code. If you choose a directory other than ~/bin/ to install the repo tool, you must use that chosen directory in the code as shown below:
```bash
mkdir -p ~/ROCm/
cd ~/ROCm/
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.0.x
~/bin/repo sync
```
**Note:** Using this sample code will cause the repo tool to download the open source code associated with the specified ROCm release. Ensure that you have ssh-keys configured on your machine for your GitHub ID prior to the download as explained at [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh).
### Building the ROCm source code
Each ROCm component repository contains directions for building that component, such as the rocSPARSE documentation [Installation and Building for Linux](https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/install/Linux_Install_Guide.html). Refer to the specific component documentation for instructions on building the repository.
Each release of the ROCm software supports specific hardware and software configurations. Refer to [System requirements (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) for the current supported hardware and OS.
## ROCm documentation
This repository contains the manifest file for ROCm releases, changelogs, and release information.
This repository contains the [manifest file](https://gerrit.googlesource.com/git-repo/+/HEAD/docs/manifest-format.md)
for ROCm releases, changelogs, and release information.
The `default.xml` file contains information for all repositories and the associated commit used to build
the current ROCm release; `default.xml` uses the Manifest Format repository.
the current ROCm release; `default.xml` uses the [Manifest Format repository](https://gerrit.googlesource.com/git-repo/).
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
`develop` branch of our repositories contains content for the next ROCm release.
The ROCm documentation homepage is [rocm.docs.amd.com](https://rocm.docs.amd.com).
### Building our documentation
### Building the documentation
For a quick-start build, use the following code. For more options and detail, refer to
[Building documentation](./contribute/building.md).
[Building documentation](./docs/contribute/building.md).
```bash
cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
@@ -48,7 +90,6 @@ Alternatively, CMake build is supported.
```bash
cmake -B build
cmake --build build --target=doc
```

View File

@@ -1,248 +1,252 @@
# Release notes for AMD ROCm™ 6.0
# ROCm 6.1 release highlights
<!-- Disable lints since this is an auto-generated file. -->
<!-- markdownlint-disable blanks-around-headers -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable no-blanks-blockquote -->
<!-- markdownlint-disable ul-indent -->
<!-- markdownlint-disable no-trailing-spaces -->
ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
MI300 series.Future releases will further enable and optimize this new platform. Key features include:
<!-- spellcheck-disable -->
* Improved performance in areas like lower precision math and attention layers.
* New hipSPARSELt library accelerates AI workloads via AMD's sparse matrix core technique.
* Upstream support is now available for popular AI frameworks like TensorFlow, JAX, and PyTorch.
* New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
* Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
tutorials on the [AMD ROCm Docs](https://rocm.docs.amd.com) site.
* Consolidated developer resources and training on the new
[AMD ROCm Developer Hub](https://www.amd.com/en/developer/resources/rocm-hub.html).
The ROCm™ 6.1 release consists of new features and fixes to improve the stability and
performance of AMD Instinct™ MI300 GPU applications. Notably, we've added:
The following section provide a release overview for ROCm 6.0. For additional details, you can refer to
the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html). We list known
issues on [GitHub](https://github.com/ROCm/ROCm/issues).
* Full support for Ubuntu 22.04.4.
* **rocDecode**, a new ROCm component that provides high-performance video decode support for
AMD GPUs. With rocDecode, you can decode compressed video streams while keeping the resulting
YUV frames in video memory. With decoded frames in video memory, you can run video
post-processing using ROCm HIP, avoiding unnecessary data copies via the PCIe bus.
To learn more, refer to the rocDecode
[documentation](https://rocm.docs.amd.com/projects/rocDecode/en/latest/).
## OS and GPU support changes
ROCm 6.0 enables the use of MI300A and MI300X Accelerators with a limited operating systems
support. Future releases will add additional OS's to match our general offering.
ROCm 6.1 adds the following operating system support:
| Operating Systems | MI300A | MI300X |
|:---:|:---:|:---:|
| Ubuntu 22.04.5 | Supported | Supported |
| RHEL 8.9 | Supported | |
| SLES15 SP5 | Supported | |
* MI300A: Ubuntu 22.04.4 and RHEL 9.3
* MI300X: Ubuntu 22.04.4
For older generations of supported Instinct products we've added the following operating systems:
Future releases will add additional operating systems to match the general offering. For older
generations of supported AMD Instinct products, weve added Ubuntu 22.04.4 support.
* RHEL 9.3
* RHEL 8.9
```{tip}
To view the complete list of supported GPUs and operating systems, refer to the system requirements
page for
[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html)
and
[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html).
```
Note: For ROCm 6.2 and beyond, we've planned for end-of-support (EoS) for the following operating
systems:
## Installation packages
* Ubuntu 20.04.5
* SLES 15 SP4
* RHEL/CentOS 7.9
This release includes a new set of packages for every module (all libraries and binaries default to
`DT_RPATH`). Package names have the suffix `rpath`; for example, the `rpath` variant of `rocminfo` is
`rocminfo-rpath`.
## New ROCm meta package
```{warning}
The new `rpath` packages will conflict with the default packages; they are meant to be used only in
environments where legacy `DT_RPATH` is the preferred form of linking (instead of `DT_RUNPATH`). We
do **not** recommend installing both sets of packages.
```
We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
libraries. For example, the following command will install the full ROCm package: `apt-get install rocm`
(Ubuntu), or `yum install rocm` (RHEL).
## ROCm components
## Filesystem Hierarchy Standard
The following sections highlight select component-specific changes. For additional details, refer to the
[Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html).
ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
the backward compatibility support for old file locations.
### AMD System Management Interface (SMI) Tool
## Compiler location change
* **New monitor command for GPU metrics**.
Use the monitor command to customize, capture, collect, and observe GPU metrics on
target devices.
* The installation path of LLVM has been changed from `/opt/rocm-<rel>/llvm` to
`/opt/rocm-<rel>/lib/llvm`. For backward compatibility, a symbolic link is provided to the old
location and will be removed in a future release.
* The installation path of the device library bitcode has changed from `/opt/rocm-<rel>/amdgcn` to
`/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn`. For backward compatibility, a symbolic link
is provided and will be removed in a future release.
* **Integration with E-SMI**.
The EPYC™ System Management Interface In-band Library is a Linux C-library that provides in-band
user space software APIs to monitor and control your CPUs power, energy, performance, and other
system management functionality. This integration enables access to CPU metrics and telemetry
through the AMD SMI API and CLI tools.
## Documentation
### Composable Kernel (CK)
CMake support has been added for documentation in the
[ROCm repository](https://github.com/RadeonOpenCompute/ROCm).
* **New architecture support**.
CK now supports to the following architectures to enable efficient image denoising on the following
AMD GPUs: gfx1030, gfx1100, gfx1031, gfx1101, gfx1032, gfx1102, gfx1034, gfx1103, gfx1035,
gfx1036
## AMD Instinct™ MI50 end-of-support notice
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enters
maintenance mode in ROCm 6.0.
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 was the
final release for gfx906 GPUs in a fully supported state.
* Henceforth, no new features and performance optimizations will be supported for the gfx906 GPUs.
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
* Bug fixes will be made up to the next ROCm point release.
* Bug fixes will not be backported to older ROCm releases for gfx906.
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906
GPUs until EOM.
## ROCm projects
The following sections contains project-specific release notes for ROCm 6.0. For additional details, you
can refer to the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html).
### AMD SMI
* **Integrated the E-SMI (EPYC-SMI) library**.
You can now query CPU-related information directly through AMD SMI. Metrics include power,
energy, performance, and other system details.
* **Added support for gfx942 metrics**.
You can now query MI300 device metrics to get real-time information. Metrics include power,
temperature, energy, and performance.
* **FP8 rounding logic is replaced with stochastic rounding**.
Stochastic rounding mimics a more realistic data behavior and improves model convergence.
### HIP
* **New features to improve resource interoperability**.
* For external resource interoperability, we've added new structs and enums.
* We've added new members to HIP struct `hipDeviceProp_t` for surfaces, textures, and device
identifiers.
* **New environment variable to enable kernel run serialization**.
The default `HIP_LAUNCH_BLOCKING` value is `0` (disable); which causes kernels to run as defined in
the queue. When set to `1` (enable), the HIP runtime serializes the kernel queue, which behaves the
same as `AMD_SERIALIZE_KERNEL`.
* **Changes impacting backward compatibility**.
There are several changes impacting backward compatibility: we changed some struct members and
some enum values, and removed some deprecated flags. For additional information, please refer to
the Changelog.
### hipBLASLt
### hipCUB
* **New GemmTuning extension parameter** GemmTuning allows you to set a split-k value for each solution, which is more feasible for
performance tuning.
* **Additional CUB API support**.
The hipCUB backend is updated to CUB and Thrust 2.1.
### hipFFT
* **New multi-GPU support for single-process transforms** Multiple GPUs can be used to perform a transform in a single process. Note that this initial
implementation is a functional preview.
### HIPIFY
* **Enhanced CUDA2HIP document generation**.
API versions are now listed in the CUDA2HIP documentation. To see if the application binary
interface (ABI) has changed, refer to the
[*C* column](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Runtime_API_functions_supported_by_HIP.html)
in our API documentation.
* **Skipped code blocks**: Code blocks that are skipped by the preprocessor are no longer hipified under the
`--default-preprocessor` option. To hipify everything, despite conditional preprocessor directives
(`#if`, `#ifdef`, `#ifndef`, `#elif`, or `#else`), don't use the `--default-preprocessor` or `--amap` options.
* **Hipified rocSPARSE**.
We've implemented support for the direct hipification of additional cuSPARSE APIs into rocSPARSE
APIs under the `--roc` option. This covers a major milestone in the roadmap towards complete
cuSPARSE-to-rocSPARSE hipification.
### hipSPARSELt
### hipRAND
* **Official release**.
hipRAND is now a *standalone project*--it's no longer available as a submodule for rocRAND.
* **Structured sparsity matrix support extensions**
Structured sparsity matrices help speed up deep-learning workloads. We now support `B` as the
sparse matrix and `A` as the dense matrix in Sparse Matrix-Matrix Multiplication (SPMM). Prior to this
release, we only supported sparse (matrix A) x dense (matrix B) matrix multiplication. Structured
sparsity matrices help speed up deep learning workloads.
### hipTensor
* **Added architecture support**.
We've added contraction support for gfx942 architectures, and f32 and f64 data
types.
* **Upgraded testing infrastructure**.
hipTensor will now support dynamic parameter configuration with input YAML config.
* **4D tensor permutation and contraction support**.
You can now perform tensor permutation on 4D tensors and 4D contractions for F16, BF16, and
Complex F32/F64 datatypes.
### MIGraphX
* **Added TorchMIGraphX**.
We introduced a Dynamo backend for Torch, which allows PyTorch to use MIGraphX directly
without first requiring a model to be converted to the ONNX model format. With a single line of
code, PyTorch users can utilize the performance and quantization benefits provided by MIGraphX.
* **Improved performance for transformer-based models**.
We added support for FlashAttention, which benefits models like BERT, GPT, and Stable Diffusion.
* **Boosted overall performance with rocMLIR**.
We've integrated the rocMLIR library for ROCm-supported RDNA and CDNA GPUs. This
technology provides MLIR-based convolution and GEMM kernel generation.
* **New Torch-MIGraphX driver**.
This driver calls MIGraphX directly from PyTorch. It provides an `mgx_module` object that you can
invoke like any other Torch module, but which utilizes the MIGraphX inference engine internally.
Torch-MIGraphX supports FP32, FP16, and INT8 datatypes.
* **Added INT8 support across the MIGraphX portfolio**.
We now support the INT8 data type. MIGraphX can perform the quantization or ingest
prequantized models. INT8 support extends to the MIGraphX execution provider for ONNX Runtime.
* **FP8 support**. We now offer functional support for inference in the FP8E4M3FNUZ datatype. You
can load an ONNX model in FP8E4M3FNUZ using C++ or Python APIs, or `migraphx-driver`.
You can quantize a floating point model to FP8 format by using the `--fp8` flag with `migraphx-driver`.
To accelerate inference, MIGraphX uses hardware acceleration on MI300 for FP8 by leveraging FP8
support in various backend kernel libraries.
### ROCgdb
### MIOpen
* **Added support for additional GPU architectures**.
* Navi 3 series: gfx1100, gfx1101, and gfx1102.
* MI300 series: gfx942.
* **Improved performance for inference and convolutions**.
Inference support now provided for Find 2.0 fusion plans. Additionally, we've enhanced the Number of
samples, Height, Width, and Channels (NHWC) convolution kernels for heuristics. NHWC stores data
in a format where the height and width dimensions come first, followed by channels.
### rocm-smi-lib
### OpenMP
* **Improved accessibility to GPU partition nodes**.
You can now view, set, and reset the compute and memory partitions. You'll also get notifications of
a GPU busy state, which helps you avoid partition set or reset failure.
* **Implicit Zero-copy is triggered automatically in XNACK-enabled MI300A systems**.
Implicit Zero-copy behavior in `non unified_shared_memory` programs is triggered automatically in
XNACK-enabled MI300A systems (for example, when using the `HSA_XNACK=1` environment
variable). OpenMP supports the 'requires `unified_shared_memory`' directive to support programs
that dont want to copy data explicitly between the CPU and GPU. However, this requires that you add
these directives to every translation unit of the program.
* **Upgraded GPU metrics version 1.4**.
The upgraded GPU metrics binary has an improved metric version format with a content version
appended to it. You can read each metric within the binary without the full `rsmi_gpu_metric_t` data
structure.
* **New MI300 FP atomics**. Application performance can now improve by leveraging fast floating-point atomics on MI300 (gfx942).
* **Updated GPU index sorting**.
We made GPU index sorting consistent with other ROCm software tools by optimizing it to use
`Bus:Device.Function` (BDF) instead of the card number.
### RCCL
* **NCCL 2.18.6 compatibility**.
RCCL is now compatible with NCCL 2.18.6, which includes increasing the maximum IB network interfaces to 32 and fixing network device ordering when creating communicators with only one GPU
per node.
* **Doubled simultaneous communication channels**.
We improved MI300X performance by increasing the maximum number of simultaneous
communication channels from 32 to 64.
### rocALUTION
* **New multiple node and GPU support**.
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
[API documentation](https://rocm.docs.amd.com/projects/rocALUTION/en/latest/usermanual/solvers.html#unsmoothed-aggregation-amg).
### rocDecode
* **New ROCm component**.
rocDecode ROCm's newest component, providing high-performance video decode support for AMD
GPUs. To learn more, refer to the
[documentation](https://rocm.docs.amd.com/projects/rocDecode/en/latest/).
### ROCm Compiler
* **Added kernel argument optimization on gfx942**.
With the new feature, you can preload kernel arguments into Scalar General-Purpose Registers
(SGPRs) rather than pass them in memory. This feature is enabled with a compiler option, which also
controls the number of arguments to pass in SGPRs. For more information, see:
[https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments](https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments)
* **Combined projects**. ROCm Device-Libs, ROCm Compiler Support, and hipCC are now located in
the `llvm-project/amd` subdirectory of AMD's fork of the LLVM project. Previously, these projects
were maintained in separate repositories. Note that the projects themselves will continue to be
packaged separately.
* **Improved register allocation at -O0**.
We've improved the register allocator used at -O0 to avoid compiler crashes (when the signature is
'ran out of registers during register allocation').
* **Split the 'rocm-llvm' package**. This package has been split into a required and an optional package:
* **Improved generation of debug information**.
We've improved compile time when generating debug information for certain corner cases. We've
also improved the compiler to eliminate compiler crashes when generating debug information.
* **rocm-llvm(required)**: A package containing the essential binaries needed for compilation.
* **rocm-llvm-dev(optional)**: A package containing binaries for compiler and application developers.
### ROCmValidationSuite
### ROCm Data Center Tool (RDC)
* **Added GPU and operating system support**.
We added support for MI300X GPU in GPU Stress Test (GST).
* **C++ upgrades**.
RDC was upgraded from C++11 to C++17 to enable a more modern C++ standard when writing RDC plugins.
### Roc Profiler
### ROCm Performance Primitives (RPP)
* **Added option to specify desired Roc Profiler version**.
You can now use rocProfV1 or rocProfV2 by specifying your desired version, as the legacy rocProf
(`rocprofv1`) provides the option to use the latest version (`rocprofv2`).
* **New backend support**.
Audio processing support added for the `HOST` backend and 3D Voxel kernels support
for the `HOST` and `HIP` backends.
* **Automated the ISA dumping process by Advance Thread Tracer**.
Advance Thread Tracer (ATT) no longer depends on user-supplied Instruction Set Architecture (ISA)
and compilation process (using ``hipcc --save-temps``) to dump ISA from the running kernels.
### ROCm Validation Suite
* **Added ATT support for parallel kernels**.
The automatic ISA dumping process also helps ATT successfully parse multiple kernels running in
parallel, and provide cycle-accurate occupancy information for multiple kernels at the same time.
* **New datatype support**.
Added BF16 and FP8 datatypes based on General Matrix Multiply(GEMM) operations in the GPU Stress Test (GST) module. This provides additional performance benchmarking and stress testing based on the newly supported datatypes.
### ROCr
### rocSOLVER
* **Support for SDMA link aggregation**.
If multiple XGMI links are available when making SDMA copies between GPUs, the copy is
distributed over multiple links to increase peak bandwidth.
* **New EigenSolver routine**.
Based on the Jacobi algorithm, a new EigenSolver routine was added to the library. This routine computes the eigenvalues and eigenvectors of a matrix with improved performance.
### rocThrust
### ROCTracer
* **Added Thrust 2.1 API support**.
rocThrust backend is updated to Thrust and CUB 2.1.
* **New versioning and callback enhancements**.
Improved to match versioning changes in HIP Runtime and supports runtime API callbacks and activity record logging. The APIs of different runtimes at different levels are considered different API domains with assigned domain IDs.
### rocWMMA
## Upcoming changes
* **Added new architecture support**.
We added support for gfx942 architectures.
* ROCm SMI will be deprecated in a future release. We advise **migrating to AMD SMI** now to
prevent future workflow disruptions.
* **Added data type support**.
We added support for f8, bf8, xf32 data types on supporting architectures, and for bf16 in the HIP RTC
environment.
* hipCC supports, by default, the following compiler invocation flags:
* **Added support for the PyTorch kernel plugin**.
We added awareness of `__HIP_NO_HALF_CONVERSIONS__` to support PyTorch users.
* `-mllvm -amdgpu-early-inline-all=true`
* `-mllvm -amdgpu-function-calls=false`
### TransferBench
In a future ROCm release, hipCC will no longer support these flags. It will, instead, use the Clang
defaults:
* **Improved ordering control**.
You can now set the thread block size (`BLOCK_SIZE`) and the thread block order (`BLOCK_ORDER`)
in which thread blocks from different transfers are run when using a single stream.
* `-mllvm -amdgpu-early-inline-all=false`
* `-mllvm -amdgpu-function-calls=true`
* **Added comprehensive reports**.
We modified individual transfers to report X Compute Clusters (XCC) ID when `SHOW_ITERATIONS`
is set to 1.
To evaluate the impact of this change, include `--hipcc-func-supp` in your hipCC invocation.
* **Improved accuracy in result validation**.
You can now validate results for each iteration instead of just once for all iterations.
For information on these flags, and the differences between hipCC and Clang, refer to
[ROCm Compiler Interfaces](https://rocm.docs.amd.com/en/latest/reference/rocmcc.html#rocm-compiler-interfaces).
* Future ROCm releases will not provide `clang-ocl`. For more information, refer to the
[`clang-ocl` README](https://github.com/ROCm/clang-ocl).
* The following operating systems will be supported in a future ROCm release. They are currently
only available in beta.
* RHEL 9.4
* RHEL 8.10
* SLES 15 SP6
* As of ROCm 6.2, weve planned for **end-of-support** for:
* Ubuntu 20.04.5
* SLES 15 SP4
* RHEL/CentOS 7.9

View File

@@ -26,7 +26,7 @@
include(FetchContent)
if(ROCM_BUILD_DOCS)
if(BUILD_DOCS)
find_package(ROCM 0.11.0 CONFIG QUIET PATHS "${ROCM_PATH}") # First version with Sphinx doc gen improvement
if(NOT ROCM_FOUND)
message(STATUS "ROCm CMake not found. Fetching...")
@@ -35,7 +35,7 @@ if(ROCM_BUILD_DOCS)
CACHE STRING "rocm-cmake tag to download")
FetchContent_Declare(
rocm-cmake
GIT_REPOSITORY https://github.com/RadeonOpenCompute/rocm-cmake.git
GIT_REPOSITORY https://github.com/ROCm/rocm-cmake.git
GIT_TAG ${rocm_cmake_tag}
SOURCE_SUBDIR "DISABLE ADDING TO BUILD" # We don't really want to consume the build and test targets of ROCm CMake.
)
@@ -44,58 +44,4 @@ if(ROCM_BUILD_DOCS)
else()
find_package(ROCM 0.11.0 CONFIG REQUIRED PATHS "${ROCM_PATH}")
endif()
if(Python_FIND_VIRTUALENV STREQUAL "ONLY" AND NOT DEFINED ENV{VIRTUAL_ENV})
if(NOT EXISTS "${CMAKE_CURRENT_BINARY_DIR}/.venv")
message(STATUS "Python virtualenv use requested but not found. Fetching...")
find_program(BOOTSTRAP_PYTHON_EXE python3 REQUIRED)
execute_process(
COMMAND "${BOOTSTRAP_PYTHON_EXE}" -m pip install --user virtualenv
OUTPUT_QUIET
COMMAND_ERROR_IS_FATAL ANY
)
execute_process(
COMMAND "${BOOTSTRAP_PYTHON_EXE}" -m virtualenv "${CMAKE_CURRENT_BINARY_DIR}/.venv"
OUTPUT_QUIET
COMMAND_ERROR_IS_FATAL ANY
)
endif()
set(ENV{VIRTUAL_ENV} "${CMAKE_CURRENT_BINARY_DIR}/.venv")
if(WIN32)
set(ENV{PATH} "${CMAKE_CURRENT_BINARY_DIR}/.venv/Scripts;$ENV{PATH}")
else()
set(ENV{PATH} "${CMAKE_CURRENT_BINARY_DIR}/.venv/bin:$ENV{PATH}")
endif()
find_package(Python REQUIRED)
# TODO: shortcircuit if installed
execute_process(
COMMAND "${Python_EXECUTABLE}" -m pip install pip-tools
OUTPUT_QUIET
COMMAND_ERROR_IS_FATAL ANY
)
list(APPEND CMAKE_CONFIGURE_DEPENDS "${PROJECT_SOURCE_DIR}/docs/sphinx/requirements.in")
file(MAKE_DIRECTORY "$ENV{VIRTUAL_ENV}/usr/share/${PROJECT_NAME}")
if("${PROJECT_SOURCE_DIR}/docs/sphinx/requirements.in" IS_NEWER_THAN "$ENV{VIRTUAL_ENV}/usr/share/${PROJECT_NAME}/requirements.txt")
execute_process(
COMMAND "${Python_EXECUTABLE}" -m piptools compile
"${PROJECT_SOURCE_DIR}/docs/sphinx/requirements.in"
--output-file
"$ENV{VIRTUAL_ENV}/usr/share/${PROJECT_NAME}/requirements.txt"
OUTPUT_QUIET
ERROR_QUIET
COMMAND_ERROR_IS_FATAL ANY
)
execute_process(
COMMAND "${Python_EXECUTABLE}" -m piptools sync
"$ENV{VIRTUAL_ENV}/usr/share/${PROJECT_NAME}/requirements.txt"
OUTPUT_QUIET
ERROR_QUIET
COMMAND_ERROR_IS_FATAL ANY
)
endif()
endif()
endif()

View File

@@ -1,74 +1,69 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<remote name="roc-github" fetch="https://github.com/RadeonOpenCompute/" />
<remote name="rocm-devtools" fetch="https://github.com/ROCm-Developer-Tools/" />
<remote name="rocm-swplat" fetch="https://github.com/ROCmSoftwarePlatform/" />
<remote name="gpuopen-libs" fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
<remote name="gpuopen-tools" fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup" fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-6.0.0"
<default revision="refs/tags/rocm-6.1.0"
remote="rocm-org"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCm-->
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" />
<project name="ROCK-Kernel-Driver" />
<project name="ROCT-Thunk-Interface" />
<project name="ROCR-Runtime" />
<project name="ROCT-Thunk-Interface" />
<project name="amdsmi" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocm-cmake" />
<project name="rocminfo" />
<project name="rocm_bandwidth_test" />
<project name="rocprofiler" />
<project name="roctracer" />
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
<project name="clang-ocl" />
<project name="rdc" />
<project name="rocm_bandwidth_test" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocminfo" />
<project name="rocprofiler" />
<project name="rocprofiler-register" />
<project name="roctracer" />
<!--HIP Projects-->
<project name="HIP" />
<project name="HIP-Examples" />
<project name="HIPIFY" />
<project name="clr" />
<project name="hipother" />
<project name="HIPIFY" />
<project name="HIPCC" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="half" />
<project name="llvm-project" />
<project name="ROCm-Device-Libs" />
<project name="ROCm-CompilerSupport" />
<project name="half" revision="37742ce15b76b44e4b271c1e66d13d2fa7bd003e" />
<!-- gdb projects -->
<project name="ROCgdb" />
<project name="ROCdbgapi" />
<project name="ROCgdb" />
<project name="rocr_debug_agent" />
<!-- ROCm Libraries -->
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="AMDMIGraphX" />
<project groups="mathlibs" name="MIOpen" />
<project groups="mathlibs" name="MIVisionX" />
<project groups="mathlibs" name="ROCmValidationSuite" />
<project groups="mathlibs" name="Tensile" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="composable_kernel" />
<project groups="mathlibs" name="hipBLAS" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="hipBLASLt" />
<project groups="mathlibs" name="hipCUB" />
<project groups="mathlibs" name="hipFFT" />
<project groups="mathlibs" name="rocRAND" />
<project groups="mathlibs" name="hipRAND" />
<project groups="mathlibs" name="rocSPARSE" />
<project groups="mathlibs" name="hipSPARSELt" />
<project groups="mathlibs" name="rocSOLVER" />
<project groups="mathlibs" name="hipSOLVER" />
<project groups="mathlibs" name="hipSPARSE" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocThrust" />
<project groups="mathlibs" name="hipCUB" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="hipSPARSELt" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipfort" />
<project groups="mathlibs" name="rccl" />
<project name="MIOpen" />
<project name="composable_kernel" />
<project name="MIVisionX" />
<project name="rpp" />
<project name="hipfort" />
<project name="AMDMIGraphX" />
<project name="ROCmValidationSuite" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="rocDecode" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocRAND" />
<project groups="mathlibs" name="rocSOLVER" />
<project groups="mathlibs" name="rocSPARSE" />
<project groups="mathlibs" name="rocThrust" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="rocm-cmake" />
<project groups="mathlibs" name="rpp" />
<!-- Projects for OpenMP-Extras -->
<project name="aomp" path="openmp-extras/aomp" />
<project name="aomp-extras" path="openmp-extras/aomp-extras" />

View File

@@ -1,122 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Linux GPU and OS support">
<meta name="keywords" content="Linux support, ROCm distributions">
</head>
# GPU and OS support (Linux)
(linux-support)=
## Supported Linux distributions
AMD ROCm™ Platform supports the following Linux distributions.
::::{tab-set}
:::{tab-item} Supported
| Distribution | Processor Architectures | Validated Kernel | Support |
| :----------- | :---------------------: | :--------------: | ------: |
| RHEL 9.2 | x86-64 | 5.14 (5.14.0-284.11.1.el9_2.x86_64) | ✅ |
| RHEL 9.1 | x86-64 | 5.14.0-284.11.1.el9_2.x86_64 | ✅ |
| RHEL 8.8 | x86-64 | 4.18.0-477.el8.x86_64 | ✅ |
| RHEL 8.7 | x86-64 | 4.18.0-425.10.1.el8_7.x86_64 | ✅ |
| SLES 15 SP5 | x86-64 | 5.14.21-150500.53-default | ✅ |
| SLES 15 SP4 | x86-64 | 5.14.21-150400.24.63-default | ✅ |
| Ubuntu 22.04.2 | x86-64 | 5.19.0-45-generic | ✅ |
| Ubuntu 20.04.5 | x86-64 | 5.15.0-75-generic | ✅ |
:::{versionadded} 5.6
* RHEL 8.8 and 9.2 support is added.
* SLES 15 SP5 support is added
:::
:::{tab-item} Unsupported
| Distribution | Processor Architectures | Validated Kernel | Support |
| :----------- | :---------------------: | :--------------: | ------: |
| RHEL 9.0 | x86-64 | 5.14 | ❌ |
| RHEL 8.6 | x86-64 | 5.14 | ❌ |
| SLES 15 SP3 | x86-64 | 5.3 | ❌ |
| Ubuntu 22.04.0 | x86-64 | 5.15 LTS, 5.17 OEM | ❌ |
| Ubuntu 20.04.4 | x86-64 | 5.13 HWE, 5.13 OEM | ❌ |
| Ubuntu 22.04.1 | x86-64 | 5.15 LTS | ❌ |
:::
::::
✅: **Supported** - AMD performs full testing of all ROCm components on distro
GA image.
❌: **Unsupported** - AMD no longer performs builds and testing on these
previously supported distro GA images.
## Virtualization support
ROCm supports virtualization for select GPUs only as shown below.
| Hypervisor | Version | GPU | Validated Guest OS (validated kernel) |
|----------------|----------|-------|----------------------------------------------------------------------------------|
| VMWare | ESXi 8 | MI250 | Ubuntu 20.04 (`5.15.0-56-generic`) |
| VMWare | ESXi 8 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
| VMWare | ESXi 7 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
## Linux-supported GPUs
The table below shows supported GPUs for Instinct™, Radeon Pro™ and Radeon™
GPUs. Please click the tabs below to switch between GPU product lines. If a GPU
is not listed on this table, the GPU is not officially supported by AMD.
:::::{tab-set}
::::{tab-item} AMD Instinct™
:sync: instinct
| Product Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) |Support |
|:------------:|:------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Instinct™ MI250X | CDNA2 | gfx90a | ✅ |
| AMD Instinct™ MI250 | CDNA2 | gfx90a | ✅ |
| AMD Instinct™ MI210 | CDNA2 | gfx90a | ✅ |
| AMD Instinct™ MI100 | CDNA | gfx908 | ✅ |
| AMD Instinct™ MI50 | GCN5.1 | gfx906 | ✅ |
| AMD Instinct™ MI25 | GCN5.0 | gfx900 | ❌ |
::::
::::{tab-item} Radeon Pro™
:sync: radeonpro
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ Pro W7900 | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ Pro W6800 | RDNA2 | gfx1030 | ✅ |
| AMD Radeon™ Pro V620 | RDNA2 | gfx1030 | ✅ |
| AMD Radeon™ Pro VII | GCN5.1 | gfx906 | ✅ |
::::
::::{tab-item} Radeon™
:sync: radeonpro
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:---------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
::::
:::::
### Support status
✅: **Supported** - AMD enables these GPUs in our software distributions for
the corresponding ROCm product.
⚠️: **Deprecated** - Support will be removed in a future release.
❌: **Unsupported** - This configuration is not enabled in our software
distributions.
## CPU support
ROCm requires CPUs that support PCIe™ atomics. Modern CPUs after the release of
1st generation AMD Zen CPU and Intel™ Haswell support PCIe atomics.

View File

@@ -15,7 +15,8 @@ Along with host APIs, the OpenMP compilers support offloading code and data onto
GPU devices. This document briefly describes the installation location of the
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
with OpenMP applications. The GPUs supported are the same as those supported by
this ROCm release. See the list of supported GPUs for [Linux](../../about/compatibility/linux-support.md) and [Windows](../../about/compatibility/windows-support.md).
this ROCm release. See the list of supported GPUs for {doc}`Linux<rocm-install-on-linux:reference/system-requirements>` and
{doc}`Windows<rocm-install-on-windows:reference/system-requirements>`.
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
The following image illustrates the internal steps taken to translate a users application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
@@ -47,10 +48,10 @@ cd $ROCM_PATH/share/openmp-extras/examples/openmp/veccopy
sudo make run
```
```{note}
:::{note}
`sudo` is required since we are building inside the `/opt` directory.
Alternatively, copy the files to your home directory first.
```
:::
The above invocation of Make compiles and runs the program. Note the options
that are required for target offload from an OpenMP program:
@@ -59,13 +60,15 @@ that are required for target offload from an OpenMP program:
-fopenmp --offload-arch=<gpu-arch>
```
```{note}
:::{note}
The compiler also accepts the alternative offloading notation:
```bash
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
```
:::
Obtain the value of `gpu-arch` by running the following command:
```bash
@@ -75,7 +78,7 @@ Obtain the value of `gpu-arch` by running the following command:
[//]: # (dated link below, needs updating)
See the complete list of compiler command-line references
[here](https://github.com/RadeonOpenCompute/llvm-project/blob/amd-stg-open/clang/docs/CommandGuide/clang.rst).
[here](https://github.com/ROCm/llvm-project/blob/amd-stg-open/clang/docs/CommandGuide/clang.rst).
### Using `rocprof` with OpenMP
@@ -327,10 +330,10 @@ double a = 0.0;
a = a + 1.0;
```
```{note}
:::{note}
`AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
`AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
```
:::
To disable the generation of fast floating-point atomic instructions at the file
level, build using the option `-msafe-fp-atomics` or use a hint clause on a
@@ -410,7 +413,7 @@ void main() {
```
See the complete sample code for heap buffer overflow
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
[here](https://github.com/ROCm/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
* Global buffer overflow
@@ -435,7 +438,7 @@ for(int i=0; i<N; i++){
```
See the complete sample code for global buffer overflow
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
[here](https://github.com/ROCm/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
### Clang compiler option for kernel optimization

View File

@@ -0,0 +1,565 @@
.. meta::
:description: Supported data types in ROCm
:keywords: int8, float8, float8 (E4M3), float8 (E5M2), bfloat8, float16, half, bfloat16, tensorfloat32, float,
float32, float64, double, AMD, ROCm, AMDGPU
*************************************************************
Precision support
*************************************************************
Use the following sections to identify data types and HIP types ROCm™ supports.
Integral types
==========================================
The signed and unsigned integral types that are supported by ROCm are listed in the following table,
together with their corresponding HIP type and a short description.
.. list-table::
:header-rows: 1
:widths: 15,35,50
*
- Type name
- HIP type
- Description
*
- int8
- ``int8_t``, ``uint8_t``
- A signed or unsigned 8-bit integer
*
- int16
- ``int16_t``, ``uint16_t``
- A signed or unsigned 16-bit integer
*
- int32
- ``int32_t``, ``uint32_t``
- A signed or unsigned 32-bit integer
*
- int64
- ``int64_t``, ``uint64_t``
- A signed or unsigned 64-bit integer
Floating-point types
==========================================
The floating-point types that are supported by ROCm are listed in the following table, together with
their corresponding HIP type and a short description.
.. image:: ../../data/about/compatibility/floating-point-data-types.png
:alt: Supported floating-point types
.. list-table::
:header-rows: 1
:widths: 15,15,70
*
- Type name
- HIP type
- Description
*
- float8 (E4M3)
- ``-``
- An 8-bit floating-point number that mostly follows IEEE-754 conventions and **S1E4M3** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_ , with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
*
- float8 (E5M2)
- ``-``
- An 8-bit floating-point number mostly following IEEE-754 conventions and **S1E5M2** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_ , with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
*
- float16
- ``half``
- A 16-bit floating-point number that conforms to the IEEE 754-2008 half-precision storage format.
*
- bfloat16
- ``bfloat16``
- A shortened 16-bit version of the IEEE 754 single-precision storage format.
*
- tensorfloat32
- ``-``
- A floating-point number that occupies 32 bits or less of storage, providing improved range compared to half (16-bit) format, at (potentially) greater throughput than single-precision (32-bit) formats.
*
- float32
- ``float``
- A 32-bit floating-point number that conforms to the IEEE 754 single-precision storage format.
*
- float64
- ``double``
- A 64-bit floating-point number that conforms to the IEEE 754 double-precision storage format.
.. note::
* The float8 and tensorfloat32 types are internal types used in calculations in Matrix Cores and can be stored in any type of the same size.
* The encodings for FP8 (E5M2) and FP8 (E4M3) that are natively supported by MI300 differ from the FP8 (E5M2) and FP8 (E4M3) encodings used in H100 (`FP8 Formats for Deep Learning <https://arxiv.org/abs/2209.05433>`_).
* In some AMD documents and articles, float8 (E5M2) is referred to as bfloat8.
ROCm support icons
==========================================
In the following sections, we use icons to represent the level of support. These icons, described in the
following table, are also used on the library data type support pages.
.. list-table::
:header-rows: 1
*
- Icon
- Definition
*
-
- Not supported
*
- ⚠️
- Partial support
*
-
- Full support
.. note::
* Full support means that the type is supported natively or with hardware emulation.
* Native support means that the operations for that type are implemented in hardware. Types that are not natively supported are emulated with the available hardware. The performance of non-natively supported types can differ from the full instruction throughput rate. For example, 16-bit integer operations can be performed on the 32-bit integer ALUs at full rate; however, 64-bit integer operations might need several instructions on the 32-bit integer ALUs.
* Any type can be emulated by software, but this page does not cover such cases.
Hardware type support
==========================================
AMD GPU hardware support for data types is listed in the following tables.
Compute units support
-------------------------------------------------------------------------------
The following table lists data type support for compute units.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
Matrix core support
-------------------------------------------------------------------------------
The following table lists data type support for AMD GPU matrix cores.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
Atomic operations support
-------------------------------------------------------------------------------
The following table lists data type support for atomic operations.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
.. note::
For cases that are not natively supported, you can emulate atomic operations using software.
Software-emulated atomic operations have high negative performance impact when they frequently
access the same memory address.
Data Type support in ROCm Libraries
==========================================
ROCm library support for int8, float8 (E4M3), float8 (E5M2), int16, float16, bfloat16, int32,
tensorfloat32, float32, int64, and float64 is listed in the following tables.
Libraries input/output type support
-------------------------------------------------------------------------------
The following tables list ROCm library support for specific input and output data types. For a detailed
description, refer to the corresponding library data type support page.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Library input/output data type name
- int8
- int16
- int32
- int64
*
- hipSPARSELt (:doc:`details <hipsparselt:reference/data-type-support>`)
- ✅/✅
- ❌/❌
- ❌/❌
- ❌/❌
*
- rocRAND (:doc:`details <rocrand:data-type-support>`)
- -/✅
- -/✅
- -/✅
- -/✅
*
- hipRAND (:doc:`details <hiprand:data-type-support>`)
- -/✅
- -/✅
- -/✅
- -/✅
*
- rocPRIM (:doc:`details <rocprim:reference/data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details <hipcub:data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
*
- rocThrust (:doc:`details <rocthrust:data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Library input/output data type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- hipSPARSELt (:doc:`details <hipsparselt:reference/data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ❌/❌
- ❌/❌
*
- rocRAND (:doc:`details <rocrand:data-type-support>`)
- -/❌
- -/❌
- -/✅
- -/❌
- -/❌
- -/✅
- -/✅
*
- hipRAND (:doc:`details <hiprand:data-type-support>`)
- -/❌
- -/❌
- -/✅
- -/❌
- -/❌
- -/✅
- -/✅
*
- rocPRIM (:doc:`details <rocprim:reference/data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details <hipcub:data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ✅/✅
- ✅/✅
*
- rocThrust (:doc:`details <rocthrust:data-type-support>`)
- ❌/❌
- ❌/❌
- ⚠️/⚠️
- ⚠️/⚠️
- ❌/❌
- ✅/✅
- ✅/✅
Libraries internal calculations type support
-------------------------------------------------------------------------------
The following tables list ROCm library support for specific internal data types. For a detailed
description, refer to the corresponding library data type support page.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Library internal data type name
- int8
- int16
- int32
- int64
*
- hipSPARSELt (:doc:`details <hipsparselt:reference/data-type-support>`)
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Library internal data type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- hipSPARSELt (:doc:`details <hipsparselt:reference/data-type-support>`)
-
-
-
-
-
-
-

View File

@@ -1,86 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Windows GPU and OS support">
<meta name="keywords" content="Windows support, ROCm distributions">
</head>
# GPU and OS support (Windows)
(windows-support)=
## Supported SKUs
AMD HIP SDK supports the following Windows variants.
| Distribution |Processor Architectures| Validated update |
|---------------------|-----------------------|--------------------|
| Windows 10 | x86-64 | 22H2 (GA) |
| Windows 11 | x86-64 | 22H2 (GA) |
| Windows Server 2022 | x86-64 | |
## Windows-supported GPUs
The table below shows supported GPUs for Radeon Pro™ and Radeon™ GPUs. Please
click the tabs below to switch between GPU product lines. If a GPU is not listed
on this table, the GPU is not officially supported by AMD.
::::{tab-set}
:::{tab-item} Radeon Pro™
:sync: radeonpro
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon Pro™ W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro™ W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro™ W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon Pro™ W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon Pro™ W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
| AMD Radeon Pro™ VII | GCN5.1 | gfx906 | ❌ | ❌ |
:::
:::{tab-item} Radeon™
:sync: radeon
| Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon™ RX 7600 | RDNA3 | gfx1102 | ✅ | ✅ |
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6900 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6750 XT | RDNA2 | gfx1031 | ✅ | ❌ |
| AMD Radeon™ RX 6700 XT | RDNA2 | gfx1031 | ✅ | ❌ |
| AMD Radeon™ RX 6700 | RDNA2 | gfx1031 | ✅ | ❌ |
| AMD Radeon™ RX 6650 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | ✅ | ❌ |
:::
::::
### Component support
ROCm components are described in [What is ROCm?](../../what-is-rocm.md) Support
on Windows is provided with two levels on enablement.
* **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only.
* **HIP SDK**: Runtime plus additional components are listed in [Libraries](../../reference/library-index.md).
Note that some math libraries are Linux exclusive.
### Support status
✅: **Supported** - AMD enables these GPUs in our software distributions for
the corresponding ROCm product.
⚠️: **Deprecated** - Support will be removed in a future release.
❌: **Unsupported** - This configuration is not enabled in our software
distributions.
## CPU support
ROCm requires CPUs that support PCIe™ atomics. Modern CPUs after the release of
1st generation AMD Zen CPU and Intel™ Haswell support PCIe atomics.

View File

@@ -1,9 +1,142 @@
# License
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm licensing terms">
<meta name="keywords" content="license, licensing terms">
</head>
> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that primarily contains documentation. For other licensing information, refer to the [Licensing Terms page](./licensing).
# ROCm license
```{include} ../../LICENSE
```
```{include} ./licensing.md
:::{note}
The preceding license applies to the [ROCm repository](https://github.com/ROCm/ROCm), which
primarily contains documentation. For licenses related to other ROCm components, refer to the
following section.
:::
## ROCm component licenses
ROCm is released by Advanced Micro Devices, Inc. and is licensed per component separately.
The following table is a list of ROCm components with links to their respective license
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
<!-- spellcheck-disable -->
| Component | License |
|:---------------------|:-------------------------|
| [AMDMIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm/HIPIFY/) | [MIT](https://github.com/ROCm/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm/HIP/) | [MIT](https://github.com/ROCm/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCm/MIOpenGEMM/) | [MIT](https://github.com/ROCm/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/ROCm/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm/ROCclr/) | [MIT](https://github.com/ROCm/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/ROCm/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/ROCm/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/ROCm/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| [Tensile](https://github.com/ROCm/Tensile/) | [MIT](https://github.com/ROCm/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm/aomp/) | [Apache 2.0](https://github.com/ROCm/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/ROCm/atmi/) | [MIT](https://github.com/ROCm/atmi/blob/master/LICENSE.txt) |
| [clang-ocl](https://github.com/ROCm/clang-ocl/) | [MIT](https://github.com/ROCm/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm/flang/) | [Apache 2.0](https://github.com/ROCm/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/master/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCm/hipBLAS/) | [MIT](https://github.com/ROCm/hipBLAS/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCm/hipCUB/) | [Custom](https://github.com/ROCm/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCm/hipFFT/) | [MIT](https://github.com/ROCm/hipFFT/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCm/hipSOLVER/) | [MIT](https://github.com/ROCm/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCm/hipSPARSELt/) | [MIT](https://github.com/ROCm/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCm/hipSPARSE/) | [MIT](https://github.com/ROCm/hipSPARSE/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCm/hipTensor) | [MIT](https://github.com/ROCm/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm/hipamd/) | [MIT](https://github.com/ROCm/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCm/hipfort/) | [MIT](https://github.com/ROCm/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/master/LICENSE) |
| [rocALUTION](https://github.com/ROCm/rocALUTION/) | [MIT](https://github.com/ROCm/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCm/rocBLAS/) | [MIT](https://github.com/ROCm/rocBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCm/rocFFT/) | [MIT](https://github.com/ROCm/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCm/rocPRIM/) | [MIT](https://github.com/ROCm/rocPRIM/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCm/rocRAND/) | [MIT](https://github.com/ROCm/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCm/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCm/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCm/rocSPARSE/) | [MIT](https://github.com/ROCm/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCm/rocThrust/) | [Apache 2.0](https://github.com/ROCm/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCm/rocWMMA/) | [MIT](https://github.com/ROCm/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/ROCm/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula)
Open sourced ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com and other distribution channels.
Proprietary products are only available on https://repo.radeon.com. Currently, only
one component of ROCm, rocm-llvm-alt is governed by a proprietary license.
Proprietary components are organized in a proprietary subdirectory in the package
repositories to distinguish from open sourced packages.
```{note}
The following additional terms and conditions apply to your use of ROCm technical documentation.
```
©2023 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.
### Package licensing
:::{attention}
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary and is
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
AOCC CPU Optimizations.
:::
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`

View File

@@ -1,133 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm licensing terms">
<meta name="keywords" content="license, licensing terms">
</head>
# ROCm licensing terms
ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately.
The following table is a list of ROCm components with links to their respective license
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of license, and link to the license terms.
The table is ordered to follow the ROCm manifest file.
<!-- spellcheck-disable -->
| Component | License |
|:---------------------|:-------------------------|
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCmSoftwarePlatform/hipTensor) | [MIT](https://github.com/ROCmSoftwarePlatform/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula)
Open sourced ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com and other distribution channels.
Proprietary products are only available on https://repo.radeon.com. Currently, only
one component of ROCm, rocm-llvm-alt is governed by a proprietary license.
Proprietary components are organized in a proprietary subdirectory in the package
repositories to distinguish from open sourced packages.
The additional terms and conditions below apply to your use of ROCm technical
documentation.
©2023 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.
## Package licensing
```{attention}
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary and is
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
AOCC CPU Optimizations.
```
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`

View File

@@ -1,6 +1,6 @@
.. meta::
:description: How ROCm uses PCIe atomics
:keywords: PCIe, PCIe atomics, atomics, BAR memory
:keywords: PCIe, PCIe atomics, atomics, BAR memory, AMD, ROCm
*****************************************************************************
How ROCm uses PCIe atomics
@@ -63,15 +63,14 @@ There are also a number of papers which talk about these new capabilities:
* `Atomic Read Modify Write Primitives by Intel <https://www.intel.es/content/dam/doc/white-paper/atomic-read-modify-write-primitives-i-o-devices-paper.pdf>`_
* `PCI express 3 Accelerator White paper by Intel <https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf>`_
* `Intel PCIe Generation 3 Hotchips Paper <https://www.hotchips.org/wp-content/uploads/hc_archives/hc21/1_sun/HC21.23.1.SystemInterconnectTutorial-Epub/HC21.23.131.Ajanovic-Intel-PCIeGen3.pdf>`_
* `PCIe Generation 4 Base Specification includes atomic operations <https://astralvx.com/storage/2020/11/PCI_Express_Base_4.0_Rev0.3_February19-2014.pdf>`_
Other I/O devices with PCIe atomics support
* `Mellanox ConnectX-5 InfiniBand Card <http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_VPI_Card.pdf>`_
* `Cray Aries Interconnect <http://www.hoti.org/hoti20/slides/Bob_Alverson.pdf>`_
* `Xilinx PCIe Ultrascale White paper <https://docs.xilinx.com/v/u/8OZSA2V1b1LLU2rRCDVGQw>`_
* `Xilinx 7 Series Devices <https://docs.xilinx.com/v/u/1nfXeFNnGpA0ywyykvWHWQ>`_
Other I/O devices with PCIe atomics support:
* Mellanox ConnectX-5 InfiniBand Card
* Cray Aries Interconnect
* Xilinx 7 Series Devices
Future bus technology with richer I/O atomics operation Support
@@ -80,8 +79,8 @@ Future bus technology with richer I/O atomics operation Support
New PCIe Endpoints with support beyond AMD Ryzen and EPYC CPU; Intel Haswell or newer CPUs
with PCIe Generation 3.0 support.
* `Mellanox Bluefield SOC <https://docs.nvidia.com/networking/display/BlueFieldSWv25111213/BlueField+Software+Overview>`_
* `Cavium Thunder X2 <https://en.wikichip.org/wiki/cavium/thunderx2>`_
* Mellanox Bluefield SOC
* Cavium Thunder X2
In ROCm, we also take advantage of PCIe ID based ordering technology for P2P when the GPU
originates two writes to two different targets:

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="Inference optimization with MIGraphX">
<meta name="keywords" content="Inference optimization, MIGraphX, deep-learning, MIGraphX
installation">
installation, AMD, ROCm">
</head>
# Inference optimization with MIGraphX
@@ -55,15 +55,15 @@ The header files and libraries are installed under `/opt/rocm-\<version\>`, wher
There are two ways to build the MIGraphX sources.
* [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses `[rbuild](https://github.com/RadeonOpenCompute/rbuild)` to install the prerequisites and build the libraries with just one command.
* [Use the ROCm build tool](https://github.com/ROCm/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses `[rbuild](https://github.com/ROCm/rbuild)` to install the prerequisites and build the libraries with just one command.
or
* [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
* [Use CMake](https://github.com/ROCm/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
For detailed steps on building from source and installing dependencies, refer to the following `README` file:
[https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source)
[https://github.com/ROCm/AMDMIGraphX#building-from-source](https://github.com/ROCm/AMDMIGraphX#building-from-source)
### Option 3: use docker
@@ -72,7 +72,7 @@ To use Docker, follow these steps:
1. The easiest way to set up the development environment is to use Docker. To build Docker from scratch, first clone the MIGraphX repository by running:
```bash
git clone --recursive https://github.com/ROCmSoftwarePlatform/AMDMIGraphX
git clone --recursive https://github.com/ROCm/AMDMIGraphX
```
2. The repository contains a Dockerfile from which you can build a Docker image as:
@@ -216,23 +216,23 @@ Follow these steps:
./inception_inference
```
```{note}
:::{note}
Set `LD_LIBRARY_PATH` to `/opt/rocm/lib` if required during the build. Additional examples can be found in the MIGraphX repository under the `/examples/` directory.
```
:::
## Tuning MIGraphX
MIGraphX uses MIOpen kernels to target AMD GPU. For the model compiled with MIGraphX, tune MIOpen to pick the best possible kernel implementation. The MIOpen tuning results in a significant performance boost. Tuning can be done by setting the environment variable `MIOPEN_FIND_ENFORCE=3`.
```{note}
:::{note}
The tuning process can take a long time to finish.
```
:::
**Example:** The average inference time of the inception model example shown previously over 100 iterations using untuned kernels is 0.01383ms. After tuning, it reduces to 0.00459ms, which is a 3x improvement. This result is from ROCm v4.5 on a MI100 GPU.
```{note}
:::{note}
The results may vary depending on the system configurations.
```
:::
For reference, the following code snippet shows inference runs for only the first 10 iterations for both tuned and untuned kernels:

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="Inception V3 with PyTorch">
<meta name="keywords" content="PyTorch, Inception V3, deep-learning, training data, optimization
algorithm">
algorithm, AMD, ROCm">
</head>
# Deep learning: Inception V3 with PyTorch
@@ -22,6 +22,7 @@ Training occurs in multiple phases for every batch of training data. the followi
:::{table} Types of Training Phases
:name: training-phases
:widths: auto
| Types of Phases | |
| ----------------- | --- |
| Forward Pass | The input features are fed into the model, whose parameters may be randomly initialized initially. Activations (outputs) of each layer are retained during this pass to help in the loss gradient computation during the backward pass. |
@@ -35,6 +36,7 @@ Training is different from inference, particularly from the hardware perspective
:::{table} Training vs. Inference
:name: training-inference
:widths: auto
| Training | Inference |
| ----------- | ----------- |
| Training is measured in hours/days. | The inference is measured in minutes. |
@@ -63,7 +65,7 @@ This example is adapted from the PyTorch research hub page on [Inception V3](htt
Follow these steps:
1. Run the PyTorch ROCm-based Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:pytorch-install>` for setting up a PyTorch environment on ROCm.
1. Run the PyTorch ROCm-based Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```dockerfile
docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
@@ -153,7 +155,7 @@ The previous section focused on downloading and using the Inception V3 model for
Follow these steps:
1. Run the PyTorch ROCm Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:pytorch-install>` for setting up a PyTorch environment on ROCm.
1. Run the PyTorch ROCm Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```dockerfile
docker pull rocm/pytorch:latest
@@ -215,9 +217,9 @@ Follow these steps:
7. Set parameters to guide the training process.
```{note}
:::{note}
The device is set to `"cuda"`. In PyTorch, `"cuda"` is a generic keyword to denote a GPU.
```
:::
```py
device = "cuda"
@@ -277,9 +279,9 @@ Follow these steps:
lr_gamma = 0.1
```
```{note}
:::{note}
One training epoch is when the neural network passes an entire dataset forward and backward.
```
:::
```py
epochs = 90
@@ -340,9 +342,9 @@ Follow these steps:
)
```
```{note}
:::{note}
Use torchvision to obtain the Inception V3 model. Use the pre-trained model weights to speed up training.
```
:::
```py
print("Creating model")
@@ -679,7 +681,7 @@ The dataset has 60,000 images you will use to train the network and 10,000 to ev
Access the source code from the following repository:
[https://github.com/ROCmSoftwarePlatform/tensorflow_fashionmnist/blob/main/fashion_mnist.py](https://github.com/ROCmSoftwarePlatform/tensorflow_fashionmnist/blob/main/fashion_mnist.py)
[https://github.com/ROCm/tensorflow_fashionmnist/blob/main/fashion_mnist.py](https://github.com/ROCm/tensorflow_fashionmnist/blob/main/fashion_mnist.py)
To understand the code step by step, follow these steps:
@@ -876,7 +878,7 @@ To understand the code step by step, follow these steps:
thisplot[true_label].set_color('blue')
```
9. With the model trained, you can use it to make predictions about some images. Review the 0-th image predictions and the prediction array. Correct prediction labels are blue, and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label.
9. With the model trained, you can use it to make predictions about some images. Review the 0<sup>th</sup> image predictions and the prediction array. Correct prediction labels are blue, and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label.
```py
i = 0
@@ -1162,9 +1164,10 @@ To prepare the data for training, follow these steps:
print("Accuracy: ", accuracy)
```
```{note}
model.fit() returns a History object that contains a dictionary with everything that happened during training.
```
:::{note}
`model.fit()` returns a History object that contains a dictionary with everything that happened during
training.
:::
```py
history_dict = history.history

View File

@@ -1,6 +1,6 @@
.. meta::
:description: Using CMake
:keywords: CMake, dependencies, HIP, C++
:keywords: CMake, dependencies, HIP, C++, AMD, ROCm
*********************************
Using CMake
@@ -25,13 +25,13 @@ Finding dependencies
In short, CMake supports finding dependencies in two ways:
* In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to
find the component in typical install locations and layouts. CMake ships a
few dozen such scripts, but users and projects may ship them as well.
* In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to find the component
in typical install locations and layouts. CMake ships a few dozen such scripts, but users and projects
may ship them as well.
* In Config mode, it locates a file named ``<packagename>-config.cmake`` or
``<PackageName>Config.cmake`` which describes the installed component in all
regards needed to consume it.
* In Config mode, it locates a file named ``<packagename>-config.cmake`` or
``<PackageName>Config.cmake`` which describes the installed component in all regards needed to
consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
@@ -56,8 +56,10 @@ the *config-file* packages are shipped with the upstream projects, such as
rocPRIM and other ROCm libraries.
For a complete guide on where and how ROCm may be installed on a system, refer
to the installation guides for {doc}`Linux<rocm-install-on-linux:tutorial/quick-start>` and
{doc}`Windows<rocm-install-on-windows:tutorial/install-quick>`.
to the installation guides for
`Linux <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html>`_
and
`Windows <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/index.html>`_.
Using HIP in CMake
==================
@@ -196,14 +198,13 @@ all the flags necessary for device compilation.
Compiling for the GPU device requires at least C++11.
This project can then be configured with the following CMake commands.
This project can then be configured with the following CMake commands:
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
- Linux: ``cmake -D CMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++``
* Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
* Linux: ``cmake -D CMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++``
Which use the device compiler provided from the binary packages of
`ROCm HIP SDK <https://www.amd.com/en/developer/rocm-hub.html>`_ and
`ROCm HIP SDK <https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html>`_ and
`repo.radeon.com <https://repo.radeon.com>`_ respectively.
When using the ``CXX`` language support to compile HIP device code, selecting the

View File

@@ -1,7 +1,7 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm compilers disambiguation">
<meta name="keywords" content="compilers, compiler naming">
<meta name="keywords" content="compilers, compiler naming, AMD, ROCm">
</head>
# ROCm compilers disambiguation
@@ -13,9 +13,9 @@ disambiguates compiler naming used throughout the documentation.
| Term | Description |
| - | - |
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/ROCm/llvm-project" target="_blank">https://github.com/ROCm/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm/HIPIFY" target="_blank">https://github.com/ROCm/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm/HIPCC" target="_blank">https://github.com/ROCm/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |

View File

@@ -1,7 +1,8 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm Linux Filesystem Hierarchy Standard reorganization">
<meta name="keywords" content="FHS, Linux Filesystem Hierarchy Standard, directory structure">
<meta name="keywords" content="FHS, Linux Filesystem Hierarchy Standard, directory structure,
AMD, ROCm">
</head>
# ROCm Linux Filesystem Hierarchy Standard reorganization

View File

@@ -5,29 +5,43 @@
MI100, AMD Instinct">
</head>
(gpu-arch-documentation)=
# GPU architecture documentation
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card}
**AMD Instinct MI300 series**
Review hardware aspects of the AMD Instinct™ MI300 series of GPU accelerators and the CDNA™ 3
architecture.
* [AMD Instinct™ MI300 microarchitecture](./gpu-arch/mi300.md)
* [AMD Instinct MI300/CDNA3 ISA](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf)
* [Performance counters](./gpu-arch/mi300-mi200-performance-counters.rst)
:::
:::{grid-item-card}
**AMD Instinct MI200 series**
Review hardware aspects of the AMD Instinct™ MI200 series of GPU
accelerators and the CDNA™ 2 architecture.
Review hardware aspects of the AMD Instinct™ MI200 series of GPU accelerators and the CDNA™ 2
architecture.
* [AMD Instinct™ MI250 microarchitecture](./gpu-arch/mi250.md)
* [AMD Instinct MI200/CDNA2 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
* [Performance counters](./gpu-arch/mi200-performance-counters.md)
* [Performance counters](./gpu-arch/mi300-mi200-performance-counters.rst)
:::
:::{grid-item-card}
**AMD Instinct MI100**
Review hardware aspects of the AMD Instinct™ MI100
accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs.
Review hardware aspects of the AMD Instinct™ MI100 series of GPU accelerators and the CDNA™ 1
architecture.
* [AMD Instinct™ MI100 microarchitecture](./gpu-arch/mi100.md)
* [AMD Instinct MI100/CDNA1 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)

View File

@@ -1,7 +1,7 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD Instinct MI100 microarchitecture">
<meta name="keywords" content="Instinct, MI100, microarchitecture">
<meta name="keywords" content="Instinct, MI100, microarchitecture, AMD, ROCm">
</head>
# AMD Instinct™ MI100 microarchitecture

View File

@@ -1,578 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="MI200 performance counters and metrics">
<meta name="keywords" content="MI200, performance counters, counters, GRBM counters, GRBM,
CPF counters, CPF, CPC counters, CPC, command processor counters, SPI counters, SPI">
</head>
# MI200 performance counters and metrics
<!-- markdownlint-disable no-duplicate-header -->
This document lists and describes the hardware performance counters and derived metrics available on the AMD Instinct™ MI200 GPU. All the hardware basic counters and derived metrics are accessible via {doc}`ROCProfiler tool <rocprofiler:rocprofv1>`.
## MI200 performance counters list
See the category-wise listing of MI200 performance counters in the following tables.
```{note}
Preliminary validation of all MI200 performance counters is in progress. Those with “*” appended to the names require further evaluation.
```
### Graphics Register Bus Management (GRBM) counters
| Hardware Counter | Unit | Definition |
|:--------------------|:--------|:--------------------------------------------------------------------------|
| `GRBM_COUNT` | Cycles | Number of free-running GPU cycles |
| `GRBM_GUI_ACTIVE` | Cycles | Number of GPU active cycles |
| `GRBM_CP_BUSY` | Cycles | Number of cycles any of the Command Processor (CP) blocks are busy |
| `GRBM_SPI_BUSY` | Cycles | Number of cycles any of the Shader Processor Input (SPI) are busy in the shader engine(s) |
| `GRBM_TA_BUSY` | Cycles | Number of cycles any of the Texture Addressing Unit (TA) are busy in the shader engine(s) |
| `GRBM_TC_BUSY` | Cycles | Number of cycles any of the Texture Cache Blocks (TCP/TCI/TCA/TCC) are busy |
| `GRBM_CPC_BUSY` | Cycles | Number of cycles the Command Processor - Compute (CPC) is busy |
| `GRBM_CPF_BUSY` | Cycles | Number of cycles the Command Processor - Fetcher (CPF) is busy |
| `GRBM_UTCL2_BUSY` | Cycles | Number of cycles the Unified Translation Cache - Level 2 (UTCL2) block is busy |
| `GRBM_EA_BUSY` | Cycles | Number of cycles the Efficiency Arbiter (EA) block is busy |
### Command Processor (CP) counters
The CP counters are further classified into CP-Fetcher (CPF) and CP-Compute (CPC).
#### CPF counters
| Hardware Counter | Unit | Definition |
|:--------------------------------------|:--------|:-------------------------------------------------------------|
| `CPF_CMP_UTCL1_STALL_ON_TRANSLATION` | Cycles | Number of cycles one of the Compute UTCL1s is stalled waiting on translation |
| `CPF_CPF_STAT_BUSY` | Cycles | Number of cycles CPF is busy |
| `CPF_CPF_STAT_IDLE*` | Cycles | Number of cycles CPF is idle |
| `CPF_CPF_STAT_STALL` | Cycles | Number of cycles CPF is stalled |
| `CPF_CPF_TCIU_BUSY` | Cycles | Number of cycles CPF Texture Cache Interface Unit (TCIU) interface is busy |
| `CPF_CPF_TCIU_IDLE` | Cycles | Number of cycles CPF TCIU interface is idle |
| `CPF_CPF_TCIU_STALL*` | Cycles | Number of cycles CPF TCIU interface is stalled waiting on free tags |
#### CPC counters
| Hardware Counter | Unit | Definition |
|:---------------------------------|:-------|:---------------------------------------------------|
| `CPC_ME1_BUSY_FOR_PACKET_DECODE` | Cycles | Number of cycles CPC Micro Engine (ME1) is busy decoding packets |
| `CPC_UTCL1_STALL_ON_TRANSLATION` | Cycles | Number of cycles one of the UTCL1s is stalled waiting on translation |
| `CPC_CPC_STAT_BUSY` | Cycles | Number of cycles CPC is busy |
| `CPC_CPC_STAT_IDLE` | Cycles | Number of cycles CPC is idle |
| `CPC_CPC_STAT_STALL` | Cycles | Number of cycles CPC is stalled |
| `CPC_CPC_TCIU_BUSY` | Cycles | Number of cycles CPC TCIU interface is busy |
| `CPC_CPC_TCIU_IDLE` | Cycles | Number of cycles CPC TCIU interface is idle |
| `CPC_CPC_UTCL2IU_BUSY` | Cycles | Number of cycles CPC UTCL2 interface is busy |
| `CPC_CPC_UTCL2IU_IDLE` | Cycles | Number of cycles CPC UTCL2 interface is idle |
| `CPC_CPC_UTCL2IU_STALL` | Cycles | Number of cycles CPC UTCL2 interface is stalled |
| `CPC_ME1_DC0_SPI_BUSY` | Cycles | Number of cycles CPC ME1 Processor is busy |
### Shader Processor Input (SPI) counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:-----------|:-----------------------------------------------------------|
| `SPI_CSN_BUSY` | Cycles | Number of cycles with outstanding waves |
| `SPI_CSN_WINDOW_VALID` | Cycles | Number of cycles enabled by `perfcounter_start` event |
| `SPI_CSN_NUM_THREADGROUPS` | Workgroups | Number of dispatched workgroups |
| `SPI_CSN_WAVE` | Wavefronts | Number of dispatched wavefronts |
| `SPI_RA_REQ_NO_ALLOC` | Cycles | Number of Arb cycles with requests but no allocation |
|`SPI_RA_REQ_NO_ALLOC_CSN` | Cycles | Number of Arb cycles with Compute Shader, n-th pipe (CSn) requests but no CSn allocation |
| `SPI_RA_RES_STALL_CSN` | Cycles | Number of Arb stall cycles due to shortage of CSn pipeline slots |
| `SPI_RA_TMP_STALL_CSN*` | Cycles | Number of stall cycles due to shortage of temp space |
| `SPI_RA_WAVE_SIMD_FULL_CSN` | SIMD-cycles | Accumulated number of Single Instruction Multiple Data (SIMDs) per cycle affected by shortage of wave slots for CSn wave dispatch |
| `SPI_RA_VGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of VGPR slots for CSn wave dispatch |
| `SPI_RA_SGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of SGPR slots for CSn wave dispatch |
| `SPI_RA_LDS_CU_FULL_CSN` | CUs | Number of Compute Units (CUs) affected by shortage of LDS space for CSn wave dispatch |
| `SPI_RA_BAR_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting at a BARRIER |
| `SPI_RA_BULKY_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting for BULKY resource |
| `SPI_RA_TGLIM_CU_FULL_CSN*` | Cycles | Number of CSn wave stall cycles due to restriction of `tg_limit` for thread group size |
| `SPI_RA_WVLIM_STALL_CSN*` | Cycles | Number of cycles CSn is stalled due to WAVE_LIMIT |
| `SPI_VWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Vector General Purpose Register (VGPRs) when launching waves |
| `SPI_SWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Vector General Purpose Register (SGPRs) when launching waves |
### Compute Unit (CU) counters
The CU counters are further classified into instruction mix, Matrix Fused Multiply Add (MFMA) operation counters, level counters, wavefront counters, wavefront cycle counters and Local Data Share (LDS) counters.
#### Instruction mix
| Hardware Counter | Unit | Definition |
|:-----------------------|:-----|:-----------------------------------------------------------------------|
| `SQ_INSTS` | Instr | Number of instructions issued. |
| `SQ_INSTS_VALU` | Instr | Number of Vector Arithmetic Logic Unit (VALU) instructions including MFMA issued. |
| `SQ_INSTS_VALU_ADD_F16` | Instr | Number of VALU Half Precision Floating Point (F16) ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F16` | Instr | Number of VALU F16 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F16` | Instr | Number of VALU F16 Fused Multiply Add (FMA)/ Multiply Add (MAD) instructions issued. |
| `SQ_INSTS_VALU_TRANS_F16` | Instr | Number of VALU F16 Transcendental instructions issued. |
| `SQ_INSTS_VALU_ADD_F32` | Instr | Number of VALU Full Precision Floating Point (F32) ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F32` | Instr | Number of VALU F32 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F32` | Instr | Number of VALU F32 FMA/MAD instructions issued. |
| `SQ_INSTS_VALU_TRANS_F32` | Instr | Number of VALU F32 Transcendental instructions issued. |
| `SQ_INSTS_VALU_ADD_F64` | Instr | Number of VALU F64 ADD/SUB instructions issued. |
| `SQ_INSTS_VALU_MUL_F64` | Instr | Number of VALU F64 Multiply instructions issued. |
| `SQ_INSTS_VALU_FMA_F64` | Instr | Number of VALU F64 FMA/MAD instructions issued. |
| `SQ_INSTS_VALU_TRANS_F64` | Instr | Number of VALU F64 Transcendental instructions issued. |
| `SQ_INSTS_VALU_INT32` | Instr | Number of VALU 32-bit integer instructions (signed or unsigned) issued. |
| `SQ_INSTS_VALU_INT64` | Instr | Number of VALU 64-bit integer instructions (signed or unsigned) issued. |
| `SQ_INSTS_VALU_CVT` | Instr | Number of VALU Conversion instructions issued. |
| `SQ_INSTS_VALU_MFMA_I8` | Instr | Number of 8-bit Integer MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F16` | Instr | Number of F16 MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_BF16` | Instr | Number of Brain Floating Point - 16 (BF16) MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F32` | Instr | Number of F32 MFMA instructions issued. |
| `SQ_INSTS_VALU_MFMA_F64` | Instr | Number of F64 MFMA instructions issued. |
| `SQ_INSTS_MFMA` | Instr | Number of MFMA instructions issued. |
| `SQ_INSTS_VMEM_WR` | Instr | Number of Vector Memory (VMEM) Write instructions (including FLAT) issued. |
| `SQ_INSTS_VMEM_RD` | Instr | Number of VMEM Read instructions (including FLAT) issued. |
| `SQ_INSTS_VMEM` | Instr | Number of VMEM instructions issued, including both FLAT and Buffer instructions. |
| `SQ_INSTS_SALU` | Instr | Number of SALU instructions issued. |
| `SQ_INSTS_SMEM` | Instr | Number of Scalar Memory (SMEM) instructions issued. |
| `SQ_INSTS_SMEM_NORM` | Instr | Number of SMEM instructions normalized to match `smem_level` issued. |
| `SQ_INSTS_FLAT` | Instr | Number of FLAT instructions issued. |
| `SQ_INSTS_FLAT_LDS_ONLY` | Instr | Number of FLAT instructions that read/write only from/to LDS issued. Works only if `EARLY_TA_DONE` is enabled. |
| `SQ_INSTS_LDS` | Instr | Number of Local Data Share (LDS) instructions issued (including FLAT). |
| `SQ_INSTS_GDS` | Instr | Number of Global Data Share (GDS) instructions issued. |
| `SQ_INSTS_EXP_GDS` | Instr | Number of EXP and GDS instructions excluding skipped export instructions issued. |
| `SQ_INSTS_BRANCH` | Instr | Number of Branch instructions issued. |
| `SQ_INSTS_SENDMSG` | Instr | Number of `SENDMSG` instructions including `s_endpgm` issued. |
| `SQ_INSTS_VSKIPPED*` | Instr | Number of vector instructions skipped. |
#### MFMA operation counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:-----|:----------------------------------------------|
| `SQ_INSTS_VALU_MFMA_MOPS_I8` | IOP | Number of 8-bit integer MFMA ops in the unit of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F16` | FLOP | Number of F16 floating MFMA ops in the unit of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_BF16` | FLOP | Number of BF16 floating MFMA ops in the unit of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F32` | FLOP | Number of F32 floating MFMA ops in the unit of 512 |
| `SQ_INSTS_VALU_MFMA_MOPS_F64` | FLOP | Number of F64 floating MFMA ops in the unit of 512 |
#### Level counters
:::{note}
All level counters must be followed by `SQ_ACCUM_PREV_HIRES` counter to measure average latency.
:::
| Hardware Counter | Unit | Definition |
|:-------------------|:-----|:-------------------------------------|
| `SQ_ACCUM_PREV` | Count | Accumulated counter sample value where accumulation takes place once every four cycles. |
| `SQ_ACCUM_PREV_HIRES` | Count | Accumulated counter sample value where accumulation takes place once every cycle. |
| `SQ_LEVEL_WAVES` | Waves | Number of inflight waves. To calculate the wave latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_WAVE`. |
| `SQ_INST_LEVEL_VMEM` | Instr | Number of inflight VMEM (including FLAT) instructions. To calculate the VMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_VMEM`. |
| `SQ_INST_LEVEL_SMEM` | Instr | Number of inflight SMEM instructions. To calculate the SMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_SMEM_NORM`. |
| `SQ_INST_LEVEL_LDS` | Instr | Number of inflight LDS (including FLAT) instructions. To calculate the LDS latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_LDS`. |
| `SQ_IFETCH_LEVEL` | Instr | Number of inflight instruction fetch requests from the cache. To calculate the instruction fetch latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_IFETCH`. |
#### Wavefront counters
| Hardware Counter | Unit | Definition |
|:--------------------|:-----|:----------------------------------------------------------------|
| `SQ_WAVES` | Waves | Number of wavefronts dispatched to Sequencers (SQs), including both new and restored wavefronts |
| `SQ_WAVES_SAVED*` | Waves | Number of context-saved waves |
| `SQ_WAVES_RESTORED*` | Waves | Number of context-restored waves sent to SQs |
| `SQ_WAVES_EQ_64` | Waves | Number of wavefronts with exactly 64 active threads sent to SQs |
| `SQ_WAVES_LT_64` | Waves | Number of wavefronts with less than 64 active threads sent to SQs |
| `SQ_WAVES_LT_48` | Waves | Number of wavefronts with less than 48 active threads sent to SQs |
| `SQ_WAVES_LT_32` | Waves | Number of wavefronts with less than 32 active threads sent to SQs |
| `SQ_WAVES_LT_16` | Waves | Number of wavefronts with less than 16 active threads sent to SQs |
#### Wavefront cycle counters
| Hardware Counter | Unit | Definition |
|:------------------------|:-------|:--------------------------------------------------------------------|
| `SQ_CYCLES` | Cycles | Clock cycles. |
| `SQ_BUSY_CYCLES` | Cycles | Number of cycles while SQ reports it to be busy. |
| `SQ_BUSY_CU_CYCLES` | Qcycles | Number of quad-cycles each CU is busy. |
| `SQ_VALU_MFMA_BUSY_CYCLES` | Cycles | Number of cycles the MFMA ALU is busy. |
| `SQ_WAVE_CYCLES` | Qcycles | Number of quad-cycles spent by waves in the CUs. |
| `SQ_WAIT_ANY` | Qcycles | Number of quad-cycles spent waiting for anything. |
| `SQ_WAIT_INST_ANY` | Qcycles | Number of quad-cycles spent waiting for any instruction to be issued. |
| `SQ_ACTIVE_INST_ANY` | Qcycles | Number of quad-cycles spent by each wave to work on an instruction. |
| `SQ_ACTIVE_INST_VMEM` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VMEM instruction. |
| `SQ_ACTIVE_INST_LDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an LDS instruction. |
| `SQ_ACTIVE_INST_VALU` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VALU instruction. |
| `SQ_ACTIVE_INST_SCA` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a SALU or SMEM instruction. |
| `SQ_ACTIVE_INST_EXP_GDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an EXPORT or GDS instruction. |
| `SQ_ACTIVE_INST_MISC` | Qcycles | Number of quad-cycles spent by the SQ instruction aribter to work on a BRANCH or `SENDMSG` instruction. |
| `SQ_ACTIVE_INST_FLAT` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a FLAT instruction. |
| `SQ_INST_CYCLES_VMEM_WR` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Write instructions. |
| `SQ_INST_CYCLES_VMEM_RD` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Read instructions. |
| `SQ_INST_CYCLES_SMEM` | Qcycles | Number of quad-cycles spent to execute scalar memory reads. |
| `SQ_INST_CYCLES_SALU` | Qcycles | Number of quad-cycles spent to execute non-memory read scalar operations. |
| `SQ_THREAD_CYCLES_VALU` | Cycles | Number of thread-cycles spent to execute VALU operations. This is similar to `INST_CYCLES_VALU` but multiplied by the number of active threads. |
| `SQ_WAIT_INST_LDS` | Qcycles | Number of quad-cycles spent waiting for LDS instruction to be issued. |
#### LDS counters
| Hardware Counter | Unit | Definition |
|:--------------------------|:------|:--------------------------------------------------------|
| `SQ_LDS_ATOMIC_RETURN` | Cycles | Number of atomic return cycles in LDS |
| `SQ_LDS_BANK_CONFLICT` | Cycles | Number of cycles LDS is stalled by bank conflicts |
| `SQ_LDS_ADDR_CONFLICT*` | Cycles | Number of cycles LDS is stalled by address conflicts |
| `SQ_LDS_UNALIGNED_STALL*` | Cycles | Number of cycles LDS is stalled processing flat unaligned load/store ops |
| `SQ_LDS_MEM_VIOLATIONS*` | Count | Number of threads that have a memory violation in the LDS |
| `SQ_LDS_IDX_ACTIVE` | Cycles | Number of cycles LDS is used for indexed operations |
#### Miscellaneous counters
| Hardware Counter | Unit | Definition |
|:--------------------------|:------|:--------------------------------------------------------|
| `SQ_IFETCH` | Count | Number of instruction fetch requests from `L1I` cache, in 32-byte width |
| `SQ_ITEMS` | Threads | Number of valid items per wave |
### L1I and sL1D cache counters
| Hardware Counter | Unit | Definition |
|:----------------------------|:------|:----------------------------------------------------------------|
| `SQC_ICACHE_REQ` | Req | Number of `L1I` cache requests |
| `SQC_ICACHE_HITS` | Count | Number of `L1I` cache hits |
| `SQC_ICACHE_MISSES` | Count | Number of non-duplicate `L1I` cache misses including uncached requests |
| `SQC_ICACHE_MISSES_DUPLICATE` | Count | Number of duplicate `L1I` cache misses whose previous lookup miss on the same cache line is not fulfilled yet |
| `SQC_DCACHE_REQ` | Req | Number of `sL1D` cache requests |
| `SQC_DCACHE_INPUT_VALID_READYB` | Cycles | Number of cycles while SQ input is valid but sL1D cache is not ready |
| `SQC_DCACHE_HITS` | Count | Number of `sL1D` cache hits |
| `SQC_DCACHE_MISSES` | Count | Number of non-duplicate `sL1D` cache misses including uncached requests |
| `SQC_DCACHE_MISSES_DUPLICATE` | Count | Number of duplicate `sL1D` cache misses |
| `SQC_DCACHE_REQ_READ_1` | Req | Number of constant cache read requests in a single DW |
| `SQC_DCACHE_REQ_READ_2` | Req | Number of constant cache read requests in two DW |
| `SQC_DCACHE_REQ_READ_4` | Req | Number of constant cache read requests in four DW |
| `SQC_DCACHE_REQ_READ_8` | Req | Number of constant cache read requests in eight DW |
| `SQC_DCACHE_REQ_READ_16` | Req | Number of constant cache read requests in 16 DW |
| `SQC_DCACHE_ATOMIC*` | Req | Number of atomic requests |
| `SQC_TC_REQ` | Req | Number of TC requests that were issued by instruction and constant caches |
| `SQC_TC_INST_REQ` | Req | Number of instruction requests to the L2 cache |
| `SQC_TC_DATA_READ_REQ` | Req | Number of data Read requests to the L2 cache |
| `SQC_TC_DATA_WRITE_REQ*` | Req | Number of data write requests to the L2 cache |
| `SQC_TC_DATA_ATOMIC_REQ*` | Req | Number of data atomic requests to the L2 cache |
| `SQC_TC_STALL*` | Cycles | Number of cycles while the valid requests to the L2 cache are stalled |
### Vector L1 cache subsystem
The vector L1 cache subsystem counters are further classified into Texture Addressing Unit (TA), Texture Data Unit (TD), vector L1D cache or Texture Cache per Pipe (TCP), and Texture Cache Arbiter (TCA) counters.
#### TA counters
| Hardware Counter | Unit | Definition |
|:--------------------------------|:------|:------------------------------------------------|
| `TA_TA_BUSY[n]` | Cycles | TA busy cycles. Value range for n: [0-15]. |
| `TA_TOTAL_WAVEFRONTS[n]` | Instr | Number of wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_WAVEFRONTS[n]` | Instr | Number of buffer wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_READ_WAVEFRONTS[n]` | Instr | Number of buffer read wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_WRITE_WAVEFRONTS[n]` | Instr | Number of buffer write wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_ATOMIC_WAVEFRONTS[n]` | Instr | Number of buffer atomic wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_BUFFER_TOTAL_CYCLES[n]` | Cycles | Number of buffer cycles (including read and write) issued to TC. Value range for n: [0-15]. |
| `TA_BUFFER_COALESCED_READ_CYCLES[n]` | Cycles | Number of coalesced buffer read cycles issued to TC. Value range for n: [0-15]. |
| `TA_BUFFER_COALESCED_WRITE_CYCLES[n]` | Cycles | Number of coalesced buffer write cycles issued to TC. Value range for n: [0-15]. |
| `TA_ADDR_STALLED_BY_TC_CYCLES[n]` | Cycles | Number of cycles TA address path is stalled by TC. Value range for n: [0-15]. |
| `TA_DATA_STALLED_BY_TC_CYCLES[n]` | Cycles | Number of cycles TA data path is stalled by TC. Value range for n: [0-15]. |
| `TA_ADDR_STALLED_BY_TD_CYCLES[n]` | Cycles | Number of cycles TA address path is stalled by TD. Value range for n: [0-15]. |
| `TA_FLAT_WAVEFRONTS[n]` | Instr | Number of flat opcode wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_READ_WAVEFRONTS[n]` | Instr | Number of flat opcode read wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_WRITE_WAVEFRONTS[n]` | Instr | Number of flat opcode write wavefronts processed by TA. Value range for n: [0-15]. |
| `TA_FLAT_ATOMIC_WAVEFRONTS[n]` | Instr | Number of flat opcode atomic wavefronts processed by TA. Value range for n: [0-15]. |
#### TD counters
| Hardware Counter | Unit | Definition |
|:------------------------|:-----|:---------------------------------------------------|
| `TD_TD_BUSY[n]` | Cycle | TD busy cycles while it is processing or waiting for data. Value range for n: [0-15]. |
| `TD_TC_STALL[n]` | Cycle | Number of cycles TD is stalled waiting for TC data. Value range for n: [0-15]. |
| `TD_SPI_STALL[n]` | Cycle | Number of cycles TD is stalled by SPI. Value range for n: [0-15]. |
| `TD_LOAD_WAVEFRONT[n]` | Instr |Number of wavefront instructions (read/write/atomic). Value range for n: [0-15]. |
| `TD_STORE_WAVEFRONT[n]` | Instr | Number of write wavefront instructions. Value range for n: [0-15].|
| `TD_ATOMIC_WAVEFRONT[n]` | Instr | Number of atomic wavefront instructions. Value range for n: [0-15]. |
| `TD_COALESCABLE_WAVEFRONT[n]` | Instr | Number of coalescable wavefronts according to TA. Value range for n: [0-15]. |
#### TCP counters
| Hardware Counter | Unit | Definition |
|:-----------------------------------|:------|:----------------------------------------------------------|
| `TCP_GATE_EN1[n]` | Cycles | Number of cycles vL1D interface clocks are turned on. Value range for n: [0-15]. |
| `TCP_GATE_EN2[n]` | Cycles | Number of cycles vL1D core clocks are turned on. Value range for n: [0-15]. |
| `TCP_TD_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TD stalls vL1D. Value range for n: [0-15]. |
| `TCP_TCR_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TCR stalls vL1D. Value range for n: [0-15]. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a read. Value range for n: [0-15]. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a write. Value range for n: [0-15]. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on an atomic. Value range for n: [0-15]. |
| `TCP_PENDING_STALL_CYCLES[n]` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache. Value range for n: [0-15]. |
| `TCP_TCP_TA_DATA_STALL_CYCLES` | Cycles | Number of cycles TCP stalls TA data interface. |
| `TCP_TA_TCP_STATE_READ[n]` | Req | Number of state reads. Value range for n: [0-15]. |
| `TCP_VOLATILE[n]` | Req | Number of L1 volatile pixels/buffers from TA. Value range for n: [0-15]. |
| `TCP_TOTAL_ACCESSES[n]` | Req | Number of vL1D accesses. Equals `TCP_PERF_SEL_TOTAL_READ`+`TCP_PERF_SEL_TOTAL_NONREAD`. Value range for n: [0-15]. |
| `TCP_TOTAL_READ[n]` | Req | Number of vL1D read accesses. Equals `TCP_PERF_SEL_TOTAL_HIT_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_EVICT_READ`. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITE[n]` | Req | Number of vL1D write accesses. `Equals TCP_PERF_SEL_TOTAL_MISS_LRU_WRITE`+ `TCP_PERF_SEL_TOTAL_MISS_EVICT_WRITE`. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITH_RET[n]` | Req | Number of vL1D atomic requests with return. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET[n]` | Req | Number of vL1D atomic without return. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES[n]` | Count | Total number of vL1D writebacks and invalidates. Equals `TCP_PERF_SEL_TOTAL_WBINVL1`+ `TCP_PERF_SEL_TOTAL_WBINVL1_VOL`+ `TCP_PERF_SEL_CP_TCP_INVALIDATE`+ `TCP_PERF_SEL_SQ_TCP_INVALIDATE_VOL`. Value range for n: [0-15]. |
| `TCP_UTCL1_REQUEST[n]` | Req | Number of address translation requests to UTCL1. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_HIT[n]` | Req | Number of UTCL1 translation hits. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_MISS[n]` | Req | Number of UTCL1 translation misses. Value range for n: [0-15]. |
| `TCP_UTCL1_PERMISSION_MISS[n]` | Req | Number of UTCL1 permission misses. Value range for n: [0-15]. |
| `TCP_TOTAL_CACHE_ACCESSES[n]` | Req | Number of vL1D cache accesses including hits and misses. Value range for n: [0-15]. |
| `TCP_TCP_LATENCY[n]` | Cycles | Accumulated wave access latency to vL1D over all wavefronts. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ[n]` | Req | Number of read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ[n]` | Req | Number of write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ[n]` | Req | Number of atomic requests to L2 cache with return. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]` | Req | Number of atomic requests to L2 cache without return. Value range for n: [0-15]. |
| `TCP_TCC_NC_READ_REQ[n]` | Req | Number of NC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_READ_REQ[n]` | Req | Number of UC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_READ_REQ[n]` | Req | Number of CC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_READ_REQ[n]` | Req | Number of RW read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_WRITE_REQ[n]` | Req | Number of NC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_WRITE_REQ[n]` | Req | Number of UC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_WRITE_REQ[n]` | Req | Number of CC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_WRITE_REQ[n]` | Req | Number of RW write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_ATOMIC_REQ[n]` | Req | Number of NC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_ATOMIC_REQ[n]` | Req | Number of UC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_ATOMIC_REQ[n]` | Req | Number of CC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_ATOMIC_REQ[n]` | Req | Number of RW atomic requests to L2 cache. Value range for n: [0-15]. |
#### TCA counters
| Hardware Counter | Unit | Definition |
|:----------------|:------|:------------------------------------------|
| `TCA_CYCLE[n]` | Cycles | Number of TCA cycles. Value range for n: [0-31]. |
| `TCA_BUSY[n]` | Cycles | Number of cycles TCA has a pending request. Value range for n: [0-31]. |
### L2 cache access counters
L2 Cache is also known as Texture Cache per Channel (TCC).
| Hardware Counter | Unit | Definition |
|:--------------------------------|:------|:-------------------------------------------------------------|
| `TCC_CYCLE[n]` |Cycle | Number of L2 cache free-running clocks. Value range for n: [0-31]. |
| `TCC_BUSY[n]` |Cycle | Number of L2 cache busy cycles. Value range for n: [0-31]. |
| `TCC_REQ[n]` |Req | Number of L2 cache requests of all types. This is measured at the tag block. This may be more than the number of requests arriving at the TCC, but it is a good indication of the total amount of work that needs to be performed. Value range for n: [0-31]. |
| `TCC_STREAMING_REQ[n]` |Req | Number of L2 cache streaming requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_NC_REQ[n]` |Req | Number of NC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_UC_REQ[n]` |Req | Number of UC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_CC_REQ[n]` |Req | Number of CC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_RW_REQ[n]` |Req | Number of RW requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_PROBE[n]` |Req | Number of probe requests. Value range for n: [0-31]. |
| `TCC_PROBE_ALL[n]` |Req | Number of external probe requests with `EA_TCC_preq_all`== 1. Value range for n: [0-31]. |
| `TCC_READ[n]` |Req | Number of L2 cache read requests. This includes compressed reads but not metadata reads. Value range for n: [0-31]. |
| `TCC_WRITE[n]` |Req | Number of L2 cache write requests. Value range for n: [0-31]. |
| `TCC_ATOMIC[n]` |Req | Number of L2 cache atomic requests of all types. Value range for n: [0-31]. |
| `TCC_HIT[n]` |Req | Number of L2 cache hits. Value range for n: [0-31]. |
| `TCC_MISS[n]` |Req | Number of L2 cache misses. Value range for n: [0-31]. |
| `TCC_WRITEBACK[n]` |Req | Number of lines written back to the main memory, including writebacks of dirty lines and uncached write/atomic requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ[n]` |Req | Number of 32-byte and 64-byte transactions going over the `TC_EA_wrreq` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_64B[n]` |Req | Total number of 64-byte transactions (write or `CMPSWAP`) going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_WR_UNCACHED_32B[n]` |Req | Number of 32-byte write/atomic going over the `TC_EA_wrreq` interface due to uncached traffic. Note that CC mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2. Value range for n: [0-31].|
| `TCC_EA_WRREQ_STALL[n]` | Cycles | Number of cycles a write request is stalled. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of IO credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of GMI credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of DRAM credits. Value range for n: [0-31]. |
| `TCC_TOO_MANY_EA_WRREQS_STALL[n]` | Cycles | Number of cycles the L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_LEVEL[n]` | Req | The accumulated number of EA write requests in flight. This is primarily intended to measure average EA write latency. Average write latency = `TCC_PERF_SEL_EA_WRREQ_LEVEL`/`TCC_PERF_SEL_EA_WRREQ`. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC[n]` | Req | Number of 32-byte or 64-byte atomic requests going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC_LEVEL[n]` | Req | The accumulated number of EA atomic requests in flight. This is primarily intended to measure average EA atomic latency. Average atomic latency = `TCC_PERF_SEL_EA_WRREQ_ATOMIC_LEVEL`/`TCC_PERF_SEL_EA_WRREQ_ATOMIC`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ[n]` | Req | Number of 32-byte or 64-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_32B[n]` | Req | Number of 32-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RD_UNCACHED_32B[n]` | Req | Number of 32-byte EA reads due to uncached traffic. A 64-byte request is counted as 2. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of IO credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of GMI credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of DRAM credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_LEVEL[n]` | Req | The accumulated number of EA read requests in flight. This is primarily intended to measure average EA read latency. Average read latency = `TCC_PERF_SEL_EA_RDREQ_LEVEL`/`TCC_PERF_SEL_EA_RDREQ`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA read requests to High Bandwidth Memory (HBM). Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA write requests to HBM. Value range for n: [0-31]. |
| `TCC_TAG_STALL[n]` | Cycles | Number of cycles the normal request pipeline in the tag is stalled for any reason. Normally, stalls of this nature are measured exactly at one point in the pipeline however in case of this counter, probes can stall the pipeline at a variety of places and there is no single point that can reasonably measure the total stalls accurately. Value range for n: [0-31]. |
| `TCC_NORMAL_WRITEBACK[n]` | Req | Number of writebacks due to requests that are not writeback requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_WB_WRITEBACK[n]` | Req | Number of writebacks due to all `TC_OP` writeback requests. Value range for n: [0-31]. |
| `TCC_NORMAL_EVICT[n]` | Req | Number of evictions due to requests that are not invalidate or probe requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_INV_EVICT[n]` | Req | Number of evictions due to all `TC_OP` invalidate requests. Value range for n: [0-31]. |
## MI200 derived metrics list
| Derived Metric | Description |
|:----------------|:-------------------------------------------------------------------------------------|
| `ALUStalledByLDS` | Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready. Reduce this by reducing the LDS bank conflicts or the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
| `FetchSize` | Total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `FlatLDSInsts` | Average number of FLAT instructions that read from or write to LDS, executed per work item (affected by flow control). |
| `FlatVMemInsts` | Average number of FLAT instructions that read from or write to the video memory, executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch. |
| `GDSInsts` | Average number of GDS read/write instructions executed per work item (affected by flow control). |
| `GPUBusy` | Percentage of time GPU is busy. |
| `L2CacheHit` | Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal). |
| `LDSBankConflict` | Percentage of GPU time LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| `LDSInsts` | Average number of LDS read/write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS. |
| `MemUnitBusy` | Percentage of GPU time the memory unit is active. The result includes the stall time (`MemUnitStalled`). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| `MemUnitStalled` | Percentage of GPU time the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| `MemWrites32B` | Total number of effective 32B write transactions to the memory. |
| `SALUBusy` | Percentage of GPU time scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `SALUInsts` | Average number of scalar ALU instructions executed per work item (affected by flow control). |
| `SFetchInsts` | Average number of scalar fetch instructions from the video memory executed per work item (affected by flow control). |
| `TA_ADDR_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA address path is stalled by TC, over all TA instances. |
| `TA_ADDR_STALLED_BY_TD_CYCLES_sum` | Total number of cycles TA address path is stalled by TD, over all TA instances. |
| `TA_BUFFER_WAVEFRONTS_sum` | Total number of buffer wavefronts processed by all TA instances. |
| `TA_BUFFER_READ_WAVEFRONTS_sum` | Total number of buffer read wavefronts processed by all TA instances. |
| `TA_BUFFER_WRITE_WAVEFRONTS_sum` | Total number of buffer write wavefronts processed by all TA instances. |
| `TA_BUFFER_ATOMIC_WAVEFRONTS_sum` | Total number of buffer atomic wavefronts processed by all TA instances. |
| `TA_BUFFER_TOTAL_CYCLES_sum` | Total number of buffer cycles (including read and write) issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_READ_CYCLES_sum` | Total number of coalesced buffer read cycles issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_WRITE_CYCLES_sum` | Total number of coalesced buffer write cycles issued to TC by all TA instances. |
| `TA_BUSY_avr` | Average number of busy cycles over all TA instances. |
| `TA_BUSY_max` | Maximum number of TA busy cycles over all TA instances. |
| `TA_BUSY_min` | Minimum number of TA busy cycles over all TA instances. |
| `TA_DATA_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA data path is stalled by TC, over all TA instances. |
| `TA_FLAT_READ_WAVEFRONTS_sum` | Sum of flat opcode reads processed by all TA instances. |
| `TA_FLAT_WRITE_WAVEFRONTS_sum` | Sum of flat opcode writes processed by all TA instances. |
| `TA_FLAT_WAVEFRONTS_sum` | Total number of flat opcode wavefronts processed by all TA instances. |
| `TA_FLAT_READ_WAVEFRONTS_sum` | Total number of flat opcode read wavefronts processed by all TA instances. |
| `TA_FLAT_ATOMIC_WAVEFRONTS_sum` | Total number of flat opcode atomic wavefronts processed by all TA instances. |
| `TA_TA_BUSY_sum` | Total number of TA busy cycles over all TA instances. |
| `TA_TOTAL_WAVEFRONTS_sum` | Total number of wavefronts processed by all TA instances. |
| `TCA_BUSY_sum` | Total number of cycles TCA has a pending request, over all TCA instances. |
| `TCA_CYCLE_sum` | Total number of cycles over all TCA instances. |
| `TCC_ALL_TC_OP_WB_WRITEBACK_sum` | Total number of writebacks due to all TC_OP writeback requests, over all TCC instances. |
| `TCC_ALL_TC_OP_INV_EVICT_sum` | Total number of evictions due to all TC_OP invalidate requests, over all TCC instances. |
| `TCC_ATOMIC_sum` | Total number of L2 cache atomic requests of all types, over all TCC instances. |
| `TCC_BUSY_avr` | Average number of L2 cache busy cycles, over all TCC instances. |
| `TCC_BUSY_sum` | Total number of L2 cache busy cycles, over all TCC instances. |
| `TCC_CC_REQ_sum` | Total number of CC requests over all TCC instances. |
| `TCC_CYCLE_sum` | Total number of L2 cache free running clocks, over all TCC instances. |
| `TCC_EA_WRREQ_sum` | Total number of 32-byte and 64-byte transactions going over the TC_EA_wrreq interface, over all TCC instances. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. |
| `TCC_EA_WRREQ_64B_sum` | Total number of 64-byte transactions (write or `CMPSWAP`) going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_WR_UNCACHED_32B_sum` | Total Number of 32-byte write/atomic going over the TC_EA_wrreq interface due to uncached traffic, over all TCC instances. Note that CC mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2. |
| `TCC_EA_WRREQ_STALL_sum` | Total Number of cycles a write request is stalled, over all instances. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of IO credits, over all instances. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of GMI credits, over all instances. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of DRAM credits, over all instances. |
| `TCC_EA_WRREQ_LEVEL_sum` | Total number of EA write requests in flight over all TCC instances. |
| `TCC_EA_RDREQ_LEVEL_sum` | Total number of EA read requests in flight over all TCC instances. |
| `TCC_EA_ATOMIC_sum` | Total Number of 32-byte or 64-byte atomic requests going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_ATOMIC_LEVEL_sum` | Total number of EA atomic requests in flight, over all TCC instances. |
| `TCC_EA_RDREQ_sum` | Total number of 32-byte or 64-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RDREQ_32B_sum` | Total number of 32-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RD_UNCACHED_32B_sum` | Total number of 32-byte EA reads due to uncached traffic, over all TCC instances. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of IO credits, over all TCC instances. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of GMI credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of DRAM credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA read requests to HBM, over all TCC instances. |
| `TCC_EA_WRREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA write requests to HBM, over all TCC instances. |
| `TCC_HIT_sum` | Total number of L2 cache hits over all TCC instances. |
| `TCC_MISS_sum` | Total number of L2 cache misses over all TCC instances. |
| `TCC_NC_REQ_sum` | Total number of NC requests over all TCC instances. |
| `TCC_NORMAL_WRITEBACK_sum` | Total number of writebacks due to requests that are not writeback requests, over all TCC instances. |
| `TCC_NORMAL_EVICT_sum` | Total number of evictions due to requests that are not invalidate or probe requests, over all TCC instances. |
| `TCC_PROBE_sum` | Total number of probe requests over all TCC instances. |
| `TCC_PROBE_ALL_sum` | Total number of external probe requests with EA_TCC_preq_all== 1, over all TCC instances. |
| `TCC_READ_sum` | Total number of L2 cache read requests (including compressed reads but not metadata reads) over all TCC instances. |
| `TCC_REQ_sum` | Total number of all types of L2 cache requests over all TCC instances. |
| `TCC_RW_REQ_sum` | Total number of RW requests over all TCC instances. |
| `TCC_STREAMING_REQ_sum` | Total number of L2 cache streaming requests over all TCC instances. |
| `TCC_TAG_STALL_sum` | Total number of cycles the normal request pipeline in the tag is stalled for any reason, over all TCC instances. |
| `TCC_TOO_MANY_EA_WRREQS_STALL_sum` | Total number of cycles L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests, over all TCC instances. |
| `TCC_UC_REQ_sum` | Total number of UC requests over all TCC instances. |
| `TCC_WRITE_sum` | Total number of L2 cache write requests over all TCC instances. |
| `TCC_WRITEBACK_sum` | Total number of lines written back to the main memory including writebacks of dirty lines and uncached write/atomic requests, over all TCC instances. |
| `TCC_WRREQ_STALL_max` | Maximum number of cycles a write request is stalled, over all TCC instances. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on an atomic, over all TCP instances. |
| `TCP_GATE_EN1_sum` | Total number of cycles vL1D interface clocks are turned on, over all TCP instances. |
| `TCP_GATE_EN2_sum` | Total number of cycles vL1D core clocks are turned on, over all TCP instances. |
| `TCP_PENDING_STALL_CYCLES_sum` | Total number of cycles vL1D cache is stalled due to data pending from L2 Cache, over all TCP instances. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a read, over all TCP instances. |
| `TCP_TA_TCP_STATE_READ_sum` | Total number of state reads by all TCP instances. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ_sum` | Total number of atomic requests to L2 cache with return, over all TCP instances. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum` | Total number of atomic requests to L2 cache without return, over all TCP instances. |
| `TCP_TCC_CC_READ_REQ_sum` | Total number of CC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_WRITE_REQ_sum` | Total number of CC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_ATOMIC_REQ_sum` | Total number of CC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_READ_REQ_sum` | Total number of NC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_WRITE_REQ_sum` | Total number of NC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_ATOMIC_REQ_sum` | Total number of NC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_READ_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return for all TCP instances. |
| `TCP_TCC_READ_REQ_sum` | Total number of read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_READ_REQ_sum` | Total number of RW read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_WRITE_REQ_sum` | Total number of RW write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_ATOMIC_REQ_sum` | Total number of RW atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_READ_REQ_sum` | Total number of UC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_WRITE_REQ_sum` | Total number of UC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_ATOMIC_REQ_sum` | Total number of UC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_WRITE_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return for all TCP instances. |
| `TCP_TCC_WRITE_REQ_sum` | Total number of write requests to L2 cache, over all TCP instances. |
| `TCP_TCP_LATENCY_sum` | Total wave access latency to vL1D over all wavefronts for all TCP instances. |
| `TCP_TCR_TCP_STALL_CYCLES_sum` | Total number of cycles TCR stalls vL1D, over all TCP instances. |
| `TCP_TD_TCP_STALL_CYCLES_sum` | Total number of cycles TD stalls vL1D, over all TCP instances. |
| `TCP_TOTAL_ACCESSES_sum` | Total number of vL1D accesses, over all TCP instances. |
| `TCP_TOTAL_READ_sum` | Total number of vL1D read accesses, over all TCP instances. |
| `TCP_TOTAL_WRITE_sum` | Total number of vL1D write accesses, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITH_RET_sum` | Total number of vL1D atomic requests with return, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET_sum` | Total number of vL1D atomic requests without return, over all TCP instances. |
| `TCP_TOTAL_CACHE_ACCESSES_sum` | Total number of vL1D cache accesses (including hits and misses) by all TCP instances. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES_sum` | Total number of vL1D writebacks and invalidates, over all TCP instances. |
| `TCP_UTCL1_PERMISSION_MISS_sum` | Total number of UTCL1 permission misses by all TCP instances. |
| `TCP_UTCL1_REQUEST_sum` | Total number of address translation requests to UTCL1 by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_MISS_sum` | Total number of UTCL1 translation misses by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_HIT_sum` | Total number of UTCL1 translation hits by all TCP instances. |
| `TCP_VOLATILE_sum` | Total number of L1 volatile pixels/buffers from TA, over all TCP instances. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a write, over all TCP instances. |
| `TD_ATOMIC_WAVEFRONT_sum` | Total number of atomic wavefront instructions, over all TD instances. |
| `TD_COALESCABLE_WAVEFRONT_sum` | Total number of coalescable wavefronts according to TA, over all TD instances. |
| `TD_LOAD_WAVEFRONT_sum` | Total number of wavefront instructions (read/write/atomic), over all TD instances. |
| `TD_SPI_STALL_sum` | Total number of cycles TD is stalled by SPI, over all TD instances. |
| `TD_STORE_WAVEFRONT_sum` | Total number of write wavefront instructions, over all TD instances. |
| `TD_TC_STALL_sum` | Total number of cycles TD is stalled waiting for TC data, over all TD instances. |
| `TD_TD_BUSY_sum` | Total number of TD busy cycles while it is processing or waiting for data, over all TD instances. |
| `VALUBusy` | Percentage of GPU time vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `VALUInsts` | Average number of vector ALU instructions executed per work item (affected by flow control). |
| `VALUUtilization` | Percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad), 100% (ideal - no thread divergence). |
| `VFetchInsts` | Average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory. |
| `VWriteInsts` | Average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory. |
| `Wavefronts` | Total wavefronts. |
| `WRITE_REQ_32B` | Total number of 32-byte effective memory writes. |
| `WriteSize` | Total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `WriteUnitStalled` | Percentage of GPU time the write unit is stalled. Value range: 0% to 100% (bad). |
## Abbreviations
| Abbreviation | Meaning |
|:------------|:--------------------------------------------------------------------------------|
| `ALU` | Arithmetic Logic Unit |
| `Arb` | Arbiter |
| `BF16` | Brain Floating Point - 16 bits |
| `CC` | Coherently Cached |
| `CP` | Command Processor |
| `CPC` | Command Processor - Compute |
| `CPF` | Command Processor - Fetcher |
| `CS` | Compute Shader |
| `CSC` | Compute Shader Controller |
| `CSn` | Compute Shader, the n-th pipe |
| `CU` | Compute Unit |
| `DW` | 32-bit Data Word, DWORD |
| `EA` | Efficiency Arbiter |
| `F16` | Half Precision Floating Point |
| `F32` | Full Precision Floating Point |
| `FLAT` | FLAT instructions allow read/write/atomic access to a generic memory address pointer, which can resolve to any of the following physical memories:<br>. Global Memory<br>. Scratch ("private")<br>. LDS ("shared")<br>. Invalid - MEM_VIOL TrapStatus |
| `FMA` | Fused Multiply Add |
| `GDS` | Global Data Share |
| `GRBM` | Graphics Register Bus Manager |
| `HBM` | High Bandwidth Memory |
| `Instr` | Instructions |
| `IOP` | Integer Operation |
| `L2` | Level-2 Cache |
| `LDS` | Local Data Share |
| `ME1` | Micro Engine, running packet processing firmware on CPC |
| `MFMA` | Matrix Fused Multiply Add |
| `NC` | Noncoherently Cached |
| `RW` | Coherently Cached with Write |
| `SALU` | Scalar ALU |
| `SGPR` | Scalar General Purpose Register |
| `SIMD` | Single Instruction Multiple Data |
| `sL1D` | Scalar Level-1 Data Cache |
| `SMEM` | Scalar Memory |
| `SPI` | Shader Processor Input |
| `SQ` | Sequencer |
| `TA` | Texture Addressing Unit |
| `TC` | Texture Cache |
| `TCA` | Texture Cache Arbiter |
| `TCC` | Texture Cache per Channel, known as L2 Cache |
| `TCIU` | Texture Cache Interface Unit (interface between CP and the memory system) |
| `TCP` | Texture Cache per Pipe, known as vector L1 Cache |
| `TCR` | Texture Cache Router |
| `TD` | Texture Data Unit |
| `UC` | Uncached |
| `UTCL1` | Unified Translation Cache - Level 1 |
| `UTCL2` | Unified Translation Cache - Level 2 |
| `VALU` | Vector ALU |
| `VGPR` | Vector General Purpose Register |
| `vL1D` | Vector Level -1 Data Cache |
| `VMEM` | Vector Memory |

View File

@@ -1,7 +1,7 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD Instinct MI250 microarchitecture">
<meta name="keywords" content="Instinct, MI250, microarchitecture">
<meta name="keywords" content="Instinct, MI250, microarchitecture, AMD, ROCm">
</head>
# AMD Instinct™ MI250 microarchitecture

View File

@@ -0,0 +1,758 @@
.. meta::
:description: MI300 and MI200 series performance counters and metrics
:keywords: MI300, MI200, performance counters, command processor counters
***************************************************************************************************
MI300 and MI200 series performance counters and metrics
***************************************************************************************************
This document lists and describes the hardware performance counters and derived metrics available
for the AMD Instinct™ MI300 and MI200 GPU. You can also access this information using the
:doc:`ROCProfiler tool <rocprofiler:rocprofv1>`.
MI300 and MI200 series performance counters
===============================================================
Series performance counters include the following categories:
* :ref:`command-processor-counters`
* :ref:`graphics-register-bus-manager-counters`
* :ref:`spi-counters`
* :ref:`compute-unit-counters`
* :ref:`l1i-and-sl1d-cache-counters`
* :ref:`vector-l1-cache-subsystem-counters`
* :ref:`l2-cache-access-counters`
The following sections provide additional details for each category.
.. note::
Preliminary validation of all MI300 and MI200 series performance counters is in progress. Those with
an asterisk (*) require further evaluation.
.. _command-processor-counters:
Command processor counters
---------------------------------------------------------------------------------------------------------------
Command processor counters are further classified into command processor-fetcher and command
processor-compute.
Command processor-fetcher counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``CPF_CMP_UTCL1_STALL_ON_TRANSLATION``", "Cycles", "Number of cycles one of the compute unified translation caches (L1) is stalled waiting on translation"
"``CPF_CPF_STAT_BUSY``", "Cycles", "Number of cycles command processor-fetcher is busy"
"``CPF_CPF_STAT_IDLE``", "Cycles", "Number of cycles command processor-fetcher is idle"
"``CPF_CPF_STAT_STALL``", "Cycles", "Number of cycles command processor-fetcher is stalled"
"``CPF_CPF_TCIU_BUSY``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is busy"
"``CPF_CPF_TCIU_IDLE``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is idle"
"``CPF_CPF_TCIU_STALL``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is stalled waiting on free tags"
The texture cache interface unit is the interface between the command processor and the memory
system.
Command processor-compute counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``CPC_ME1_BUSY_FOR_PACKET_DECODE``", "Cycles", "Number of cycles command processor-compute micro engine is busy decoding packets"
"``CPC_UTCL1_STALL_ON_TRANSLATION``", "Cycles", "Number of cycles one of the unified translation caches (L1) is stalled waiting on translation"
"``CPC_CPC_STAT_BUSY``", "Cycles", "Number of cycles command processor-compute is busy"
"``CPC_CPC_STAT_IDLE``", "Cycles", "Number of cycles command processor-compute is idle"
"``CPC_CPC_STAT_STALL``", "Cycles", "Number of cycles command processor-compute is stalled"
"``CPC_CPC_TCIU_BUSY``", "Cycles", "Number of cycles command processor-compute texture cache interface unit interface is busy"
"``CPC_CPC_TCIU_IDLE``", "Cycles", "Number of cycles command processor-compute texture cache interface unit interface is idle"
"``CPC_CPC_UTCL2IU_BUSY``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is busy"
"``CPC_CPC_UTCL2IU_IDLE``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is idle"
"``CPC_CPC_UTCL2IU_STALL``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is stalled"
"``CPC_ME1_DC0_SPI_BUSY``", "Cycles", "Number of cycles command processor-compute micro engine processor is busy"
The micro engine runs packet-processing firmware on the command processor-compute counter.
.. _graphics-register-bus-manager-counters:
Graphics register bus manager counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``GRBM_COUNT``", "Cycles","Number of free-running GPU cycles"
"``GRBM_GUI_ACTIVE``", "Cycles", "Number of GPU active cycles"
"``GRBM_CP_BUSY``", "Cycles", "Number of cycles any of the command processor blocks are busy"
"``GRBM_SPI_BUSY``", "Cycles", "Number of cycles any of the shader processor input is busy in the shader engines"
"``GRBM_TA_BUSY``", "Cycles", "Number of cycles any of the texture addressing unit is busy in the shader engines"
"``GRBM_TC_BUSY``", "Cycles", "Number of cycles any of the texture cache blocks are busy"
"``GRBM_CPC_BUSY``", "Cycles", "Number of cycles the command processor-compute is busy"
"``GRBM_CPF_BUSY``", "Cycles", "Number of cycles the command processor-fetcher is busy"
"``GRBM_UTCL2_BUSY``", "Cycles", "Number of cycles the unified translation cache (Level 2 [L2]) block is busy"
"``GRBM_EA_BUSY``", "Cycles", "Number of cycles the efficiency arbiter block is busy"
Texture cache blocks include:
* Texture cache arbiter
* Texture cache per pipe, also known as vector Level 1 (L1) cache
* Texture cache per channel, also known as known as L2 cache
* Texture cache interface
.. _spi-counters:
Shader processor input counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SPI_CSN_BUSY``", "Cycles", "Number of cycles with outstanding waves"
"``SPI_CSN_WINDOW_VALID``", "Cycles", "Number of cycles enabled by ``perfcounter_start`` event"
"``SPI_CSN_NUM_THREADGROUPS``", "Workgroups", "Number of dispatched workgroups"
"``SPI_CSN_WAVE``", "Wavefronts", "Number of dispatched wavefronts"
"``SPI_RA_REQ_NO_ALLOC``", "Cycles", "Number of arbiter cycles with requests but no allocation"
"``SPI_RA_REQ_NO_ALLOC_CSN``", "Cycles", "Number of arbiter cycles with compute shader (n\ :sup:`th` pipe) requests but no compute shader (n\ :sup:`th` pipe) allocation"
"``SPI_RA_RES_STALL_CSN``", "Cycles", "Number of arbiter stall cycles due to shortage of compute shader (n\ :sup:`th` pipe) pipeline slots"
"``SPI_RA_TMP_STALL_CSN``", "Cycles", "Number of stall cycles due to shortage of temp space"
"``SPI_RA_WAVE_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of single instruction, multiple data (SIMD) per cycle affected by shortage of wave slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_VGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of vector general-purpose register (VGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_SGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of scalar general-purpose register (SGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_LDS_CU_FULL_CSN``", "CU", "Number of compute units affected by shortage of local data share (LDS) space for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_BAR_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting at a BARRIER"
"``SPI_RA_BULKY_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting for BULKY resource"
"``SPI_RA_TGLIM_CU_FULL_CSN``", "Cycles", "Number of compute shader (n\ :sup:`th` pipe) wave stall cycles due to restriction of ``tg_limit`` for thread group size"
"``SPI_RA_WVLIM_STALL_CSN``", "Cycles", "Number of cycles compute shader (n\ :sup:`th` pipe) is stalled due to ``WAVE_LIMIT``"
"``SPI_VWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize VGPRs when launching waves"
"``SPI_SWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize SGPRs when launching waves"
.. _compute-unit-counters:
Compute unit counters
---------------------------------------------------------------------------------------------------------------
The compute unit counters are further classified into instruction mix, matrix fused multiply-add (FMA)
operation counters, level counters, wavefront counters, wavefront cycle counters, and LDS counters.
Instruction mix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_INSTS``", "Instr", "Number of instructions issued"
"``SQ_INSTS_VALU``", "Instr", "Number of vector arithmetic logic unit (VALU) instructions including matrix FMA issued"
"``SQ_INSTS_VALU_ADD_F16``", "Instr", "Number of VALU half-precision floating-point (F16) ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F16``", "Instr", "Number of VALU F16 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F16``", "Instr", "Number of VALU F16 FMA or multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F16``", "Instr", "Number of VALU F16 Transcendental instructions issued"
"``SQ_INSTS_VALU_ADD_F32``", "Instr", "Number of VALU full-precision floating-point (F32) ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F32``", "Instr", "Number of VALU F32 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F32``", "Instr", "Number of VALU F32 FMAor multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F32``", "Instr", "Number of VALU F32 Transcendental instructions issued"
"``SQ_INSTS_VALU_ADD_F64``", "Instr", "Number of VALU F64 ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F64``", "Instr", "Number of VALU F64 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F64``", "Instr", "Number of VALU F64 FMA or multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F64``", "Instr", "Number of VALU F64 Transcendental instructions issued"
"``SQ_INSTS_VALU_INT32``", "Instr", "Number of VALU 32-bit integer instructions (signed or unsigned) issued"
"``SQ_INSTS_VALU_INT64``", "Instr", "Number of VALU 64-bit integer instructions (signed or unsigned) issued"
"``SQ_INSTS_VALU_CVT``", "Instr", "Number of VALU Conversion instructions issued"
"``SQ_INSTS_VALU_MFMA_I8``", "Instr", "Number of 8-bit Integer matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F16``", "Instr", "Number of F16 matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F32``", "Instr", "Number of F32 matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F64``", "Instr", "Number of F64 matrix FMA instructions issued"
"``SQ_INSTS_MFMA``", "Instr", "Number of matrix FMA instructions issued"
"``SQ_INSTS_VMEM_WR``", "Instr", "Number of vector memory write instructions (including flat) issued"
"``SQ_INSTS_VMEM_RD``", "Instr", "Number of vector memory read instructions (including flat) issued"
"``SQ_INSTS_VMEM``", "Instr", "Number of vector memory instructions issued, including both flat and buffer instructions"
"``SQ_INSTS_SALU``", "Instr", "Number of scalar arithmetic logic unit (SALU) instructions issued"
"``SQ_INSTS_SMEM``", "Instr", "Number of scalar memory instructions issued"
"``SQ_INSTS_SMEM_NORM``", "Instr", "Number of scalar memory instructions normalized to match ``smem_level`` issued"
"``SQ_INSTS_FLAT``", "Instr", "Number of flat instructions issued"
"``SQ_INSTS_FLAT_LDS_ONLY``", "Instr", "**MI200 series only** Number of FLAT instructions that read/write only from/to LDS issued. Works only if ``EARLY_TA_DONE`` is enabled."
"``SQ_INSTS_LDS``", "Instr", "Number of LDS instructions issued **(MI200: includes flat; MI300: does not include flat)**"
"``SQ_INSTS_GDS``", "Instr", "Number of global data share instructions issued"
"``SQ_INSTS_EXP_GDS``", "Instr", "Number of EXP and global data share instructions excluding skipped export instructions issued"
"``SQ_INSTS_BRANCH``", "Instr", "Number of branch instructions issued"
"``SQ_INSTS_SENDMSG``", "Instr", "Number of ``SENDMSG`` instructions including ``s_endpgm`` issued"
"``SQ_INSTS_VSKIPPED``", "Instr", "Number of vector instructions skipped"
Flat instructions allow read, write, and atomic access to a generic memory address pointer that can
resolve to any of the following physical memories:
* Global Memory
* Scratch ("private")
* LDS ("shared")
* Invalid - ``MEM_VIOL`` TrapStatus
Matrix fused multiply-add operation counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_INSTS_VALU_MFMA_MOPS_I8``", "IOP", "Number of 8-bit integer matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F16``", "FLOP", "Number of F16 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_BF16``", "FLOP", "Number of BF16 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F32``", "FLOP", "Number of F32 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F64``", "FLOP", "Number of F64 floating matrix FMA ops in the unit of 512"
Level counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
All level counters must be followed by ``SQ_ACCUM_PREV_HIRES`` counter to measure average latency.
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_ACCUM_PREV``", "Count", "Accumulated counter sample value where accumulation takes place once every four cycles"
"``SQ_ACCUM_PREV_HIRES``", "Count", "Accumulated counter sample value where accumulation takes place once every cycle"
"``SQ_LEVEL_WAVES``", "Waves", "Number of inflight waves"
"``SQ_INST_LEVEL_VMEM``", "Instr", "Number of inflight vector memory (including flat) instructions"
"``SQ_INST_LEVEL_SMEM``", "Instr", "Number of inflight scalar memory instructions"
"``SQ_INST_LEVEL_LDS``", "Instr", "Number of inflight LDS (including flat) instructions"
"``SQ_IFETCH_LEVEL``", "Instr", "Number of inflight instruction fetch requests from the cache"
Use the following formulae to calculate latencies:
* Vector memory latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_VMEM``
* Wave latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_WAVE``
* LDS latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_LDS``
* Scalar memory latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_SMEM_NORM``
* Instruction fetch latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_IFETCH``
Wavefront counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_WAVES``", "Waves", "Number of wavefronts dispatched to sequencers, including both new and restored wavefronts"
"``SQ_WAVES_SAVED``", "Waves", "Number of context-saved waves"
"``SQ_WAVES_RESTORED``", "Waves", "Number of context-restored waves sent to sequencers"
"``SQ_WAVES_EQ_64``", "Waves", "Number of wavefronts with exactly 64 active threads sent to sequencers"
"``SQ_WAVES_LT_64``", "Waves", "Number of wavefronts with less than 64 active threads sent to sequencers"
"``SQ_WAVES_LT_48``", "Waves", "Number of wavefronts with less than 48 active threads sent to sequencers"
"``SQ_WAVES_LT_32``", "Waves", "Number of wavefronts with less than 32 active threads sent to sequencers"
"``SQ_WAVES_LT_16``", "Waves", "Number of wavefronts with less than 16 active threads sent to sequencers"
Wavefront cycle counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_CYCLES``", "Cycles", "Clock cycles"
"``SQ_BUSY_CYCLES``", "Cycles", "Number of cycles while sequencers reports it to be busy"
"``SQ_BUSY_CU_CYCLES``", "Qcycles", "Number of quad-cycles each compute unit is busy"
"``SQ_VALU_MFMA_BUSY_CYCLES``", "Cycles", "Number of cycles the matrix FMA arithmetic logic unit (ALU) is busy"
"``SQ_WAVE_CYCLES``", "Qcycles", "Number of quad-cycles spent by waves in the compute units"
"``SQ_WAIT_ANY``", "Qcycles", "Number of quad-cycles spent waiting for anything"
"``SQ_WAIT_INST_ANY``", "Qcycles", "Number of quad-cycles spent waiting for any instruction to be issued"
"``SQ_ACTIVE_INST_ANY``", "Qcycles", "Number of quad-cycles spent by each wave to work on an instruction"
"``SQ_ACTIVE_INST_VMEM``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a vector memory instruction"
"``SQ_ACTIVE_INST_LDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an LDS instruction"
"``SQ_ACTIVE_INST_VALU``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a VALU instruction"
"``SQ_ACTIVE_INST_SCA``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a SALU or scalar memory instruction"
"``SQ_ACTIVE_INST_EXP_GDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an ``EXPORT`` or ``GDS`` instruction"
"``SQ_ACTIVE_INST_MISC``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a ``BRANCH`` or ``SENDMSG`` instruction"
"``SQ_ACTIVE_INST_FLAT``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a flat instruction"
"``SQ_INST_CYCLES_VMEM_WR``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory write instructions"
"``SQ_INST_CYCLES_VMEM_RD``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory read instructions"
"``SQ_INST_CYCLES_SMEM``", "Qcycles", "Number of quad-cycles spent to execute scalar memory reads"
"``SQ_INST_CYCLES_SALU``", "Qcycles", "Number of quad-cycles spent to execute non-memory read scalar operations"
"``SQ_THREAD_CYCLES_VALU``", "Qcycles", "Number of quad-cycles spent to execute VALU operations on active threads"
"``SQ_WAIT_INST_LDS``", "Qcycles", "Number of quad-cycles spent waiting for LDS instruction to be issued"
``SQ_THREAD_CYCLES_VALU`` is similar to ``INST_CYCLES_VALU``, but it's multiplied by the number of
active threads.
LDS counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_LDS_ATOMIC_RETURN``", "Cycles", "Number of atomic return cycles in LDS"
"``SQ_LDS_BANK_CONFLICT``", "Cycles", "Number of cycles LDS is stalled by bank conflicts"
"``SQ_LDS_ADDR_CONFLICT``", "Cycles", "Number of cycles LDS is stalled by address conflicts"
"``SQ_LDS_UNALIGNED_STALL``", "Cycles", "Number of cycles LDS is stalled processing flat unaligned load or store operations"
"``SQ_LDS_MEM_VIOLATIONS``", "Count", "Number of threads that have a memory violation in the LDS"
"``SQ_LDS_IDX_ACTIVE``", "Cycles", "Number of cycles LDS is used for indexed operations"
Miscellaneous counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_IFETCH``", "Count", "Number of instruction fetch requests from L1i, in 32-byte width"
"``SQ_ITEMS``", "Threads", "Number of valid items per wave"
.. _l1i-and-sl1d-cache-counters:
L1 instruction cache (L1i) and scalar L1 data cache (L1d) counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQC_ICACHE_REQ``", "Req", "Number of L1 instruction (L1i) cache requests"
"``SQC_ICACHE_HITS``", "Count", "Number of L1i cache hits"
"``SQC_ICACHE_MISSES``", "Count", "Number of non-duplicate L1i cache misses including uncached requests"
"``SQC_ICACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate L1i cache misses whose previous lookup miss on the same cache line is not fulfilled yet"
"``SQC_DCACHE_REQ``", "Req", "Number of scalar L1d requests"
"``SQC_DCACHE_INPUT_VALID_READYB``", "Cycles", "Number of cycles while sequencer input is valid but scalar L1d is not ready"
"``SQC_DCACHE_HITS``", "Count", "Number of scalar L1d hits"
"``SQC_DCACHE_MISSES``", "Count", "Number of non-duplicate scalar L1d misses including uncached requests"
"``SQC_DCACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate scalar L1d misses"
"``SQC_DCACHE_REQ_READ_1``", "Req", "Number of constant cache read requests in a single 32-bit data word"
"``SQC_DCACHE_REQ_READ_2``", "Req", "Number of constant cache read requests in two 32-bit data words"
"``SQC_DCACHE_REQ_READ_4``", "Req", "Number of constant cache read requests in four 32-bit data words"
"``SQC_DCACHE_REQ_READ_8``", "Req", "Number of constant cache read requests in eight 32-bit data words"
"``SQC_DCACHE_REQ_READ_16``", "Req", "Number of constant cache read requests in 16 32-bit data words"
"``SQC_DCACHE_ATOMIC``", "Req", "Number of atomic requests"
"``SQC_TC_REQ``", "Req", "Number of texture cache requests that were issued by instruction and constant caches"
"``SQC_TC_INST_REQ``", "Req", "Number of instruction requests to the L2 cache"
"``SQC_TC_DATA_READ_REQ``", "Req", "Number of data Read requests to the L2 cache"
"``SQC_TC_DATA_WRITE_REQ``", "Req", "Number of data write requests to the L2 cache"
"``SQC_TC_DATA_ATOMIC_REQ``", "Req", "Number of data atomic requests to the L2 cache"
"``SQC_TC_STALL``", "Cycles", "Number of cycles while the valid requests to the L2 cache are stalled"
.. _vector-l1-cache-subsystem-counters:
Vector L1 cache subsystem counters
---------------------------------------------------------------------------------------------------------------
The vector L1 cache subsystem counters are further classified into texture addressing unit, texture data
unit, vector L1d or texture cache per pipe, and texture cache arbiter counters.
Texture addressing unit counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TA_TA_BUSY[n]``", "Cycles", "Texture addressing unit busy cycles", "0-15"
"``TA_TOTAL_WAVEFRONTS[n]``", "Instr", "Number of wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_WAVEFRONTS[n]``", "Instr", "Number of buffer wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_READ_WAVEFRONTS[n]``", "Instr", "Number of buffer read wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_WRITE_WAVEFRONTS[n]``", "Instr", "Number of buffer write wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_ATOMIC_WAVEFRONTS[n]``", "Instr", "Number of buffer atomic wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_TOTAL_CYCLES[n]``", "Cycles", "Number of buffer cycles (including read and write) issued to texture cache", "0-15"
"``TA_BUFFER_COALESCED_READ_CYCLES[n]``", "Cycles", "Number of coalesced buffer read cycles issued to texture cache", "0-15"
"``TA_BUFFER_COALESCED_WRITE_CYCLES[n]``", "Cycles", "Number of coalesced buffer write cycles issued to texture cache", "0-15"
"``TA_ADDR_STALLED_BY_TC_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit address path is stalled by texture cache", "0-15"
"``TA_DATA_STALLED_BY_TC_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit data path is stalled by texture cache", "0-15"
"``TA_ADDR_STALLED_BY_TD_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit address path is stalled by texture data unit", "0-15"
"``TA_FLAT_WAVEFRONTS[n]``", "Instr", "Number of flat opcode wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_READ_WAVEFRONTS[n]``", "Instr", "Number of flat opcode read wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_WRITE_WAVEFRONTS[n]``", "Instr", "Number of flat opcode write wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_ATOMIC_WAVEFRONTS[n]``", "Instr", "Number of flat opcode atomic wavefronts processed by texture addressing unit", "0-15"
Texture data unit counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TD_TD_BUSY[n]``", "Cycle", "Texture data unit busy cycles while it is processing or waiting for data", "0-15"
"``TD_TC_STALL[n]``", "Cycle", "Number of cycles texture data unit is stalled waiting for texture cache data", "0-15"
"``TD_SPI_STALL[n]``", "Cycle", "Number of cycles texture data unit is stalled by shader processor input", "0-15"
"``TD_LOAD_WAVEFRONT[n]``", "Instr", "Number of wavefront instructions (read, write, atomic)", "0-15"
"``TD_STORE_WAVEFRONT[n]``", "Instr", "Number of write wavefront instructions", "0-15"
"``TD_ATOMIC_WAVEFRONT[n]``", "Instr", "Number of atomic wavefront instructions", "0-15"
"``TD_COALESCABLE_WAVEFRONT[n]``", "Instr", "Number of coalescable wavefronts according to texture addressing unit", "0-15"
Texture cache per pipe counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCP_GATE_EN1[n]``", "Cycles", "Number of cycles vector L1d interface clocks are turned on", "0-15"
"``TCP_GATE_EN2[n]``", "Cycles", "Number of cycles vector L1d core clocks are turned on", "0-15"
"``TCP_TD_TCP_STALL_CYCLES[n]``", "Cycles", "Number of cycles texture data unit stalls vector L1d", "0-15"
"``TCP_TCR_TCP_STALL_CYCLES[n]``", "Cycles", "Number of cycles texture cache router stalls vector L1d", "0-15"
"``TCP_READ_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on a read", "0-15"
"``TCP_WRITE_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on a write", "0-15"
"``TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on an atomic", "0-15"
"``TCP_PENDING_STALL_CYCLES[n]``", "Cycles", "Number of cycles vector L1d is stalled due to data pending from L2 Cache", "0-15"
"``TCP_TCP_TA_DATA_STALL_CYCLES``", "Cycles", "Number of cycles texture cache per pipe stalls texture addressing unit data interface", "NA"
"``TCP_TA_TCP_STATE_READ[n]``", "Req", "Number of state reads", "0-15"
"``TCP_VOLATILE[n]``", "Req", "Number of L1 volatile pixels or buffers from texture addressing unit", "0-15"
"``TCP_TOTAL_ACCESSES[n]``", "Req", "Number of vector L1d accesses. Equals ``TCP_PERF_SEL_TOTAL_READ`+`TCP_PERF_SEL_TOTAL_NONREAD``", "0-15"
"``TCP_TOTAL_READ[n]``", "Req", "Number of vector L1d read accesses", "0-15"
"``TCP_TOTAL_WRITE[n]``", "Req", "Number of vector L1d write accesses", "0-15"
"``TCP_TOTAL_ATOMIC_WITH_RET[n]``", "Req", "Number of vector L1d atomic requests with return", "0-15"
"``TCP_TOTAL_ATOMIC_WITHOUT_RET[n]``", "Req", "Number of vector L1d atomic without return", "0-15"
"``TCP_TOTAL_WRITEBACK_INVALIDATES[n]``", "Count", "Total number of vector L1d writebacks and invalidates", "0-15"
"``TCP_UTCL1_REQUEST[n]``", "Req", "Number of address translation requests to unified translation cache (L1)", "0-15"
"``TCP_UTCL1_TRANSLATION_HIT[n]``", "Req", "Number of unified translation cache (L1) translation hits", "0-15"
"``TCP_UTCL1_TRANSLATION_MISS[n]``", "Req", "Number of unified translation cache (L1) translation misses", "0-15"
"``TCP_UTCL1_PERMISSION_MISS[n]``", "Req", "Number of unified translation cache (L1) permission misses", "0-15"
"``TCP_TOTAL_CACHE_ACCESSES[n]``", "Req", "Number of vector L1d cache accesses including hits and misses", "0-15"
"``TCP_TCP_LATENCY[n]``", "Cycles", "**MI200 series only** Accumulated wave access latency to vL1D over all wavefronts", "0-15"
"``TCP_TCC_READ_REQ_LATENCY[n]``", "Cycles", "**MI200 series only** Total vL1D to L2 request latency over all wavefronts for reads and atomics with return", "0-15"
"``TCP_TCC_WRITE_REQ_LATENCY[n]``", "Cycles", "**MI200 series only** Total vL1D to L2 request latency over all wavefronts for writes and atomics without return", "0-15"
"``TCP_TCC_READ_REQ[n]``", "Req", "Number of read requests to L2 cache", "0-15"
"``TCP_TCC_WRITE_REQ[n]``", "Req", "Number of write requests to L2 cache", "0-15"
"``TCP_TCC_ATOMIC_WITH_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache with return", "0-15"
"``TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache without return", "0-15"
"``TCP_TCC_NC_READ_REQ[n]``", "Req", "Number of non-coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_UC_READ_REQ[n]``", "Req", "Number of uncached read requests to L2 cache", "0-15"
"``TCP_TCC_CC_READ_REQ[n]``", "Req", "Number of coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_RW_READ_REQ[n]``", "Req", "Number of coherently cached with write read requests to L2 cache", "0-15"
"``TCP_TCC_NC_WRITE_REQ[n]``", "Req", "Number of non-coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_UC_WRITE_REQ[n]``", "Req", "Number of uncached write requests to L2 cache", "0-15"
"``TCP_TCC_CC_WRITE_REQ[n]``", "Req", "Number of coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_RW_WRITE_REQ[n]``", "Req", "Number of coherently cached with write write requests to L2 cache", "0-15"
"``TCP_TCC_NC_ATOMIC_REQ[n]``", "Req", "Number of non-coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_UC_ATOMIC_REQ[n]``", "Req", "Number of uncached atomic requests to L2 cache", "0-15"
"``TCP_TCC_CC_ATOMIC_REQ[n]``", "Req", "Number of coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_RW_ATOMIC_REQ[n]``", "Req", "Number of coherently cached with write atomic requests to L2 cache", "0-15"
Note that:
* ``TCP_TOTAL_READ[n]`` = ``TCP_PERF_SEL_TOTAL_HIT_LRU_READ`` + ``TCP_PERF_SEL_TOTAL_MISS_LRU_READ`` + ``TCP_PERF_SEL_TOTAL_MISS_EVICT_READ``
* ``TCP_TOTAL_WRITE[n]`` = ``TCP_PERF_SEL_TOTAL_MISS_LRU_WRITE``+ ``TCP_PERF_SEL_TOTAL_MISS_EVICT_WRITE``
* ``TCP_TOTAL_WRITEBACK_INVALIDATES[n]`` = ``TCP_PERF_SEL_TOTAL_WBINVL1``+ ``TCP_PERF_SEL_TOTAL_WBINVL1_VOL``+ ``TCP_PERF_SEL_CP_TCP_INVALIDATE``+ ``TCP_PERF_SEL_SQ_TCP_INVALIDATE_VOL``
Texture cache arbiter counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCA_CYCLE[n]``", "Cycles", "Number of texture cache arbiter cycles", "0-31"
"``TCA_BUSY[n]``", "Cycles", "Number of cycles texture cache arbiter has a pending request", "0-31"
.. _l2-cache-access-counters:
L2 cache access counters
---------------------------------------------------------------------------------------------------------------
L2 cache is also known as texture cache per channel.
.. tab-set::
.. tab-item:: MI300 hardware counter
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA0_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA0_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_WR_UNCACHED_32B[n]``", "Req", "Number of 32 or 64-byte write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic", "0-31"
"``TCC_EA0_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA0_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA0_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA0_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA0_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA0_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA0_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA0_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA0_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
.. tab-item:: MI200 hardware counter
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_WR_UNCACHED_32B[n]``", "Req", "Number of 32 write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic. A 64-byte request will be counted as 2", "0-31"
"``TCC_EA_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
Note the following:
* ``TCC_REQ[n]`` may be more than the number of requests arriving at the texture cache per channel,
but it's a good indication of the total amount of work that needs to be performed.
* For ``TCC_EA0_WRREQ[n]``, atomics may travel over the same interface and are generally classified as
write requests.
* CC mtypes can produce uncached requests, and those are included in
``TCC_EA0_WR_UNCACHED_32B[n]``
* ``TCC_EA0_WRREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter write latency.
* Average write latency = ``TCC_PERF_SEL_EA0_WRREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ``
* ``TCC_EA0_ATOMIC_LEVEL[n]`` is primarily intended to measure average efficiency arbiter atomic
latency
* Average atomic latency = ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC``
* ``TCC_EA0_RDREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter read latency.
* Average read latency = ``TCC_PERF_SEL_EA0_RDREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_RDREQ``
* Stalls can occur regardless of the need for a read to be performed
* Normally, stalls are measured exactly at one point in the pipeline however in the case of
``TCC_TAG_STALL[n]``, probes can stall the pipeline at a variety of places. There is no single point that
can accurately measure the total stalls
MI300 and MI200 series derived metrics list
==============================================================
.. csv-table::
:header: "Hardware counter", "Definition"
"``ALUStalledByLDS``", "Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready (value range: 0% (optimal) to 100%)"
"``FetchSize``", "Total kilobytes fetched from the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``FlatLDSInsts``", "Average number of flat instructions that read from or write to LDS, run per work item (affected by flow control)"
"``FlatVMemInsts``", "Average number of flat instructions that read from or write to the video memory, run per work item (affected by flow control). Includes flat instructions that read from or write to scratch"
"``GDSInsts``", "Average number of global data share read or write instructions run per work item (affected by flow control)"
"``GPUBusy``", "Percentage of time GPU is busy"
"``L2CacheHit``", "Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache (value range: 0% (no hit) to 100% (optimal))"
"``LDSBankConflict``", "Percentage of GPU time LDS is stalled by bank conflicts (value range: 0% (optimal) to 100%)"
"``LDSInsts``", "Average number of LDS read or write instructions run per work item (affected by flow control). Excludes flat instructions that read from or write to LDS."
"``MemUnitBusy``", "Percentage of GPU time the memory unit is active, which is measured with all extra fetches and writes and any cache or memory effects taken into account (value range: 0% to 100% (fetch-bound))"
"``MemUnitStalled``", "Percentage of GPU time the memory unit is stalled (value range: 0% (optimal) to 100%)"
"``MemWrites32B``", "Total number of effective 32B write transactions to the memory"
"``TCA_BUSY_sum``", "Total number of cycles texture cache arbiter has a pending request, over all texture cache arbiter instances"
"``TCA_CYCLE_sum``", "Total number of cycles over all texture cache arbiter instances"
"``SALUBusy``", "Percentage of GPU time scalar ALU instructions are processed (value range: 0% to 100% (optimal))"
"``SALUInsts``", "Average number of scalar ALU instructions run per work item (affected by flow control)"
"``SFetchInsts``", "Average number of scalar fetch instructions from the video memory run per work item (affected by flow control)"
"``VALUBusy``", "Percentage of GPU time vector ALU instructions are processed (value range: 0% to 100% (optimal))"
"``VALUInsts``", "Average number of vector ALU instructions run per work item (affected by flow control)"
"``VALUUtilization``", "Percentage of active vector ALU threads in a wave, where a lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64 (value range: 0%, 100% (optimal - no thread divergence))"
"``VFetchInsts``", "Average number of vector fetch instructions from the video memory run per work-item (affected by flow control); excludes flat instructions that fetch from video memory"
"``VWriteInsts``", "Average number of vector write instructions to the video memory run per work-item (affected by flow control); excludes flat instructions that write to video memory"
"``Wavefronts``", "Total wavefronts"
"``WRITE_REQ_32B``", "Total number of 32-byte effective memory writes"
"``WriteSize``", "Total kilobytes written to the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``WriteUnitStalled``", "Percentage of GPU time the write unit is stalled (value range: 0% (optimal) to 100%)"
You can lower ``ALUStalledByLDS`` by reducing LDS bank conflicts or number of LDS accesses.
You can lower ``MemUnitStalled`` by reducing the number or size of fetches and writes.
``MemUnitBusy`` includes the stall time (``MemUnitStalled``).
Hardware counters by and over all texture addressing unit instances
---------------------------------------------------------------------------------------------------------------
The following table shows the hardware counters *by* all texture addressing unit instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TA_BUFFER_WAVEFRONTS_sum``", "Total number of buffer wavefronts processed"
"``TA_BUFFER_READ_WAVEFRONTS_sum``", "Total number of buffer read wavefronts processed"
"``TA_BUFFER_WRITE_WAVEFRONTS_sum``", "Total number of buffer write wavefronts processed"
"``TA_BUFFER_ATOMIC_WAVEFRONTS_sum``", "Total number of buffer atomic wavefronts processed"
"``TA_BUFFER_TOTAL_CYCLES_sum``", "Total number of buffer cycles (including read and write) issued to texture cache"
"``TA_BUFFER_COALESCED_READ_CYCLES_sum``", "Total number of coalesced buffer read cycles issued to texture cache"
"``TA_BUFFER_COALESCED_WRITE_CYCLES_sum``", "Total number of coalesced buffer write cycles issued to texture cache"
"``TA_FLAT_READ_WAVEFRONTS_sum``", "Sum of flat opcode reads processed"
"``TA_FLAT_WRITE_WAVEFRONTS_sum``", "Sum of flat opcode writes processed"
"``TA_FLAT_WAVEFRONTS_sum``", "Total number of flat opcode wavefronts processed"
"``TA_FLAT_READ_WAVEFRONTS_sum``", "Total number of flat opcode read wavefronts processed"
"``TA_FLAT_ATOMIC_WAVEFRONTS_sum``", "Total number of flat opcode atomic wavefronts processed"
"``TA_TOTAL_WAVEFRONTS_sum``", "Total number of wavefronts processed"
The following table shows the hardware counters *over* all texture addressing unit instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TA_ADDR_STALLED_BY_TC_CYCLES_sum``", "Total number of cycles texture addressing unit address path is stalled by texture cache"
"``TA_ADDR_STALLED_BY_TD_CYCLES_sum``", "Total number of cycles texture addressing unit address path is stalled by texture data unit"
"``TA_BUSY_avr``", "Average number of busy cycles"
"``TA_BUSY_max``", "Maximum number of texture addressing unit busy cycles"
"``TA_BUSY_min``", "Minimum number of texture addressing unit busy cycles"
"``TA_DATA_STALLED_BY_TC_CYCLES_sum``", "Total number of cycles texture addressing unit data path is stalled by texture cache"
"``TA_TA_BUSY_sum``", "Total number of texture addressing unit busy cycles"
Hardware counters over all texture cache per channel instances
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCC_ALL_TC_OP_WB_WRITEBACK_sum``", "Total number of writebacks due to all ``TC_OP`` writeback requests."
"``TCC_ALL_TC_OP_INV_EVICT_sum``", "Total number of evictions due to all ``TC_OP`` invalidate requests."
"``TCC_ATOMIC_sum``", "Total number of L2 cache atomic requests of all types."
"``TCC_BUSY_avr``", "Average number of L2 cache busy cycles."
"``TCC_BUSY_sum``", "Total number of L2 cache busy cycles."
"``TCC_CC_REQ_sum``", "Total number of coherently cached requests."
"``TCC_CYCLE_sum``", "Total number of L2 cache free running clocks."
"``TCC_EA0_WRREQ_sum``", "Total number of 32-byte and 64-byte transactions going over the ``TC_EA0_wrreq`` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands."
"``TCC_EA0_WRREQ_64B_sum``", "Total number of 64-byte transactions (write or `CMPSWAP`) going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_WR_UNCACHED_32B_sum``", "Total Number of 32-byte write or atomic going over the ``TC_EA0_wrreq`` interface due to uncached traffic. Note that coherently cached mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2."
"``TCC_EA0_WRREQ_STALL_sum``", "Total Number of cycles a write request is stalled, over all instances."
"``TCC_EA0_WRREQ_IO_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of IO credits, over all instances."
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits, over all instances."
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits, over all instances."
"``TCC_EA0_WRREQ_LEVEL_sum``", "Total number of efficiency arbiter write requests in flight."
"``TCC_EA0_RDREQ_LEVEL_sum``", "Total number of efficiency arbiter read requests in flight."
"``TCC_EA0_ATOMIC_sum``", "Total Number of 32-byte or 64-byte atomic requests going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_ATOMIC_LEVEL_sum``", "Total number of efficiency arbiter atomic requests in flight."
"``TCC_EA0_RDREQ_sum``", "Total number of 32-byte or 64-byte read requests to efficiency arbiter."
"``TCC_EA0_RDREQ_32B_sum``", "Total number of 32-byte read requests to efficiency arbiter."
"``TCC_EA0_RD_UNCACHED_32B_sum``", "Total number of 32-byte efficiency arbiter reads due to uncached traffic."
"``TCC_EA0_RDREQ_IO_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of IO credits."
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of GMI credits."
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of DRAM credits."
"``TCC_EA0_RDREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter read requests to HBM."
"``TCC_EA0_WRREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter write requests to HBM."
"``TCC_HIT_sum``", "Total number of L2 cache hits."
"``TCC_MISS_sum``", "Total number of L2 cache misses."
"``TCC_NC_REQ_sum``", "Total number of non-coherently cached requests."
"``TCC_NORMAL_WRITEBACK_sum``", "Total number of writebacks due to requests that are not writeback requests."
"``TCC_NORMAL_EVICT_sum``", "Total number of evictions due to requests that are not invalidate or probe requests."
"``TCC_PROBE_sum``", "Total number of probe requests."
"``TCC_PROBE_ALL_sum``", "Total number of external probe requests with ``EA0_TCC_preq_all == 1``."
"``TCC_READ_sum``", "Total number of L2 cache read requests (including compressed reads but not metadata reads)."
"``TCC_REQ_sum``", "Total number of all types of L2 cache requests."
"``TCC_RW_REQ_sum``", "Total number of coherently cached with write requests."
"``TCC_STREAMING_REQ_sum``", "Total number of L2 cache streaming requests."
"``TCC_TAG_STALL_sum``", "Total number of cycles the normal request pipeline in the tag is stalled for any reason."
"``TCC_TOO_MANY_EA0_WRREQS_STALL_sum``", "Total number of cycles L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests."
"``TCC_UC_REQ_sum``", "Total number of uncached requests."
"``TCC_WRITE_sum``", "Total number of L2 cache write requests."
"``TCC_WRITEBACK_sum``", "Total number of lines written back to the main memory including writebacks of dirty lines and uncached write or atomic requests."
"``TCC_WRREQ_STALL_max``", "Maximum number of cycles a write request is stalled."
Hardware counters by, for, or over all texture cache per pipe instances
----------------------------------------------------------------------------------------------------------------
The following table shows the hardware counters *by* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_TA_TCP_STATE_READ_sum``", "Total number of state reads by ATCPPI"
"``TCP_TOTAL_CACHE_ACCESSES_sum``", "Total number of vector L1d accesses (including hits and misses)"
"``TCP_UTCL1_PERMISSION_MISS_sum``", "Total number of unified translation cache (L1) permission misses"
"``TCP_UTCL1_REQUEST_sum``", "Total number of address translation requests to unified translation cache (L1)"
"``TCP_UTCL1_TRANSLATION_MISS_sum``", "Total number of unified translation cache (L1) translation misses"
"``TCP_UTCL1_TRANSLATION_HIT_sum``", "Total number of unified translation cache (L1) translation hits"
The following table shows the hardware counters *for* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_TCC_READ_REQ_LATENCY_sum``", "Total vector L1d to L2 request latency over all wavefronts for reads and atomics with return"
"``TCP_TCC_WRITE_REQ_LATENCY_sum``", "Total vector L1d to L2 request latency over all wavefronts for writes and atomics without return"
"``TCP_TCP_LATENCY_sum``", "Total wave access latency to vector L1d over all wavefronts"
The following table shows the hardware counters *over* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on an atomic"
"``TCP_GATE_EN1_sum``", "Total number of cycles vector L1d interface clocks are turned on"
"``TCP_GATE_EN2_sum``", "Total number of cycles vector L1d core clocks are turned on"
"``TCP_PENDING_STALL_CYCLES_sum``", "Total number of cycles vector L1d cache is stalled due to data pending from L2 Cache"
"``TCP_READ_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on a read"
"``TCP_TCC_ATOMIC_WITH_RET_REQ_sum``", "Total number of atomic requests to L2 cache with return"
"``TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum``", "Total number of atomic requests to L2 cache without return"
"``TCP_TCC_CC_READ_REQ_sum``", "Total number of coherently cached read requests to L2 cache"
"``TCP_TCC_CC_WRITE_REQ_sum``", "Total number of coherently cached write requests to L2 cache"
"``TCP_TCC_CC_ATOMIC_REQ_sum``", "Total number of coherently cached atomic requests to L2 cache"
"``TCP_TCC_NC_READ_REQ_sum``", "Total number of non-coherently cached read requests to L2 cache"
"``TCP_TCC_NC_WRITE_REQ_sum``", "Total number of non-coherently cached write requests to L2 cache"
"``TCP_TCC_NC_ATOMIC_REQ_sum``", "Total number of non-coherently cached atomic requests to L2 cache"
"``TCP_TCC_READ_REQ_sum``", "Total number of read requests to L2 cache"
"``TCP_TCC_RW_READ_REQ_sum``", "Total number of coherently cached with write read requests to L2 cache"
"``TCP_TCC_RW_WRITE_REQ_sum``", "Total number of coherently cached with write write requests to L2 cache"
"``TCP_TCC_RW_ATOMIC_REQ_sum``", "Total number of coherently cached with write atomic requests to L2 cache"
"``TCP_TCC_UC_READ_REQ_sum``", "Total number of uncached read requests to L2 cache"
"``TCP_TCC_UC_WRITE_REQ_sum``", "Total number of uncached write requests to L2 cache"
"``TCP_TCC_UC_ATOMIC_REQ_sum``", "Total number of uncached atomic requests to L2 cache"
"``TCP_TCC_WRITE_REQ_sum``", "Total number of write requests to L2 cache"
"``TCP_TCR_TCP_STALL_CYCLES_sum``", "Total number of cycles texture cache router stalls vector L1d"
"``TCP_TD_TCP_STALL_CYCLES_sum``", "Total number of cycles texture data unit stalls vector L1d"
"``TCP_TOTAL_ACCESSES_sum``", "Total number of vector L1d accesses"
"``TCP_TOTAL_READ_sum``", "Total number of vector L1d read accesses"
"``TCP_TOTAL_WRITE_sum``", "Total number of vector L1d write accesses"
"``TCP_TOTAL_ATOMIC_WITH_RET_sum``", "Total number of vector L1d atomic requests with return"
"``TCP_TOTAL_ATOMIC_WITHOUT_RET_sum``", "Total number of vector L1d atomic requests without return"
"``TCP_TOTAL_WRITEBACK_INVALIDATES_sum``", "Total number of vector L1d writebacks and invalidates"
"``TCP_VOLATILE_sum``", "Total number of L1 volatile pixels or buffers from texture addressing unit"
"``TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on a write"
Hardware counter over all texture data unit instances
--------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Definition"
"``TD_ATOMIC_WAVEFRONT_sum``", "Total number of atomic wavefront instructions"
"``TD_COALESCABLE_WAVEFRONT_sum``", "Total number of coalescable wavefronts according to texture addressing unit"
"``TD_LOAD_WAVEFRONT_sum``", "Total number of wavefront instructions (read, write, atomic)"
"``TD_SPI_STALL_sum``", "Total number of cycles texture data unit is stalled by shader processor input"
"``TD_STORE_WAVEFRONT_sum``", "Total number of write wavefront instructions"
"``TD_TC_STALL_sum``", "Total number of cycles texture data unit is stalled waiting for texture cache data"
"``TD_TD_BUSY_sum``", "Total number of texture data unit busy cycles while it is processing or waiting for data"

View File

@@ -0,0 +1,122 @@
# AMD Instinct™ MI300 series microarchitecture
The AMD Instinct MI300 series accelerators are based on the AMD CDNA 3
architecture which was designed to deliver leadership performance for HPC, artificial intelligence (AI), and machine
learning (ML) workloads. The AMD Instinct MI300 series accelerators are well-suited for extreme scalability and compute performance, running
on everything from individual servers to the worlds largest exascale supercomputers.
With the MI300 series, AMD is introducing the Accelerator Complex Die (XCD), which contains the
GPU computational elements of the processor along with the lower levels of the cache hierarchy.
The following image depicts the structure of a single XCD in the AMD Instinct MI300 accelerator series.
```{figure} ../../data/conceptual/gpu-arch/image007.png
---
name: mi300-xcd
align: center
---
XCD-level system architecture showing 40 Compute Units, each with 32 KB L1 cache, a Unified Compute System with 4 ACE Compute Accelerators, shared 4MB of L2 cache and an HWS Hardware Scheduler.
```
On the XCD, four Asynchronous Compute Engines (ACEs) send compute shader workgroups to the
Compute Units (CUs). The XCD has 40 CUs: 38 active CUs at the aggregate level and 2 disabled CUs for
yield management. The CUs all share a 4 MB L2 cache that serves to coalesce all memory traffic for the
die. With less than half of the CUs of the AMD Instinct MI200 Series compute die, the AMD CDNA™ 3
XCD die is a smaller building block. However, it uses more advanced packaging and the processor
can include 6 or 8 XCDs for up to 304 CUs, roughly 40% more than MI250X.
The MI300 Series integrate up to 8 vertically stacked XCDs, 8 stacks of
High-Bandwidth Memory 3 (HBM3) and 4 I/O dies (containing system
infrastructure) using the AMD Infinity Fabric™ technology as interconnect.
The Matrix Cores inside the CDNA 3 CUs have significant improvements, emphasizing AI and machine
learning, enhancing throughput of existing data types while adding support for new data types.
CDNA 2 Matrix Cores support FP16 and BF16, while offering INT8 for inference. Compared to MI250X
accelerators, CDNA 3 Matrix Cores triple the performance for FP16 and BF16, while providing a
performance gain of 6.8 times for INT8. FP8 has a performance gain of 16 times compared to FP32,
while TF32 has a gain of 4 times compared to FP32.
```{list-table} Peak-performance capabilities of the MI300X for different data types.
:header-rows: 1
:name: mi300x-perf-table
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
*
- Matrix FP64
- 256
- 163.4
*
- Vector FP64
- 128
- 81.7
*
- Matrix FP32
- 256
- 163.4
*
- Vector FP32
- 256
- 163.4
*
- Vector TF32
- 1024
- 653.7
*
- Matrix FP16
- 2048
- 1307.4
*
- Matrix BF16
- 2048
- 1307.4
*
- Matrix FP8
- 4096
- 2614.9
*
- Matrix INT8
- 4096
- 2614.9
```
The above table summarizes the aggregated peak performance of the AMD Instinct MI300X Open
Compute Platform (OCP) Open Accelerator Modules (OAMs) for different data types and command
processors. The middle column lists the peak performance (number of data elements processed in a
single instruction) of a single compute unit if a SIMD (or matrix) instruction is submitted in each clock
cycle. The third column lists the theoretical peak performance of the OAM. The theoretical aggregated
peak memory bandwidth of the GPU is 5.3 TB per second.
The following image shows the block diagram of the APU (left) and the OAM package (right) both
connected via AMD Infinity Fabric™ network on-chip.
```{figure} ../../data/conceptual/gpu-arch/image008.png
---
name: mi300-arch
alt:
align: center
---
MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, while the MI300X (right) has 8 XCDs.
```
## Node-level architecture
```{figure} ../../data/conceptual/gpu-arch/image009.png
---
name: mi300-node
align: center
---
MI300 series node-level architecture showing 8 fully interconnected MI300X OAM modules connected to (optional) PCIEe switches via retimers and HGX connectors.
```
The image above shows the node-level architecture of a system with AMD EPYC processors in a
dual-socket configuration and eight AMD Instinct MI300X accelerators. The MI300X OAMs attach to the
host system via PCIe Gen 5 x16 links (yellow lines). The GPUs are using seven high-bandwidth,
low-latency AMD Infinity Fabric™ links (red lines) to form a fully connected 8-GPU system.
<!---
We need performance data about the P2P communication here.
-->

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="GPU isolation techniques">
<meta name="keywords" content="GPU isolation techniques, UUID, universally unique identifier,
environment variables, virtual machines">
environment variables, virtual machines, AMD, ROCm">
</head>
# GPU isolation techniques

View File

@@ -2,15 +2,15 @@
<meta charset="UTF-8">
<meta name="description" content="GPU memory">
<meta name="keywords" content="GPU memory, VRAM, video random access memory, pageable
memory, pinned memory, managed memory">
memory, pinned memory, managed memory, AMD, ROCm">
</head>
# GPU memory
For the HIP reference documentation, see:
* {doc}`hip:.doxygen/docBin/html/group___memory`
* {doc}`hip:.doxygen/docBin/html/group___memory_m`
* {doc}`hip:doxygen/html/group___memory`
* {doc}`hip:doxygen/html/group___memory_m`
Host memory exists on the host (e.g. CPU) of the machine in random access memory (RAM).
@@ -177,8 +177,8 @@ Fine-grained memory implies that up-to-date data may be made visible to others r
| API | Flag | Coherence |
|-------------------------|------------------------------|----------------|
| `hipExtMallocWithFlags` | `hipHostMallocDefault` | Fine-grained |
| `hipExtMallocWithFlags` | `hipDeviceMallocFinegrained` | Coarse-grained |
| `hipExtMallocWithFlags` | `hipDeviceMallocDefault` | Coarse-grained |
| `hipExtMallocWithFlags` | `hipDeviceMallocFinegrained` | Fine-grained |
| API | `hipMemAdvise` argument | Coherence |
|-------------------------|------------------------------|----------------|

View File

@@ -0,0 +1,47 @@
.. meta::
:description: Setting the number of CUs
:keywords: AMD, ROCm, cu, number of cus
.. _env-variables-reference:
*************************************************************
Setting the number of CUs
*************************************************************
When using GPUs to accelerate compute workloads, it sometimes becomes necessary
to configure the usage of Compute Units (CU) of the hardware. This is a more advanced
option, so please read this page before experimentation.
The GPU driver provides two environment variables to set the number of CUs used. The
first one is ``HSA_CU_MASK`` and the second one is ``ROC_GLOBAL_CU_MASK``. The main
difference is, that ``ROC_GLOBAL_CU_MASK`` sets the CU mask on queues created by
the HIP or the OpenCL runtimes. While ``HSA_CU_MASK`` sets the mask on a lower level of
queue creation in the driver, this mask will also be set for queues being profiled.
The environment variables have the following syntax:
::
ID = [0-9][0-9]* ex. base 10 numbers
ID_list = (ID | ID-ID)[, (ID | ID-ID)]* ex. 0,2-4,7
GPU_list = ID_list ex. 0,2-4,7
CU_list = 0x[0-F]* | ID_list ex. 0x337F OR 0,2-4,7
CU_Set = GPU_list : CU_list ex. 0,2-4,7:0-15,32-47 OR 0,2-4,7:0x337F
HSA_CU_MASK = CU_Set [; CU_Set]* ex. 0,2-4,7:0-15,32-47; 3-9:0x337F
The GPU indices are taken post ``ROCR_VISIBLE_DEVICES`` reordering. For GPUs listed,
the listed or masked CUs will be enabled, the rest disabled. Unlisted GPUs will not
be affected, their CUs will all be enabled.
The parsing of the variable is stopped when a syntax error occurs. The erroneous set
and the ones following will be ignored. Repeating GPU or CU IDs are a syntax error.
Specifying a mask with no usable CUs (CU_list is 0x0) is a syntax error. For excluding
GPU devices use ``ROCR_VISIBLE_DEVICES``.
These environment variables only affect ROCm software, not graphics applications.
It's important to know that not all CU configurations are valid on all devices. For
instance, on devices where two CUs can be combined into a WGP (for kernels running in
WGP mode), it is not valid to disable only a single CU in a WGP. `This paper
<https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ can provide more information
about what to expect, when disabling CUs.

View File

@@ -2,19 +2,18 @@
<meta charset="UTF-8">
<meta name="description" content="Using the LLVM ASan on a GPU">
<meta name="keywords" content="LLVM, ASan, address sanitizer, AddressSanitizer, instrumented
libraries, instrumented applications">
libraries, instrumented applications, AMD, ROCm">
</head>
# Using the LLVM ASan on a GPU (beta release)
The LLVM AddressSanitizer (ASan) provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications exactly like pure CPU applications. However, this simplicity has not been achieved yet.
This document provides documentation on using ROCm ASan.
For information about LLVM ASan, see the [LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
**Note**: The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
**Note:** The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
## Compiling for ASan
@@ -24,13 +23,20 @@ Recommendations for doing this are:
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
* Add the following options to the existing compiler and linker options:
* `-fsanitize=address` - enables instrumentation
* `-shared-libsan` - use shared version of runtime
* `-g` - add debug info for improved reporting
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
**Note:** It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
**Note:** When compiling OpenMP programs with ASan instrumentation, it is currently necessary to set the environment variable `LIBRARY_PATH` to `/opt/rocm-<version>/lib/llvm/lib/asan:/opt/rocm-<version>/lib/asan`. At runtime, it may be necessary to add `/opt/rocm-<version>/lib/llvm/lib/asan` to `LD_LIBRARY_PATH`.
### About compilation time
@@ -54,7 +60,7 @@ For a complete ROCm GPU Sanitizer installation, including packages, instrumented
## Using AMD-supplied ASan instrumented libraries
ROCm releases have optional packages that contain additional ASan instrumented builds of the ROCm libraries (usually found in `/opt/rocm-<version>/lib`). The instrumented libraries have identical names to the regular uninstrumented libraries, and are located in `/opt/rocm-<version>/lib/asan`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with `g++`. The preexisting build options are used but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
These additional libraries avoid additional developer effort to locate repositories, identify the correct branch, check out the correct tags, and other efforts needed to build the libraries from the source. And they extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
@@ -90,9 +96,10 @@ There are two `ASAN_OPTION` flags of particular note.
* `halt_on_error=0/1 default 1`.
This tells the ASAN runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional ASan instrumented libraries are not compiled with this option and if an error is detected within one of them, but halt_on_error is set to 0, more undefined behavior will occur.
This tells the ASan runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional ASan instrumented libraries are not compiled with this option and if an error is detected within one of them, but halt_on_error is set to 0, more undefined behavior will occur.
* `detect_leaks=0/1 default 1`.
This option directs the ASan runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
## Runtime overhead
@@ -233,9 +240,167 @@ $ rocgdb <path to application>
### Using ASan with a short HIP application
Refer to the following example to use ASan with a short HIP application,
Consider the following simple and short demo of using the Address Sanitizer with a HIP application:
https://github.com/Rmalavally/rocm-examples/blob/Rmalavally-patch-1/LLVM_ASAN/Using-Address-Sanitizer-with-a-Short-HIP-Application.md
```C++
#include <cstdlib>
#include <hip/hip_runtime.h>
__global__ void
set1(int *p)
{
int i = blockDim.x*blockIdx.x + threadIdx.x;
p[i] = 1;
}
int
main(int argc, char **argv)
{
int m = std::atoi(argv[1]);
int n1 = std::atoi(argv[2]);
int n2 = std::atoi(argv[3]);
int c = std::atoi(argv[4]);
int *dp;
hipMalloc(&dp, m*sizeof(int));
hipLaunchKernelGGL(set1, dim3(n1), dim3(n2), 0, 0, dp);
int *hp = (int*)malloc(c * sizeof(int));
hipMemcpy(hp, dp, m*sizeof(int), hipMemcpyDeviceToHost);
hipDeviceSynchronize();
hipFree(dp);
free(hp);
std::puts("Done.");
return 0;
}
```
This application will attempt to access invalid addresses for certain command line arguments. In particular, if `m < n1 * n2` some device threads will attempt to access
unallocated device memory.
Or, if `c < m`, the `hipMemcpy` function will copy past the end of the `malloc` allocated memory.
**Note**: The `hipcc` compiler is used here for simplicity.
Compiling without XNACK results in a warning.
```bash
$ hipcc -g --offload-arch=gfx90a:xnack- -fsanitize=address -shared-libsan mini.hip -o mini
clang++: warning: ignoring` `-fsanitize=address' option for offload arch 'gfx90a:xnack-`, as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]`.
```
The binary compiled above will run, but the GPU code will not be instrumented and the `m < n1 * n2` error will not be detected. Switching to `--offload-arch=gfx90a:xnack+` in the command above results in a warning-free compilation and an instrumented application. After setting `PATH`, `LD_LIBRARY_PATH` and `HSA_XNACK` as described earlier, a check of the binary with `ldd` yields the following,
```bash
$ ldd mini
linux-vdso.so.1 (0x00007ffd1a5ae000)
libclang_rt.asan-x86_64.so => /opt/rocm-6.1.0-99999/llvm/lib/clang/17.0.0/lib/linux/libclang_rt.asan-x86_64.so (0x00007fb9c14b6000)
libamdhip64.so.5 => /opt/rocm-6.1.0-99999/lib/asan/libamdhip64.so.5 (0x00007fb9bedd3000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9beba8000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9bea59000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9bea3e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9be84a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb9be844000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9be821000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9be817000)
libamd_comgr.so.2 => /opt/rocm-6.1.0-99999/lib/asan/libamd_comgr.so.2 (0x00007fb9b4382000)
libhsa-runtime64.so.1 => /opt/rocm-6.1.0-99999/lib/asan/libhsa-runtime64.so.1 (0x00007fb9b3b00000)
libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fb9b3af3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb9c2027000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fb9b3ad7000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fb9b3aa7000)
libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x00007fb9b3a89000)
libdrm.so.2 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007fb9b3a70000)
libdrm_amdgpu.so.1 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x00007fb9b3a62000)
```
This confirms that the address sanitizer runtime is linked in, and the ASAN instrumented version of the runtime libraries are used.
Checking the `PATH` yields
```bash
$ which llvm-symbolizer
/opt/rocm-6.1.0-99999/llvm/bin/llvm-symbolizer
```
Lastly, a check of the OS kernel version yields
```bash
$ uname -rv
5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023
```
which indicates that the required HMM support (kernel version > 5.6) is available. This completes the necessary setup. Running with `m = 100`, `n1 = 11`, `n2 = 10` and `c = 100` should produce
a report for an invalid access by the last 10 threads.
```bash
=================================================================
==3141==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fb1410d2cc4
WRITE of size 4 in workgroup id (10,0,0)
#0 0x7fb1410d2cc4 in set1(int*) at /home/dave/mini/mini.cpp:0:10
Thread ids and accessed addresses:
00 : 0x7fb14371d190 01 : 0x7fb14371d194 02 : 0x7fb14371d198 03 : 0x7fb14371d19c 04 : 0x7fb14371d1a0 05 : 0x7fb14371d1a4 06 : 0x7fb14371d1a8 07 : 0x7fb14371d1ac
08 : 0x7fb14371d1b0 09 : 0x7fb14371d1b4
0x7fb14371d190 is located 0 bytes after 400-byte region [0x7fb14371d000,0x7fb14371d190)
allocated by thread T0 here:
#0 0x7fb151c76828 in hsa_amd_memory_pool_allocate /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:692:3
#1 ...
#12 0x7fb14fb99ec4 in hipMalloc /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:568:3
#13 0x226630 in hipError_t hipMalloc<int>(int**, unsigned long) /opt/rocm-6.1.0-99999/include/hip/hip_runtime_api.h:8367:12
#14 0x226630 in main /home/dave/mini/mini.cpp:19:5
#15 0x7fb14ef02082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
Shadow bytes around the buggy address:
0x7fb14371cf00: ...
=>0x7fb14371d180: 00 00[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7fb14371d200: ...
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
...
==3141==ABORTING
```
Running with `m = 100`, `n1 = 10`, `n2 = 10` and `c = 99` should produce a report for an invalid copy.
```shell
=================================================================
==2817==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x514000150dcc at pc 0x7f5509551aca bp 0x7ffc90a7ae50 sp 0x7ffc90a7a610
WRITE of size 400 at 0x514000150dcc thread T0
#0 0x7f5509551ac9 in __asan_memcpy /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:61:3
#1 ...
#9 0x7f5507462a28 in hipMemcpy_common(void*, void const*, unsigned long, hipMemcpyKind, ihipStream_t*) /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:637:10
#10 0x7f5507464205 in hipMemcpy /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:642:3
#11 0x226844 in main /home/dave/mini/mini.cpp:22:5
#12 0x7f55067c3082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
#13 0x22605d in _start (/home/dave/mini/mini+0x22605d)
0x514000150dcc is located 0 bytes after 396-byte region [0x514000150c40,0x514000150dcc)
allocated by thread T0 here:
#0 0x7f5509553dcf in malloc /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
#1 0x226817 in main /home/dave/mini/mini.cpp:21:21
#2 0x7f55067c3082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: heap-buffer-overflow /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:61:3 in __asan_memcpy
Shadow bytes around the buggy address:
0x514000150b00: ...
=>0x514000150d80: 00 00 00 00 00 00 00 00 00[04]fa fa fa fa fa fa
0x514000150e00: ...
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
...
==2817==ABORTING
```
### Known issues with using GPU sanitizer

View File

@@ -21,7 +21,6 @@ for template in templates:
with open(os.path.splitext(template)[0], 'w') as file:
file.write(rendered)
shutil.copy2('../CONTRIBUTING.md','./contribute/index.md')
shutil.copy2('../RELEASE.md','./about/release-notes.md')
# Keep capitalization due to similar linking on GitHub's markdown preview.
shutil.copy2('../CHANGELOG.md','./about/CHANGELOG.md')
@@ -38,9 +37,9 @@ latex_elements = {
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023-2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.0.0"
release = "6.0.0"
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.0.1"
release = "6.0.1"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
@@ -48,9 +47,14 @@ all_article_info_author = ""
# pages with specific settings
article_pages = [
{
"file":"release",
"file":"about/release-notes",
"os":["linux", "windows"],
"date":"2023-07-27"
"date":"2024-01-31"
},
{
"file":"about/CHANGELOG",
"os":["linux", "windows"],
"date":"2024-01-31"
},
{"file":"install/windows/install-quick", "os":["windows"]},
@@ -70,9 +74,6 @@ article_pages = [
{"file":"install/windows/cli/index", "os":["windows"]},
{"file":"install/windows/gui/index", "os":["windows"]},
{"file":"about/compatibility/linux-support", "os":["linux"]},
{"file":"about/compatibility/windows-support", "os":["windows"]},
{"file":"about/compatibility/docker-image-support-matrix", "os":["linux"]},
{"file":"about/compatibility/user-kernel-space-compat-matrix", "os":["linux"]},
@@ -84,15 +85,13 @@ article_pages = [
{"file":"how-to/tuning-guides", "os":["linux", "windows"]},
{"file":"rocm-a-z", "os":["linux", "windows"]},
{"file":"about/release-notes", "os":["linux"]},
]
exclude_patterns = ['temp']
external_toc_path = "./sphinx/_toc.yml"
extensions = ["rocm_docs"]
extensions = ["rocm_docs", "sphinx_reredirects"]
external_projects_current_project = "rocm"
@@ -104,3 +103,7 @@ html_title = "ROCm Documentation"
html_theme_options = {
"link_main_doc": False
}
redirects = {
"reference/openmp/openmp": "../../about/compatibility/openmp.html"
}

View File

@@ -1,7 +1,8 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Building ROCm documentation">
<meta name="keywords" content="documentation, Visual Studio Code, GitHub, command line">
<meta name="keywords" content="documentation, Visual Studio Code, GitHub, command line,
AMD, ROCm">
</head>
# Building documentation
@@ -30,11 +31,6 @@ Use the Python Virtual Environment (`venv`) and run the following commands from
```sh
python3 -mvenv .venv
# Windows
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
# Linux
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
```
@@ -128,12 +124,12 @@ documentation locally using Visual Studio (VS) Code. Follow these steps to confi
}
```
> (Implementation detail: two problem matchers were needed to be defined,
> Implementation detail: two problem matchers were needed to be defined,
> because VS Code doesn't tolerate some problem information being potentially
> absent. While a single regex could match all types of errors, if a capture
> group remains empty (the line number doesn't show up in all warning/error
> messages) but the `pattern` references said empty capture group, VS Code
> discards the message completely.)
> discards the message completely.
4. Configure the Python virtual environment (`venv`).

View File

@@ -0,0 +1,112 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Contributing to ROCm">
<meta name="keywords" content="ROCm, contributing, contribute, maintainer, contributor">
</head>
# Contribute to ROCm documentation
All ROCm projects are GitHub-based, so if you want to contribute, you can do so by:
* [Submitting a pull request in the appropriate GitHub repository](#submit-a-pull-request)
* [Creating an issue in the appropriate GitHub repository](#create-an-issue)
* [Suggesting a new feature](#suggest-a-new-feature)
```{important}
By creating a pull request (PR), you agree to allow your contribution to be licensed under the terms of the
LICENSE.txt file in the corresponding repository. Different repositories may use different licenses.
```
## Submit a pull request
To make edits to our documentation via PR, follow these steps:
1. Identify the repository and the file you want to update. For example, to update this page, you would
need to modify content located in this file:
`https://github.com/ROCm/ROCm/blob/develop/docs/contribute/contributing.md`
2. (optional, but recommended) Fork the repository.
3. Clone the repository locally and (optionally) add your fork. Select the green 'Code' button and copy
the URL (e.g., `git@github.com:ROCm/ROCm.git`).
* From your terminal, run:
```bash
git clone git@github.com:ROCm/ROCm.git
```
* Optionally add your fork to this local copy of the repository by running:
```bash
git add remote <name-of-my-fork> <git@github.com:my-username/ROCm.git>
```
To get the URL of your fork, go to your GitHub profile, select the fork and click the green 'Code'
button (the same process you followed to get the main GitHub repository URL).
4. Change directory into your local copy of the repository, and run ``git pull`` (or ``git pull origin develop``) to ensure your local copy has the most recent content.
5. Create and checkout a new branch using the following command:
```bash
git checkout -b <branch_name>
```
6. Change directory into the `./docs` folder and make any documentation changes locally using your preferred code editor. Follow the guidelines listed on the
[documentation structure](./doc-structure.md) page.
7. Optionally run a local test build of the documentation to ensure the content builds and looks as expected. In your terminal, run the following commands from within the `./docs` folder of your cloned repository:
```bash
pip3 install -r sphinx/requirements.txt # You only need to run this command once
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
The build output files are located in the `docs/_build` folder. To preview your build, open the index file
(`docs/_build/html/index.html`) file. For more information, see [Building documentation](building.md). To learn
more about our build tools, see [Documentation toolchain](toolchain.md).
8. Commit your changes and push them to GitHub by running:
```bash
git add <path-to-my-modified-file> # To add all modified files, you can use: git add .
git commit -m "my-updates"
git push <name-of-my-fork>
```
After pushing, you will get a GitHub link in the terminal output. Copy this link and paste it into a
browser to create your PR.
## Create an issue
1. To create a new GitHub issue, select the 'Issues' tab in the appropriate repository
(e.g., https://github.com/ROCm/ROCm/issues).
2. Use the search bar to make sure the issue doesn't already exist.
3. If your issue is not already listed, select the green 'New issue' button to the right of the page. Select
the type of issue and fill in the resulting template.
### General issue guidelines
* Use your best judgement for issue creation. If your issue is already listed, upvote the issue and
comment or post to provide additional details, such as how you reproduced this issue.
* If you're not sure if your issue is the same, err on the side of caution and file your issue.
You can add a comment to include the issue number (and link) for the similar issue. If we evaluate
your issue as being the same as the existing issue, we'll close the duplicate.
* If your issue doesn't exist, use the issue template to file a new issue.
* When filing an issue, be sure to provide as much information as possible, including script output so
we can collect information about your configuration. This helps reduce the time required to
reproduce your issue.
* Check your issue regularly, as we may require additional information to successfully reproduce the
issue.
## Suggest a new feature
Use the [GitHub Discussion forum](https://github.com/ROCm/ROCm/discussions)
(Ideas category) to propose new features. Our maintainers are happy to provide direction and
feedback on feature development.
## Future development workflow
The current ROCm development workflow is GitHub-based. If, in the future, we change this platform,
the tools and links may change. In this instance, we will update contribution guidelines accordingly.

View File

@@ -0,0 +1,219 @@
# Documentation structure
Our documentation follows the Pitchfork folder structure. Most documentation files are stored in the
`/docs` folder. Some special files (such as release, contributing, and changelog) are stored in the root
(`/`) folder.
All images are stored in the `/docs/data` folder. An image's file path mirrors that of the documentation
file where it is used.
Our naming structure uses kebab case; for example, `my-file-name.rst`.
## Supported formats and syntax
Our documentation includes both Markdown and RST files. We are gradually transitioning existing
Markdown to RST in order to more effectively meet our documentation needs. When contributing,
RST is preferred; if you must use Markdown, use GitHub-flavored Markdown.
We use [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) syntax and compile
our API references using [Doxygen](https://www.doxygen.nl/).
The following table shows some common documentation components and the syntax convention we
use for each:
<table>
<tr>
<th>Component</th>
<th>RST syntax</th>
</tr>
<tr>
<td>Code blocks</td>
<td>
```rst
.. code-block:: language-name
My code block.
```
</td>
</tr>
<tr>
<td>Cross-referencing internal files</td>
<td>
```rst
:doc:`Title <../path/to/file/filename>`
```
</td>
</tr>
<tr>
<td>External links</td>
<td>
```rst
`link name <URL>`_
```
</td>
</tr>
<tr>
<tr>
<td>Headings</td>
<td>
```rst
******************
Chapter title (H1)
******************
Section title (H2)
===============
Subsection title (H3)
---------------------
Sub-subsection title (H4)
^^^^^^^^^^^^^^^^^^^^
```
</td>
</tr>
<tr>
<td>Images</td>
<td>
```rst
.. image:: image1.png
```
</td>
</tr>
<tr>
<td>Internal links</td>
<td>
```rst
1. Add a tag to the section you want to reference:
.. _my-section-tag: section-1
Section 1
==========
2. Link to your tag:
As shown in :ref:`section-1`.
```
</td>
</tr>
<tr>
<tr>
<td>Lists</td>
<td>
```rst
# Ordered (numbered) list item
* Unordered (bulleted) list item
```
</td>
</tr>
<tr>
<tr>
<td>Math (block)</td>
<td>
```rst
.. math::
A = \begin{pmatrix}
0.0 & 1.0 & 1.0 & 3.0 \\
4.0 & 5.0 & 6.0 & 7.0 \\
\end{pmatrix}
```
</td>
</tr>
<tr>
<td>Math (inline)</td>
<td>
```rst
:math:`2 \times 2 `
```
</td>
</tr>
<tr>
<td>Notes</td>
<td>
```rst
.. note::
My note here.
```
</td>
</tr>
<tr>
<td>Tables</td>
<td>
```rst
.. csv-table:: Optional title here
:widths: 30, 70 #optional column widths
:header: "entry1 header", "entry2 header"
"entry1", "entry2"
```
</td>
</tr>
</table>
## Language and style
We use the
[Google developer documentation style guide](https://developers.google.com/style/highlights) to
guide our content.
Font size and type, page layout, white space control, and other formatting
details are controlled via
[rocm-docs-core](https://github.com/ROCm/rocm-docs-core). If you want to notify us
of any formatting issues, create a pull request in our
[rocm-docs-core](https://github.com/ROCm/rocm-docs-core) GitHub repository.
## Building our documentation
<!-- % TODO: Fix the link to be able to work at every files -->
To learn how to build our documentation, refer to
[Building documentation](./building.md).

View File

@@ -1,12 +1,12 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Providing feedback for ROCm documentation">
<meta name="keywords" content="documentation, pull request, GitHub">
<meta name="keywords" content="documentation, pull request, GitHub, AMD, ROCm">
</head>
# Providing feedback for ROCm documentation
# Providing feedback
There are four standard ways to provide feedback for this repository.
There are four standard ways to provide feedback on this repository.
## Pull request
@@ -15,18 +15,21 @@ All contributions to ROCm documentation should arrive via the
targeting the develop branch of the repository. If you are unable to contribute
via the GitHub Flow, feel free to email us at [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
For more in-depth information on creating a pull request (PR), see
[Contributing](./contributing.md).
## GitHub discussions
To ask questions or view answers to frequently asked questions, refer to
[GitHub Discussions](https://github.com/RadeonOpenCompute/ROCm/discussions).
[GitHub Discussions](https://github.com/ROCm/ROCm/discussions).
On GitHub Discussions, in addition to asking and answering questions,
members can share updates, have open-ended conversations,
and follow along on via public announcements.
## GitHub issue
Issues on existing or absent docs can be filed as
[GitHub Issues](https://github.com/RadeonOpenCompute/ROCm/issues).
Issues on existing or absent documentation can be filed in
[GitHub Issues](https://github.com/ROCm/ROCm/issues).
## Email

View File

@@ -1,7 +1,7 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm documentation toolchain">
<meta name="keywords" content="documentation, toolchain, Sphinx, Doxygen, MyST">
<meta name="keywords" content="documentation, toolchain, Sphinx, Doxygen, MyST, AMD, ROCm">
</head>
# ROCm documentation toolchain
@@ -10,68 +10,56 @@ Our documentation relies on several open source toolchains and sites.
## `rocm-docs-core`
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD-maintained
project that applies customization for our documentation. This
project is the tool most ROCm repositories use as part of the documentation
build. It is also available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
[rocm-docs-core](https://github.com/ROCm/rocm-docs-core) is an AMD-maintained
project that applies customization for our documentation. This project is the tool most ROCm
repositories use as part of the documentation build. It is also available as a
[pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
See the user and developer guides for rocm-docs-core at
{doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
originally used for Python. It is now widely used in the open source community.
Originally, Sphinx supported reStructuredText (RST) based documentation, but
Markdown support is now available.
ROCm documentation plans to default to Markdown for new projects.
Existing projects using RST are under no obligation to convert to Markdown. New
projects that believe Markdown is not suitable should contact the documentation
team prior to selecting RST.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds
and hosts the HTML documentation generated using Sphinx to our end users.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts
information from inline code.
ROCm projects typically use Doxygen for public API documentation unless the
upstream project uses a different tool.
### Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin to integrate Doxygen
content.
### MyST
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
It is integrated into ROCm documentation by the Sphinx extension [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A cheat sheet that showcases how to use the MyST syntax is available over at
the [Jupyter reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator originally used for
Python. It is now widely used in the open source community.
### Sphinx External ToC
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
is a Sphinx extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
based on a YAML file that specifies the table of contents.
It was selected due to its flexibility that allows scripts to operate on the
YAML file. Please transition to this file for the project's navigation. You can
see the `_toc.yml.in` file in this repository in the `docs/sphinx` folder for an
example.
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html) is a Sphinx
extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
based on a YAML file (`_toc.yml.in`) that contains the table of contents.
### Sphinx-book-theme
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme
that defines the base appearance for ROCm documentation.
ROCm documentation applies some customization,
such as a custom header and footer on top of the Sphinx Book Theme.
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme that
defines the base appearance for ROCm documentation. ROCm documentation applies some
customization, such as a custom header and footer on top of the Sphinx Book Theme.
### Sphinx design
### Sphinx Design
[Sphinx design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design
functionality.
ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.
[Sphinx design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that
adds design functionality. ROCm documentation uses Sphinx Design for grids, cards, and synchronized
tabs.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts information from inline
code. ROCm projects typically use Doxygen for public API documentation (unless the upstream project
uses a different tool).
## Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin to integrate Doxygen content.
## MyST
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended flavor of
Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and
Sphinx. It's integrated into ROCm documentation by the Sphinx extension
[`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A MyST syntax cheat sheet is available on the [Jupyter reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html) site.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds and hosts the HTML
documentation generated using Sphinx to our end users.

Binary file not shown.

After

Width:  |  Height:  |  Size: 81 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

BIN
docs/data/banner-howto.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

BIN
docs/data/banner-text.xcf Normal file

Binary file not shown.

BIN
docs/data/banner.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 83 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB

View File

Before

Width:  |  Height:  |  Size: 88 KiB

After

Width:  |  Height:  |  Size: 88 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 939 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 537 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 292 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.3 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 32 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.6 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.5 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.5 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 114 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 228 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 796 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 310 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 789 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 801 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 37 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 208 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 250 KiB

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="Deep learning using ROCm">
<meta name="keywords" content="deep learning, frameworks, installation, PyTorch, TensorFlow,
MAGMA">
MAGMA, AMD, ROCm">
</head>
# Deep learning guide
@@ -11,12 +11,12 @@ The following sections cover the different framework installations for ROCm and
deep-learning applications. The following image provides
the sequential flow for the use of each framework. Refer to the ROCm Compatible
Frameworks Release Notes for each framework's most current release notes at
{doc}`Third-party support<rocm-install-on-linux:3rd-party-support-matrix>`.
{doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`.
![ROCm Compatible Frameworks Flowchart](../data/install/magma-install/magma005.png "ROCm Compatible Frameworks")
![ROCm Compatible Frameworks Flowchart](../data/how-to/magma005.png "ROCm Compatible Frameworks")
## Frameworks installation
* {doc}`PyTorch for ROCm<rocm-install-on-linux:pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:tensorflow-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:magma-install>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`

View File

@@ -1,3 +1,7 @@
.. meta::
:description: GPU-enabled Message Passing Interface
:keywords: Message Passing Interface, MPI, AMD, ROCm
***************************************************************************************************
GPU-enabled Message Passing Interface
***************************************************************************************************
@@ -44,7 +48,7 @@ home directory in our example, but you can specify a different location if you w
mkdir -p $BUILD_DIR
2. Install UCX. To view UCX and ROCm version compatibility, refer to the
`communication libraries tables <https://rocm.docs.amd.com/en/latest/release/3rd_party_support_matrix.html>`_.
`communication libraries tables <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html>`_
.. code-block:: shell
@@ -153,7 +157,7 @@ library with ROCm support.
.. note::
You can verify UCC and ROCm version compatibility using the
`communication libraries <https://rocm.docs.amd.com/en/latest/release/3rd_party_support_matrix.html>`_.
`communication libraries tables <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html>`_
.. code-block:: shell
@@ -256,5 +260,5 @@ To run an OSU benchmark using multiple nodes, use the following code:
.. code-block:: shell
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$OFI_DIR/lib64:/opt/rocm/lib
$OMPI_DIR/bin/mpirun -np 2 \
$OMPI_DIR/bin/mpirun --mca pml ob1 --mca btl_ofi_mode 2 -np 2 \
./c/mpi/pt2pt/standard/osu_bw D D

View File

@@ -1,7 +1,8 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="System debugging guide">
<meta name="keywords" content="debug, system-level debug, debug flags, PCIe debug">
<meta name="keywords" content="debug, system-level debug, debug flags, PCIe debug, AMD,
ROCm">
</head>
# System debugging guide

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="Tuning guides">
<meta name="keywords" content="high-performance computing, HPC, Instinct accelerators,
Radeon, tuning, tuning guide">
Radeon, tuning, tuning guide, AMD, ROCm">
</head>
# Tuning guides

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="MI100 high-performance computing and tuning guide">
<meta name="keywords" content="MI100, high-performance computing, HPC, tuning, BIOS
settings, NBIO">
settings, NBIO, AMD, ROCm">
</head>
# MI100 high-performance computing and tuning guide
@@ -365,10 +365,10 @@ installed.
## System management
For a complete guide on how to install/manage/uninstall ROCm on Linux, refer to
{doc}`Quick-start (Linux)<rocm-install-on-linux:tutorial/quick-start>`. For verifying that the
installation was successful, refer to
{ref}`verifying-kernel-mode-driver-installation` and
[Validation Tools](../../reference/library-index.md). Should verification
{doc}`Quick-start (Linux)<rocm-install-on-linux:tutorial/quick-start>`. To verify that the installation was
successful, refer to the
{doc}`post-install instructions<rocm-install-on-linux:how-to/native-install/post-install>` and
[system tools](../../reference/rocm-tools.md). Should verification
fail, consult the [System Debugging Guide](../system-debugging.md).
(mi100-hw-verification)=
@@ -412,7 +412,8 @@ SIMD pipelines, memory information, and Instruction Set Architecture:
![rocminfo output fragment on an 8*MI100 system](../../data/how-to/tuning-guides/tuning003.png "rocminfo output fragment on an 8*MI100 system")
For a complete list of architecture (LLVM target) names, refer to
[Linux support](../../about/compatibility/linux-support.md) and [Windows support](../../about/compatibility/windows-support.md).
{doc}`Linux<rocm-install-on-linux:reference/system-requirements>` and
{doc}`Windows<rocm-install-on-windows:reference/system-requirements>` support.
### Testing inter-device bandwidth
@@ -454,7 +455,7 @@ sudo zypper install rocm-bandwidth-test
::::
Alternatively, the source code can be downloaded and built from
[source](https://github.com/RadeonOpenCompute/rocm_bandwidth_test).
[source](https://github.com/ROCm/rocm_bandwidth_test).
The output will list the available compute devices (CPUs and GPUs):

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="MI200 high-performance computing and tuning guide">
<meta name="keywords" content="MI200, high-performance computing, HPC, tuning, BIOS
settings, NBIO">
settings, NBIO, AMD, ROCm">
</head>
# MI200 high-performance computing and tuning guide
@@ -34,7 +34,7 @@ Analogous settings for other non-AMI System BIOS providers could be set
similarly. For systems with Intel processors, some settings may not apply or be
available as listed in the following table.
```{list-table} Recommended settings for the system BIOS in a GIGABYTE platform.
```{list-table}
:header-rows: 1
:name: mi200-bios
@@ -351,9 +351,9 @@ installed.
For a complete guide on how to install/manage/uninstall ROCm on Linux, refer to
{doc}`Quick-start (Linux)<rocm-install-on-linux:tutorial/quick-start>`. For verifying that the
installation was successful, refer to
{ref}`verifying-kernel-mode-driver-installation` and
[Validation Tools](../../reference/library-index.md). Should verification
installation was successful, refer to the
{doc}`post-install instructions<rocm-install-on-linux:how-to/native-install/post-install>` and
[system tools](../../reference/rocm-tools.md). Should verification
fail, consult the [System Debugging Guide](../system-debugging.md).
(mi200-hw-verification)=
@@ -397,7 +397,8 @@ Instruction Set Architecture (ISA):
![rocminfo output fragment on an 8*MI200 system](../../data/how-to/tuning-guides/tuning010.png "'rocminfo' output fragment on an 8*MI200 system")
For a complete list of architecture (LLVM target) names, refer to GPU OS Support for
[Linux](../../about/compatibility/linux-support.md) and [Windows](../../about/compatibility/windows-support.md).
{doc}`Linux<rocm-install-on-linux:reference/system-requirements>` and
{doc}`Windows<rocm-install-on-windows:reference/system-requirements>`.
### Testing inter-device bandwidth
@@ -439,7 +440,7 @@ sudo zypper install rocm-bandwidth-test
::::
Alternatively, the source code can be downloaded and built from
[source](https://github.com/RadeonOpenCompute/rocm_bandwidth_test).
[source](https://github.com/ROCm/rocm_bandwidth_test).
The output will list the available compute devices (CPUs and GPUs), including
their device ID and PCIe ID:

View File

@@ -1,7 +1,8 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="RDNA2 workstation tuning guide">
<meta name="keywords" content="RDNA2, workstation tuning, BIOS settings, installation">
<meta name="keywords" content="RDNA2, workstation tuning, BIOS settings, installation, AMD,
ROCm">
</head>
# RDNA2 workstation tuning guide
@@ -11,16 +12,16 @@
This chapter reviews system settings that are required to configure the system
for ROCm virtualization on RDNA2-based AMD Radeon™ PRO GPUs. Installing ROCm on
Bare Metal follows the routine ROCm
{doc}`installation procedure<rocm-install-on-linux:how-to/native-install/install>`.
{doc}`installation procedure<rocm-install-on-linux:how-to/native-install/index>`.
To enable ROCm virtualization on V620, one has to setup Single Root I/O
Virtualization (SR-IOV) in the BIOS via setting found in the following
({ref}`bios-settings`). A tested configuration can be followed in
({ref}`os-settings`).
```{attention}
:::{attention}
SR-IOV is supported on V620 and unsupported on W6800.
```
:::
(bios-settings)=
@@ -121,8 +122,7 @@ sudo reboot
```
Install the GPU-IOV Module (GIM, where IOV is I/O Virtualization) driver and
follow the steps below. To obtain the GIM driver, write to us
[here](mailto:CloudGPUsupport@amd.com):
follow the steps below.z
```shell
sudo dpkg -i <gim_driver>
@@ -166,6 +166,4 @@ First, assign GPU virtual function (VF) to VM using the following steps.
Then start the VM.
Finally install ROCm on the virtual machine (VM). For detailed instructions,
refer to the {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/install>`. For any
issue encountered during installation, write to us
[here](mailto:CloudGPUsupport@amd.com).
refer to the {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`.

View File

@@ -2,7 +2,7 @@
<meta charset="UTF-8">
<meta name="description" content="AMD ROCm documentation">
<meta name="keywords" content="documentation, guides, installation, compatibility, support,
reference">
reference, ROCm, AMD">
</head>
# AMD ROCm™ documentation
@@ -10,23 +10,28 @@
Welcome to the ROCm docs home page! If you're new to ROCm, you can review the following
resources to learn more about our products and what we support:
* [What is ROCm?](./what-is-rocm.md)
* [What is ROCm?](./what-is-rocm.rst)
* [Release notes](./about/release-notes.md)
You can install ROCm on our Radeon™, Radeon™ PRO, and Instinct™ GPUs. If you're using Radeon
GPUs, we recommend reading the
{doc}`Radeon-specific ROCm documentation<radeon:index>`.
For hands-on applications, refer to our [ROCm blogs](https://rocm.blogs.amd.com/) site.
Our documentation is organized into the following categories:
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-installation.jpg
:img-alt: Install documentation
:padding: 2
**Installation**
Installation guides
^^^
* Linux
* {doc}`Quick-start (Linux)<rocm-install-on-linux:tutorial/quick-start>`
* {doc}`Quick start guide<rocm-install-on-linux:tutorial/quick-start>`
* {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`
* {doc}`Package manager integration<rocm-install-on-linux:how-to/native-install/package-manager-integration>`
* Windows
@@ -35,33 +40,54 @@ Installation guides
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
:::
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-compatibility.jpg
:img-alt: Compatibility information
:padding: 2
**Compatibility & support**
ROCm compatibility information
^^^
* [Linux (GPU & OS)](./about/compatibility/linux-support.md)
* [Windows (GPU & OS)](./about/compatibility/windows-support.md)
* {doc}`Third-party<rocm-install-on-linux:reference/3rd-party-support-matrix>`
* {doc}`System requirements (Linux)<rocm-install-on-linux:reference/system-requirements>`
* {doc}`System requirements (Windows)<rocm-install-on-windows:reference/system-requirements>`
* {doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`
* {doc}`User/kernel space<rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`
* {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
* [OpenMP](./about/compatibility/openmp.md)
* [Precision support](./about/compatibility/precision-support.rst)
* {doc}`ROCm on Radeon GPUs<radeon:index>`
:::
<!-- markdownlint-disable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-reference.jpg
:img-alt: Reference documentation
:padding: 2
**How-to**
Task-oriented walkthroughs
^^^
* [API libraries](./reference/api-libraries.md)
* [Artificial intelligence](#artificial-intelligence-apis)
* [C++ primitives](#cpp-primitives)
* [Communication](#communication-libraries)
* [Math](#math-apis)
* [Random number generators](#random-number-apis)
* [HIP runtime](#hip-runtime)
* [Tools](./reference/rocm-tools.md)
* [Development](#development-tools)
* [Performance analysis](#performance-analysis)
* [System](#system-tools)
* [Hardware specifications](./reference/gpu-arch-specs.rst)
:::
<!-- markdownlint-enable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-howto.jpg
:img-alt: How-to documentation
:padding: 2
* [System tuning for various architectures](./how-to/tuning-guides.md)
* [MI100](./how-to/tuning-guides/mi100.md)
@@ -71,48 +97,29 @@ Task-oriented walkthroughs
* [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
* [System level debugging](./how-to/system-debugging.md)
* [GitHub examples](https://github.com/amd/rocm-examples)
:::
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-conceptual.jpg
:img-alt: Conceptual documentation
:padding: 2
**Reference**
Collated information
^^^
* [API Libraries](./reference/library-index.md)
:::
:::{grid-item-card}
:padding: 2
**Conceptual**
Topic overviews & background information
^^^
* [GPU architecture](./conceptual/gpu-arch.md)
* [MI100](./conceptual/gpu-arch/mi100.md)
* [MI200](./conceptual/gpu-arch/mi200-performance-counters.md)
* [MI250](./conceptual/gpu-arch/mi250.md)
* [MI300](./conceptual/gpu-arch/mi300.md)
* [GPU memory](./conceptual/gpu-memory.md)
* [Setting the number of CUs](./conceptual/setting-cus.md)
* [Compiler disambiguation](./conceptual/compiler-disambiguation.md)
* [File structure (Linux FHS)](./conceptual/file-reorg.md)
* [GPU isolation techniques](./conceptual/gpu-isolation.md)
* [LLVN ASan](./conceptual/using-gpu-sanitizer.md)
* [LLVM ASan](./conceptual/using-gpu-sanitizer.md)
* [Using CMake](./conceptual/cmake-packages.rst)
* [ROCm & PCIe atomics](./conceptual/More-about-how-ROCm-uses-PCIe-Atomics.rst)
* [Inception v3 with PyTorch](./conceptual/ai-pytorch-inception.md)
* [Inference optimization with MIGraphX](./conceptual/ai-migraphx-optimization.md)
* [OpenMP support in ROCm](./about/compatibility/openmp.md)
:::
::::
We welcome collaboration! If you'd like to contribute to our documentation, you can find instructions
on our [Contribute to ROCm docs](./contribute/index.md) page. Known issues are listed on
[GitHub](https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue).
Licensing information for all ROCm components is listed on our [Licensing](./about/license.md) page.

View File

@@ -0,0 +1,101 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm API libraries & tools">
<meta name="keywords" content="ROCm, API, libraries, tools, artificial intelligence, development,
Communications, C++ primitives, Fast Fourier transforms, FFTs, random number generators, linear
algebra, AMD">
</head>
# ROCm API libraries
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid
(artificial-intelligence-apis)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-ai.jpg
:img-alt: Artificial intelligence APIs
:padding: 2
* {doc}`Composable Kernel <composable_kernel:index>`
* {doc}`MIGraphX <amdmigraphx:index>`
* {doc}`MIOpen <miopen:index>`
* {doc}`MIVisionX <mivisionx:doxygen/html/index>`
* {doc}`rocAL <rocal:index>`
* {doc}`rocDecode <rocdecode:index>`
* {doc}`ROCm Performance Primitives (RPP) <rpp:index>`
:::
(cpp-primitives)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-cpp-primitives.jpg
:img-alt: C++ primitives
:padding: 2
* {doc}`hipCUB <hipcub:index>`
* {doc}`hipTensor <hiptensor:index>`
* {doc}`rocPRIM <rocprim:index>`
* {doc}`rocThrust <rocthrust:index>`
:::
(communication-libraries)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-communication.jpg
:img-alt: Communication APIs
:padding: 2
* {doc}`RCCL <rccl:index>`
:::
(hip-runtime)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-hip.jpg
:img-alt: HIP APIs
:padding: 2
* {doc}`HIP runtime <hip:index>`
* {doc}`HIPIFY <hipify:index>`
:::
(math-apis)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-math.jpg
:img-alt: Math APIs
:padding: 2
* [half](https://github.com/ROCm/half)
* {doc}`hipBLAS <hipblas:index>` / {doc}`rocBLAS <rocblas:index>`
* {doc}`hipBLASLt <hipblaslt:index>`
* {doc}`hipFFT <hipfft:index>` / {doc}`rocFFT <rocfft:index>`
* {doc}`hipfort <hipfort:index>`
* {doc}`hipSOLVER <hipsolver:index>` / {doc}`rocSOLVER <rocsolver:index>`
* {doc}`hipSPARSE <hipsparse:index>` / {doc}`rocSPARSE <rocsparse:index>`
* {doc}`hipSPARSELt <hipsparselt:index>`
* {doc}`rocALUTION <rocalution:index>`
* {doc}`rocWMMA <rocwmma:index>`
* [Tensile](https://github.com/ROCm/Tensile)
:::
(random-number-apis)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-random-number.jpg
:img-alt: Random number APIs
:padding: 2
* {doc}`hipRAND <hiprand:index>`
* {doc}`rocRAND <rocrand:index>`
:::
::::

View File

@@ -0,0 +1,661 @@
.. meta::
:description: AMD Instinct™ accelerator, AMD Radeon PRO™, and AMD Radeon™ GPU architecture information
:keywords: Instinct, Radeon, accelerator, CDNA, GPU, architecture, VRAM, Compute Units, Cache, Registers, LDS, Register File
Accelerator and GPU hardware specifications
######################################################
The following tables provide an overview of the hardware specifications for AMD Instinct™ accelerators, and AMD Radeon™ PRO and Radeon™ GPUs.
.. tab-set::
.. tab-item:: AMD Instinct accelerators
.. list-table::
:header-rows: 1
:name: instinct-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM (GiB)
- Compute Units
- Wavefront Size
- LDS (KiB)
- L3 Cache (MiB)
- L2 Cache (MiB)
- L1 Vector Cache (KiB)
- L1 Scalar Cache (KiB)
- L1 Instruction Cache (KiB)
- VGPR File (KiB)
- SGPR File (KiB)
*
- MI300X
- CDNA3
- gfx941 or gfx942
- 192
- 304
- 64
- 64
- 256
- 32
- 32
- 16 per 2 CUs
- 64 per 2 CUs
- 512
- 12.5
*
- MI300A
- CDNA3
- gfx940 or gfx942
- 128
- 228
- 64
- 64
- 256
- 24
- 32
- 16 per 2 CUs
- 64 per 2 CUs
- 512
- 12.5
*
- MI250X
- CDNA2
- gfx90a
- 128
- 220 (110 per GCD)
- 64
- 64
-
- 16 (8 per GCD)
- 16
- 16 per 2 CUs
- 32 per 2 CUs
- 512
- 12.5
*
- MI250
- CDNA2
- gfx90a
- 128
- 208
- 64
- 64
-
- 16 (8 per GCD)
- 16
- 16 per 2 CUs
- 32 per 2 CUs
- 512
- 12.5
*
- MI210
- CDNA2
- gfx90a
- 64
- 104
- 64
- 64
-
- 8
- 16
- 16 per 2 CUs
- 32 per 2 CUs
- 512
- 12.5
*
- MI100
- CDNA
- gfx908
- 32
- 120
- 64
- 64
-
- 8
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256 VGPR and 256 AccVGPR
- 12.5
*
- MI60
- GCN5.1
- gfx906
- 32
- 64
- 64
- 64
-
- 4
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
*
- MI50 (32GB)
- GCN5.1
- gfx906
- 32
- 60
- 64
- 64
-
- 4
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
*
- MI50 (16GB)
- GCN5.1
- gfx906
- 16
- 60
- 64
- 64
-
- 4
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
*
- MI25
- GCN5.0
- gfx900
- 16 
- 64
- 64
- 64 
-
- 4 
- 16 
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
*
- MI8
- GCN3.0
- gfx803
- 4
- 64
- 64
- 64
-
- 2
- 16
- 16 per 4 CUs
- 32 per 4 CUs
- 256
- 12.5
*
- MI6
- GCN4.0
- gfx803
- 16
- 36
- 64
- 64
-
- 2
- 16
- 16 per 4 CUs
- 32 per 4 CUs
- 256
- 12.5
.. tab-item:: AMD Radeon PRO GPUs
.. list-table::
:header-rows: 1
:name: radeon-pro-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM (GiB)
- Compute Units
- Wavefront Size
- LDS (KiB)
- Infinity Cache (MiB)
- L2 Cache (MiB)
- Graphics L1 Cache (KiB)
- L0 Vector Cache (KiB)
- L0 Scalar Cache (KiB)
- L0 Instruction Cache (KiB)
- VGPR File (KiB)
- SGPR File (KiB)
*
- Radeon PRO W7900
- RDNA3
- gfx1100
- 48
- 96
- 32
- 128
- 96
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon PRO W7800
- RDNA3
- gfx1100
- 32
- 70
- 32
- 128
- 64
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon PRO W7700
- RDNA3
- gfx1101
- 16
- 48
- 32
- 128
- 64
- 4
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon PRO W6800
- RDNA2
- gfx1030
- 32
- 60
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon PRO W6600
- RDNA2
- gfx1032
- 8
- 28
- 32
- 128
- 32
- 2
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon PRO V620
- RDNA2
- gfx1030
- 32
- 72
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon Pro W5500
- RDNA
- gfx1012
- 8
- 22
- 32
- 128
-
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon Pro VII
- GCN5.1
- gfx906
- 16
- 60
- 64
- 64
-
- 4
-
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
.. tab-item:: AMD Radeon GPUs
.. list-table::
:header-rows: 1
:name: radeon-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM (GiB)
- Compute Units
- Wavefront Size
- LDS (KiB)
- Infinity Cache (MiB)
- L2 Cache (MiB)
- Graphics L1 Cache (KiB)
- L0 Vector Cache (KiB)
- L0 Scalar Cache (KiB)
- L0 Instruction Cache (KiB)
- VGPR File (KiB)
- SGPR File (KiB)
*
- Radeon RX 7900 XTX
- RDNA3
- gfx1100
- 24
- 96
- 32
- 128
- 96
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon RX 7900 XT
- RDNA3
- gfx1100
- 20
- 84
- 32
- 128
- 80
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon RX 7900 GRE
- RDNA3
- gfx1100
- 16
- 80
- 32
- 128
- 64
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon RX 7800 XT
- RDNA3
- gfx1101
- 16
- 60
- 32
- 128
- 64
- 4
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon RX 7700 XT
- RDNA3
- gfx1101
- 12
- 54
- 32
- 128
- 48
- 4
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon RX 7600
- RDNA3
- gfx1102
- 8
- 32
- 32
- 128
- 32
- 2
- 256
- 32
- 16
- 32
- 256
- 20
*
- Radeon RX 6950 XT
- RDNA2
- gfx1030
- 16
- 80
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6900 XT
- RDNA2
- gfx1030
- 16
- 80
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6800 XT
- RDNA2
- gfx1030
- 16
- 72
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6800
- RDNA2
- gfx1030
- 16
- 60
- 32
- 128
- 128
- 4
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6750 XT
- RDNA2
- gfx1031
- 12
- 40
- 32
- 128
- 96
- 3
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6700 XT
- RDNA2
- gfx1031
- 12
- 40
- 32
- 128
- 96
- 3
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6700
- RDNA2
- gfx1031
- 10
- 36
- 32
- 128
- 80
- 3
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6650 XT
- RDNA2
- gfx1032
- 8
- 32
- 32
- 128
- 32
- 2
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6600 XT
- RDNA2
- gfx1032
- 8
- 32
- 32
- 128
- 32
- 2
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon RX 6600
- RDNA2
- gfx1032
- 8
- 28
- 32
- 128
- 32
- 2
- 128
- 16
- 16
- 32
- 256
- 20
*
- Radeon VII
- GCN5.1
- gfx906
- 16
- 60
- 64
- 64 per CU
-
- 4
-
- 16
- 16 per 3 CUs
- 32 per 3 CUs
- 256
- 12.5
For more information on the terms used here, see the :ref:`specific documents and guides <gpu-arch-documentation>`, the :doc:`conceptual overview of the HIP programming model<hip:understand/programming_model>`, or the :doc:`HIP reference guide<hip:reference/programming_model>`.

View File

@@ -1,144 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm API libraries & tools">
<meta name="keywords" content="ROCm, API, libraries, tools, artificial intelligence, development,
Communications, C++ primitives, Fast Fourier transforms, FFTs, random number generators, linear
algebra">
</head>
# ROCm API libraries & tools
::::{grid} 1 3 3 3
:class-container: rocm-doc-grid
:::{grid-item-card}
:padding: 2
**Artificial intelligence**
^^^
* {doc}`Composable Kernel <composable_kernel:index>`
* {doc}`MIOpen <miopen:index>`
* {doc}`MIGraphX <amdmigraphx:index>`
:::
:::{grid-item-card}
:padding: 2
**C++ primitives**
^^^
* {doc}`hipCUB <hipcub:index>`
* {doc}`hipTensor <hiptensor:index>`
* {doc}`rocPRIM <rocprim:index>`
* {doc}`rocThrust <rocthrust:index>`
:::
:::{grid-item-card}
:padding: 2
**Communication**
^^^
* {doc}`RCCL <rccl:index>`
:::
:::{grid-item-card}
:padding: 2
**Development tools**
^^^
* {doc}`ROCdbgapi <rocdbgapi:index>`
* [ROCmCC](./rocmcc.md)
* {doc}`ROCm debugger (ROCgdb) <rocgdb:index>`
* {doc}`ROCTracer <roctracer:index>`
:::
:::{grid-item-card}
:padding: 2
**Fast Fourier transforms (FFTs)**
^^^
* {doc}`hipFFT <hipfft:index>`
* {doc}`rocFFT <rocfft:index>`
:::
:::{grid-item-card}
:padding: 2
**HIP**
^^^
* {doc}`HIP runtime <hip:index>`
* {doc}`HIPIFY <hipify:index>`
:::
:::{grid-item-card}
:padding: 2
**Linear algebra**
^^^
* {doc}`hipBLAS <hipblas:index>`
* {doc}`hipBLASLt <hipblaslt:index>`
* {doc}`hipSOLVER <hipsolver:index>`
* {doc}`hipSPARSE <hipsparse:index>`
* {doc}`hipSPARSELt <hipsparselt:index>`
* {doc}`rocALUTION <rocalution:index>`
* {doc}`rocBLAS <rocblas:index>`
* {doc}`rocSOLVER <rocsolver:index>`
* {doc}`rocSPARSE <rocsparse:index>`
* {doc}`rocWMMA <rocwmma:index>`
:::
:::{grid-item-card}
:padding: 2
**Performance analysis**
^^^
* {doc}`ROCProfiler <rocprofiler:profiler_home_page>`
* {doc}`ROCTracer <roctracer:index>`
:::
:::{grid-item-card}
:padding: 2
**Random number generators**
^^^
* {doc}`hipRAND <hiprand:index>`
* {doc}`rocRAND <rocrand:index>`
:::
:::{grid-item-card}
:padding: 2
**System tools**
^^^
* {doc}`AMD SMI <amdsmi:index>`
* {doc}`ROCm Data Center Tool <rdc:index>`
* {doc}`ROCm SMI <rocm_smi_lib:index>`
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
* {doc}`TransferBench <transferbench:index>`
:::
::::
We welcome collaboration! If you'd like to contribute to our documentation, you can find instructions
on our [Contribute to ROCm docs](../contribute/index.md) page. Known issues are listed on
[GitHub](https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue).
Licensing information for all ROCm components is listed on our [Licensing](../about/license.md) page.

Some files were not shown because too many files have changed in this diff Show More