Compare commits

..

113 Commits

Author SHA1 Message Date
Pratik Basyal
845520ff77 LLVM- project pointed to github repo (#4505)
* LLVM- project pointed to latest docs

* Replaced docs link with github repo link
2025-03-17 15:50:42 -04:00
Pratik Basyal
8bf394e998 Revert "rocAL added to ROCm 6.1.5 (#4497)" (#4501)
This reverts commit d1d35bd2d7.
2025-03-14 10:32:44 -04:00
alexxu-amd
dee5470f15 Remove Quick Start and Radeon from 6.1.5 landing page (#4500)
* bump rocm-docs-core version to 1.18.1

* remove unnecessary entries from landing page

* revert rocm-docs-core version update to see if it undo the mess in index page

* bump rocm-docs-core to 1.18.1 again
2025-03-14 10:31:38 -04:00
Pratik Basyal
6c6bbd1460 link fixed for ROCr-runtime (#4498) 2025-03-13 18:48:30 -04:00
Pratik Basyal
d1d35bd2d7 rocAL added to ROCm 6.1.5 (#4497)
* rocAL added

* github link updated

* Github release tag removed
2025-03-13 18:48:20 -04:00
alexxu-amd
9ae2f510ff Update versions.md 2025-03-13 16:04:46 -04:00
Pratik Basyal
bf2f24581e ROCm 6.1.5 Release notes and compatibility matrix update (#318)
* 6.1.5 changes updated

* Lint error fixed

* Review feedback added

* Version tabel udpated

* OS note updated

* Initial review feedback incorporated

* Quick fixes

* Minor FIx

* Table Zebra pattern fixed

* Table CSS updated

* Minor update

* CentOS 7.9 support removed

* Quick update

* Native package installation added

* Leo's feedback added

* Link reset to 6.1.5 pre GA
2025-03-13 15:34:33 -04:00
Sam Wu
bee91034ef Update documentation requirements 2024-09-16 10:13:17 -08:00
Jeffrey Novotny
f97066f7af Merge pull request #3566 from amd-jnovotny/peak-tflops-typo-docs612
Fix typo for TFLOPs metric in MI250 architecture page: cherry pick to docs/6.1.2
2024-08-12 13:18:12 -04:00
Jeffrey Novotny
60ed13b1b0 Fix typo for TFLOPs metric in MI250 architecture page 2024-08-12 10:18:38 -04:00
Jeffrey Novotny
0af66d73e8 Merge pull request #3530 from amd-jnovotny/update-llama-link-612
Fix link to meta-llama finetuning recipes
2024-08-07 12:42:18 -04:00
Jeffrey Novotny
8c0b2dede9 Fix link to rocr debug agent (#3535) 2024-08-06 16:43:09 -06:00
Jeffrey Novotny
fd4366cdd3 Fix link to meta-llama finetuning recipes 2024-08-06 15:39:57 -04:00
spolifroni-amd
02b8dc3eb3 Cherry picking removal of email feedback into 6.1.2 (#3491)
* removed all references to the feedback email

* making the linter happy
2024-08-02 11:58:48 -06:00
Peter Park
0dcf8be892 Merge pull request #3450 from peterjunpark/docs/6.1.2
Remove unused pages in /how-to
2024-07-23 02:51:48 -04:00
Peter Jun Park
8cf3ff1936 remove unused pages 2024-07-22 18:07:32 -04:00
Peter Park
d1b9a04ee9 Merge pull request #3449 from peterjunpark/docs/6.1.2
Merge remote-tracking branch 'upstream/roc-6.1.x' into docs/6.1.2
2024-07-22 18:00:41 -04:00
Peter Jun Park
2bd30f8b91 Merge remote-tracking branch 'upstream/roc-6.1.x' into HEAD 2024-07-22 17:48:50 -04:00
Sam Wu
a1518ffa94 Merge develop into roc-6.1.x (#3440)
* Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#3396)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2024.2.2 to 2024.7.4 in /docs/sphinx (#3399)

Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.02.02...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* External CI: build hipBLASLt external dependencies (#3405)

* External CI: Increase composable_kernel pipeline time limit (#3407)

* [Changelog/release notes] Fix and add custom templates for autotag script (#3408)

* Update custom templates

* Add custom templates

* Fix custom template for hipfort

* Fix custom template for hipify

* Fix custom template for rvs

* External CI: Change composable_kernel pipeline to build for specific GPUs with tests and examples (#3412)

* increase task time limit

* test building CK for multiple architectures

* Update composable_kernel.yml

* Update composable_kernel.yml

* gfx90a build

* gfx941;gfx1100;gfx1030 build

* hipTensor gfx941 build

* hipTensor gfx941 build

* reduce CK timeout to 100 minutes

* change all gfx90a targets to gfx942

* Bump sphinx-reredirects from 0.1.4 to 0.1.5 in /docs/sphinx (#3419)

Bumps [sphinx-reredirects](https://github.com/documatt/sphinx-reredirects) from 0.1.4 to 0.1.5.
- [Commits](https://github.com/documatt/sphinx-reredirects/compare/v0.1.4...v0.1.5)

---
updated-dependencies:
- dependency-name: sphinx-reredirects
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Removed TransferBench from the tools list (#3421)

* update AI framework image (#3406)

* update AI framework image

* remove old image

* Update system optimization guides headings (#3422)

* update headings to system optimization

* update index

* conv tuning-guides.md to rst

* shorten system optimization landing page

* update conf.py

update toc order

add space

* Update docs/how-to/tuning-guides.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* update keywords

* update intro

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* External CI: move hipBLASLt build directory to ephemeral storage (#3433)

* build hipblaslt in /mnt instead

* rm checkoutref

* remove debug step

* Update using-gpu-sanitizer.md with new known issues (#3423)

* External CI: move hipBLASLt to new large disk pool

* Remove unused custom template for ck (#3438)

* External CI: ROCm nightly builds (#3435)

* ROCm nightly builds

* remove branch trigger, enable develop

* Remove unused configurations in conf.py (#3444)

* External CI: Switch all pipeline GPU_TARGETS to gfx942 (#3443)

* Switch all pipeline gpu targets to gfx942

* Change more pipelines target to gfx942

* set variables for manual testing

* Switch all pipeline gpu targets to gfx942

* Change more pipelines target to gfx942

* set variables for manual testing

* add test pipeline id

* revert test changes

* correct gpu target name

* remove unused flags; change hipSPARSELt target to be gfx942

* Add MI300X tuning guides (#3448)

* Add MI300X tuning guides

Add mi300x doc (pandoc conversion)

fix headings

add metadata

move images to shared/

move images to shared/

convert tuning-guides.md to rst using pandoc

add mi300x to tuning-guides.rst landing page

update h1s, toc, and landing page

fix spelling

fix fmt

format code blocks

add tensilelite imgs

fix formatting

fix formatting some more

fix formatting

more formatting

spelling

remove --enforce-eager note

satisfy spellcheck linter

more spelling

add fixes from hongxia

fix env var in D5

add fixes to PyTorch inductor section

fix

fix

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update 'torch_compile_debug' suggestion based on Hongxia's feedback

fix PyTorch inductor env vars

minor formatting fixes

Apply suggestions from code review

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update vllm path

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

disable numfig in Sphinx configuration

fix formatting and capitalization

add words to wordlist

update index

update wordlist

update optimizing-triton-kernel

convert cards to table

fix link in index.md

add @lpaoletti's feedback

Add system tuning guide

add images

add system section

add os settings and sys management

remove pcie=noats recommendation

reorg

add blurb to developer section

impr formatting

remove windows os from tuning guides pages in conf.py

add suggestions from review

fix typo and link

remove os windows from relevant pages in conf

mi300x

add suggestions from review

fix toc

fix index links

reorg

update vLLM vars

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

update vLLM vars

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

reorganize

add warnings

add text to system tuning

add filler text on index pages

reorg tuning pages

fix links

fix vars

* rm old pages

fix toc

* add suggestions from review

small change

add more suggestions

rewrite intro

* add 'workload tuning philosophy'

* refactor

* fix broken links

* black format conf.py

* simplify cmd and update doc structure

* add higher-level heading for consistency (mi300x.rst)

* add fixes from review

fix url

add fixes

fix formatting

fix fmt

fix hipBLASLt section

change words

fix tensilelite section

fix

fix

fix fmt

* style guide

* fix some formatting

* satisfy spellcheck linter

* update wordlist

* fix bad conflict resolution

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: danielsu-amd <danielsu@amd.com>
Co-authored-by: alexxu-amd <159800977+alexxu-amd@users.noreply.github.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: b-sumner <brian.sumner@amd.com>
2024-07-22 15:39:48 -06:00
randyh62
f45fdd5d83 Update using-gpu-sanitizer.md with new known issues (#3423) (#3437)
Co-authored-by: b-sumner <brian.sumner@amd.com>
2024-07-18 20:42:36 -07:00
spolifroni-amd
7fb9c6de51 Merge pull request #3424 from spolifroni-amd/sp-cherry-pick-612
Cherry pick into 6.1.2
2024-07-16 16:46:09 -04:00
Peter Park
c77c3fec23 Update system optimization guides headings (#3422)
* update headings to system optimization

* update index

* conv tuning-guides.md to rst

* shorten system optimization landing page

* update conf.py

update toc order

add space

* Update docs/how-to/tuning-guides.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* update keywords

* update intro

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2024-07-16 16:15:17 -04:00
spolifroni-amd
dc1a141468 Removed TransferBench from the tools list (#3421) 2024-07-16 16:15:16 -04:00
Sam Wu
747b672b04 Merge pull request #3394 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-07-03 15:54:52 -06:00
Sam Wu
83c87d2191 Merge pull request #3391 from ROCm/develop
Merge develop into roc-6.1.x
2024-07-03 15:09:25 -06:00
alexxu-amd
1733902f7c External CI: Add symlink to lib/llvm folder for ROCmValidationSuite (#3390)
* add CXX flag

* add CXX flag

* Update ROCmValidationSuite.yml

* Change googletest to libgtest-dev

* Update ROCmValidationSuite.yml

* Update ROCmValidationSuite.yml

* add ROCM_PATH as env var

* add HIP_INC_DIR

* remove manual test variables

* set variables for manual test

* remove CMAKE_CXX_COMPILER flag

* Set link to redirect llvm folder

* correct indentation

* remove manual test variables

* rename task
2024-07-03 17:02:53 -04:00
randyh62
8678e351fa update CLR docs reference (#3389)
* update CLR docs reference

* Apply suggestions from code review

Co-authored-by: Peter Park <peter.park@amd.com>

---------

Co-authored-by: amitkumar-amd <Amit.Kumar6@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
2024-07-03 16:59:26 -04:00
alexxu-amd
14dbb44056 Update ROCmValidationSuite pipeline according to the latest change (#3387)
* add CXX flag

* Change googletest to libgtest-dev

* add ROCM_PATH as env var

* add HIP_INC_DIR
2024-07-03 10:32:11 -04:00
danielsu-amd
813972b62b External CI: Add hipBLAS to MIOpen (#3386) 2024-07-02 15:40:44 -04:00
danielsu-amd
5e64d851cb External CI: Add all current component pipeline IDs (#3385) 2024-07-02 13:51:10 -04:00
BrianHarrisonAMD
05cd12e93d Merge pull request #3384 from ROCm/amd/bharriso/miopenAddhipBLASLt
External CI: updated MIOpen dependencies
2024-07-02 11:34:08 -06:00
dependabot[bot]
8ad4435420 Bump rocm-docs-core from 1.4.0 to 1.4.1 in /docs/sphinx (#3375)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-02 10:44:57 -06:00
Sam Wu
31ffa6428f Merge pull request #3374 from ROCm/roc-6.1.x
Merg roc-6.1.x into docs/6.1.2
2024-07-02 10:44:43 -06:00
James Banks
d275a543cb Update single-gpu-fine-tuning-and-inference.rst with correct --showproductname flag (#3378)
Prior flag of `-showproductname` was not valid
2024-07-02 12:04:29 -04:00
Brian Harrison
2ee59acf20 External CI: updated MIOpen dependencies 2024-07-02 16:02:49 +00:00
randyh62
086104bb9f remove Magma (#3361) (#3381)
* remove Magma

* missed one
2024-07-02 07:08:33 -07:00
Sam Wu
e27e8f4a28 Merge pull request #3370 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-28 15:36:56 -06:00
alexxu-amd
325a2fd54c External CI: Fix a typo from composable_kernel pipeline (#3373)
* add libdrm-dev lib to CK dependency list

* change INSTANCE_ONLY to INSTANCES_ONLY
2024-06-28 15:39:08 -04:00
Peter Park
e19b8ee2eb Merge pull request #3369 from peterjunpark/docs/6.1.2
Add fixes to vLLM install and triton kernel optimization (#3366)
2024-06-27 11:45:47 -07:00
Peter Park
ca33838d0c Add fixes to vLLM install and triton kernel optimization (#3366)
* Add fixes to vLLM install and triton kernel optimization

* Update TGI how-to

remove extra step in TGI
2024-06-27 14:32:45 -04:00
Peter Park
a552f9f6b8 Add fixes to vLLM install and triton kernel optimization (#3366)
* Add fixes to vLLM install and triton kernel optimization

* Update TGI how-to

remove extra step in TGI
2024-06-27 14:28:20 -04:00
Joseph Macaranas
accb1347ea External CI: Add initial support for rocAL (#3365) 2024-06-27 13:58:10 -04:00
alexxu-amd
699b604f00 Add INSTANCE_ONLY cmake flag; change pool to ultra; increase time limit to 3.5hr (#3275) 2024-06-27 10:01:43 -04:00
Sam Wu
ce08245f4c Merge pull request #3362 from peterjunpark/fix/index-styling
Fix card text color in index
2024-06-26 15:43:50 -06:00
Peter Jun Park
5c9d071e85 remove card text styling 2024-06-26 14:12:25 -04:00
randyh62
356ad4ab47 remove Magma (#3361)
* remove Magma

* missed one
2024-06-26 10:00:39 -07:00
Sam Wu
57d59bfcc6 Merge pull request #3358 from samjwu/articleinfo
Remove article info for moved or deleted pages
2024-06-26 09:44:49 -06:00
Sam Wu
791285772d Remove article info for moved or deleted pages 2024-06-25 16:45:42 -06:00
abhimeda
217830fe25 added matrices artifact uploading code from rocSPARSE (#3356) 2024-06-25 15:04:52 -04:00
randyh62
c66ddc55b9 added ROCm Core and AMD SMI (#3348) (#3349)
* added ROCm Core and AMD SMI

* fix URLs
2024-06-21 17:11:16 -07:00
randyh62
f07608bc92 added ROCm Core and AMD SMI (#3348)
* added ROCm Core and AMD SMI

* fix URLs
2024-06-21 16:36:39 -07:00
Peter Park
1281e5b145 Merge pull request #3347 from peterjunpark/docs/6.1.2
reorder toc (#3346)
2024-06-21 16:10:23 -07:00
Peter Park
c706f689a0 reorder toc (#3346) 2024-06-21 18:54:44 -04:00
Peter Park
1435634f5c reorder toc (#3346) 2024-06-21 18:53:55 -04:00
Sam Wu
ee384ba0e0 Merge pull request #3345 from ROCm/dependabot/pip/docs/sphinx/sphinx-reredirects-0.1.4
Bump sphinx-reredirects from 0.1.3 to 0.1.4 in /docs/sphinx
2024-06-21 16:46:24 -06:00
Peter Park
feaacde707 Merge pull request #3344 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-21 15:38:22 -07:00
dependabot[bot]
bb0090882c Bump sphinx-reredirects from 0.1.3 to 0.1.4 in /docs/sphinx
Bumps [sphinx-reredirects](https://github.com/documatt/sphinx-reredirects) from 0.1.3 to 0.1.4.
- [Commits](https://github.com/documatt/sphinx-reredirects/compare/v0.1.3...v0.1.4)

---
updated-dependencies:
- dependency-name: sphinx-reredirects
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-21 22:37:37 +00:00
Sam Wu
965304800d Merge pull request #3343 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-21 14:47:57 -06:00
Peter Park
22e9f6f373 Add "Using ROCm for HPC" guide (#3302)
* Add ROCm for HPC

* Update index and toc

* Add TMs in other tutorials

* Add hpc apps table

Spellcheck

add stack image and fix links

Add descriptions

update copy

Update copy

add ref

Finish adding app descriptions

tweak descs

fix line lengths

* Revert "Add TMs in other tutorials"

This reverts commit 08a1a80e57.

* Add links to install and compat matrix

* Update HPC stack graphic and add some links

Add hpc and td to wordlist

fix links

* Apply suggestions from Leo's review

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update docs/how-to/rocm-for-hpc/index.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update docs/how-to/rocm-for-hpc/index.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update docs/how-to/rocm-for-hpc/index.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update docs/how-to/rocm-for-hpc/index.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update docs/how-to/rocm-for-hpc/index.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

fix formatting

Update words

* update wordlist

* Update hpc app descriptions with content from InfinityHub catalog
2024-06-21 16:15:18 -04:00
randyh62
35f6429d1a license information updated (#3339) (#3340)
* license information updated

* Young's comments

* Sam's comment
2024-06-21 09:45:00 -07:00
randyh62
d994302df7 license information updated (#3339)
* license information updated

* Young's comments

* Sam's comment
2024-06-21 09:22:57 -07:00
Peter Park
01bcf5e82b Merge pull request #3336 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-19 12:52:35 -07:00
Sam Wu
9c9d678edf Merge pull request #3333 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-19 13:22:30 -06:00
Peter Park
9d4eb5eff2 Add RHEL 9.4 to compat matrix (#3332)
* Add RHEL 9.4 to compat matrix

* add rhel 9.4 footnote in compat matrix
2024-06-19 15:03:29 -04:00
danielsu-amd
8b95ab0a02 External CI: remove redundant rocm-examples build flags (#3331) 2024-06-19 13:08:31 -04:00
danielsu-amd
e74245fbe4 External CI: Latest source pipeline for rocm-examples (#3317) 2024-06-19 09:59:02 -04:00
Peter Park
8c0ecf7dfd Merge pull request #3330 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-18 19:22:15 -07:00
randyh62
500c455094 remove nvcc (#3313) (#3320)
* remove nvcc

* Update CHANGELOG to match 6.0.0 template

---------

Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-06-18 17:25:51 -07:00
Sam Wu
2e3445a3d4 Merge pull request #3326 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-18 16:34:12 -06:00
Peter Park
778c8e2c05 Add Oracle Linux 8.9 to 6.1.1 changelog (#3327) 2024-06-18 18:29:09 -04:00
Peter Park
361983fa48 Add OL support note to compat matrix (#3325)
Fix footnote

Footnote order

Satisfy spellcheck
2024-06-18 17:32:07 -04:00
Sam Wu
3dff636d40 Merge pull request #3314 from ROCm/dependabot/pip/docs/sphinx/urllib3-2.2.2
Bump urllib3 from 2.2.1 to 2.2.2 in /docs/sphinx
2024-06-18 14:52:26 -06:00
Peter Park
e34d49bea5 Merge pull request #3319 from peterjunpark/docs/6.1.2
Add Radeon PRO dual slot to hw specs (#3318)
2024-06-18 12:34:34 -07:00
Peter Park
40674aac9c Add Radeon PRO dual slot to hw specs (#3318) 2024-06-18 15:28:30 -04:00
Peter Park
1d976a1871 Add Radeon PRO dual slot to hw specs (#3318) 2024-06-18 15:22:43 -04:00
randyh62
ebfec1b7c1 remove nvcc (#3313)
* remove nvcc

* Update CHANGELOG to match 6.0.0 template

---------

Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-06-18 12:11:40 -07:00
Sam Wu
4edabd5d69 Merge pull request #3312 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-18 12:03:22 -06:00
Sam Wu
7838d13848 Resolve conflict with develop 2024-06-18 12:02:43 -06:00
dependabot[bot]
66b71ba3c8 Bump urllib3 from 2.2.1 to 2.2.2 in /docs/sphinx
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.1 to 2.2.2.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-17 23:41:54 +00:00
Young Hui - AMD
9e070dca53 Moving rocm-build to the tools folder (#3285) (#3311)
[Why]
To maintain the "pitchfork layout" convention used by the repository.

[How]
- Update README.md
- Update INFRA_REPO in ROCm.mk
   - Updated to new path: ROCm/tools/rocm-build

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
2024-06-17 18:57:50 -04:00
Joseph Macaranas
e903ffa952 External CI: Update aqlprofile binary used for rocprofiler (#3304) 2024-06-17 14:23:36 -04:00
Peter Park
3c6b9df117 Merge pull request #3310 from peterjunpark/docs/6.1.2
Update link to ROCr Debug Agent to docs portal (#3303)
2024-06-17 10:34:30 -07:00
Peter Park
e6b9ed6dca Update link to ROCr Debug Agent to docs portal (#3303)
* Fix link to debug agent in what-is-rocm

* ROCm --> ROCR

add index

* ROCR --> ROCr

* Change ROCm Debug Agent to ROCr Debug Agent in docs
2024-06-17 11:53:47 -04:00
Peter Park
fe1c2e9529 Update link to ROCr Debug Agent to docs portal (#3303)
* Fix link to debug agent in what-is-rocm

* ROCm --> ROCR

add index

* ROCR --> ROCr

* Change ROCm Debug Agent to ROCr Debug Agent in docs
2024-06-14 17:52:49 -04:00
srawat
f3d6e6b561 Merge pull request #3294 from SwRaw/SR_6.1.2
Update link to command-line argument reference (#3270)
2024-06-13 22:28:04 +05:30
Sam Wu
8e701689d2 Merge pull request #3267 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-13 10:05:13 -06:00
Joseph Macaranas
923141f300 External CI: Fixes for two repos to work with latest source (#3293)
With MIOpen now building with latest source on External CI, this unblocked AMDMIGraphX from building with latest source.

Determined rocMLIR also needed to be built with latest source as a dependency.
2024-06-13 11:55:40 -04:00
Jeffrey Novotny
cb3dee5d07 Merge pull request #3296 from amd-jnovotny/port-aomp-fix
Port aomp fix
2024-06-13 11:37:35 -04:00
Jeffrey Novotny
c61662dadc Remove AOMP from compatibility matrix (#3289) 2024-06-13 11:30:42 -04:00
srawat
bbe495867e Update link to command-line argument reference (#3270)
* Added deleted sections to openmp.md and other improvements

* Update openmp.md
2024-06-13 15:31:30 +05:30
David Galiffi
c91e15a580 Moving rocm-build to the tools folder (#3285)
[Why]
To maintain the "pitchfork layout" convention used by the repository.

[How]
- Update README.md
- Update INFRA_REPO in ROCm.mk
   - Updated to new path: ROCm/tools/rocm-build

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2024-06-12 17:12:06 -04:00
Peter Park
d24b3fab61 Fix ExLlama-v2 code snippet (#3281) 2024-06-12 17:03:04 -04:00
Jeffrey Novotny
e864aa50ac Remove AOMP from compatibility matrix (#3289) 2024-06-12 14:17:32 -04:00
randyh62
c08af3190f update quarantine (#3284) 2024-06-12 09:34:49 -07:00
srawat
2531f0aa03 Update link to command-line argument reference (#3270)
* Added deleted sections to openmp.md and other improvements

* Update openmp.md
2024-06-12 11:53:22 -04:00
Istvan Kiss
b69a9c7b97 Update docs/conceptual/setting-cus.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2024-06-12 17:42:18 +02:00
Joseph Macaranas
13e14363cc External CI: updated MIOpen dependencies (#3278) 2024-06-12 11:23:21 -04:00
Joseph Macaranas
664c047311 External CI: Package rocSPARSE matrices for testers to consume (#3276) 2024-06-12 11:22:46 -04:00
Istvan Kiss
78fdcdf48d Update docs/conceptual/setting-cus.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2024-06-12 16:17:42 +02:00
Peter Park
bcae17a4b5 Merge pull request #3283 from peterjunpark/docs/6.1.2
Remove aomp from What is ROCm? page (#3282)
2024-06-11 11:05:41 -07:00
Peter Park
9140ae5bee Remove aomp from What is ROCm? page (#3282) 2024-06-11 11:48:08 -04:00
Sam Wu
2996620a14 Merge branch 'docs/6.1.2' into roc-6.1.x 2024-06-10 10:56:59 -06:00
Wang, Yanyao
ed6ed158ca Update dependencies of hipfort 2024-06-07 17:31:45 -07:00
Sam Wu
8a6530c7ca Merge pull request #3259 from ROCm/develop
Merge develop into roc-6.1.x
2024-06-07 13:42:06 -06:00
Sam Wu
f1a77be03f Update doc reqs 2024-06-06 14:47:27 -06:00
Peter Park
96cdfdb9a4 Merge pull request #3255 from peterjunpark/docs/6.1.2
docs/6.1.2: Update links in compat matrix and what-is-rocm (#3253)
2024-06-06 10:52:23 -07:00
Peter Park
ba5412f3ba Update links in compat matrix and what-is-rocm (#3253)
* Update links in compat matrix and what-is-rocm

* Tensorflow -> TensorFlow

* Remove extra lines

* Revert "Remove extra lines"

This reverts commit 607c4323ac.

ROCm Debug Agent
2024-06-06 13:29:00 -04:00
Young Hui - AMD
1b6d334146 fix links for MIVisionX (#3240) (#3248) 2024-06-05 16:05:03 -04:00
Peter Park
6a84dc5854 Merge pull request #3251 from peterjunpark/docs/6.1.2
docs/6.1.2: Fix first link in compatibility matrix table (#3239)
2024-06-05 13:00:06 -07:00
Sam Wu
3d137314b7 Fix first link in compatibility matrix table (#3239)
* Fix first link in compatibility matrix table

* Revert "Fix first link in compatibility matrix table"

This reverts commit 069c5c116a.

* Remove sticky header and unused css

* Remove container from hardware specs matrix

---------

Co-authored-by: Peter Jun Park <peter.park@amd.com>
2024-06-05 15:52:32 -04:00
Peter Park
49628b60e0 Merge pull request #3243 from peterjunpark/docs/6.1.2
docs/6.1.2: Rename fine-tuning and optimization guide directory and fix index.md …
2024-06-05 08:44:26 -07:00
Peter Park
7c7ede0ac5 Rename fine-tuning and optimization guide directory and fix index.md (#3242)
* Mv fine-tuning and optimization files

* Reorder index.md

* Rename images directory

* Fix internal links
2024-06-05 11:12:35 -04:00
Sam Wu
246f9b05d0 Merge pull request #3233 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-04 15:44:09 -06:00
249 changed files with 4833 additions and 10876 deletions

View File

@@ -84,10 +84,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/clang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/clang
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DGPU_TARGETS=gfx942
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm/llvm;$(Agent.BuildDirectory)/rocm
-DHALF_INCLUDE_DIR=$(Agent.BuildDirectory)/rocm/include
-DMIGRAPHX_USE_COMPOSABLEKERNEL=OFF

View File

@@ -16,12 +16,15 @@ parameters:
- libbz2-dev
- nlohmann-json3-dev
- libgtest-dev
- libdrm-dev
- name: rocmDependencies
type: object
default:
- rocMLIR
- rocRAND
- rocBLAS
- hipBLAS
- hipBLASLt
- half
- composable_kernel
- rocm-cmake
@@ -30,13 +33,14 @@ parameters:
- rocprofiler-register
- clr
- rocminfo
- roctracer
jobs:
- job: MIOpen
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.MEDIUM_BUILD_POOL }}
pool: ${{ variables.LARGE_DISK_BUILD_POOL }}
workspace:
clean: all
steps:

View File

@@ -13,7 +13,7 @@ parameters:
- libyaml-cpp-dev
- libpci-dev
- libpci3
- googletest
- libgtest-dev
- git
- name: rocmDependencies
type: object
@@ -35,6 +35,10 @@ jobs:
- template: /.azuredevops/variables-global.yml
- name: HIP_ROCCLR_HOME
value: $(Build.BinariesDirectory)/rocm
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
- name: HIP_INC_DIR
value: $(Agent.BuildDirectory)/rocm
pool:
vmImage: ${{ variables.BASE_BUILD_POOL }}
workspace:
@@ -59,10 +63,17 @@ jobs:
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: tag-builds
# Set link to redirect llvm folder
- task: Bash@3
displayName: create symlink
inputs:
targetType: inline
script: ln -s $(Agent.BuildDirectory)/rocm/llvm $(Agent.BuildDirectory)/rocm/lib/llvm
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/clang++
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCPACK_PACKAGING_INSTALL_PREFIX=$(Build.BinariesDirectory)
-GNinja

View File

@@ -12,6 +12,7 @@ parameters:
- ninja-build
- git
- python3-pip
- libdrm-dev
- name: rocmDependencies
type: object
default:
@@ -24,10 +25,11 @@ parameters:
jobs:
- job: composable_kernel
timeoutInMinutes: 100
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.MEDIUM_BUILD_POOL }}
pool: ${{ variables.ULTRA_BUILD_POOL }}
workspace:
clean: all
steps:
@@ -57,6 +59,6 @@ jobs:
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DGPU_TARGETS=gfx1030;gfx1100
-DGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -77,7 +77,6 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DHIP_PLATFORM=amd
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_CLIENTS_BENCHMARKS=OFF

View File

@@ -8,12 +8,13 @@ parameters:
- name: aptPackages
type: object
default:
- ninja-build
- python3-venv
- libmsgpack-dev
- gfortran
- git
- python3-pip
- libdrm-dev
- libmsgpack-dev
- ninja-build
- python3-pip
- python3-venv
- name: pipModules
type: object
default:
@@ -21,15 +22,16 @@ parameters:
- name: rocmDependencies
type: object
default:
- llvm-project
- ROCR-Runtime
- clr
- hipBLAS
- llvm-project
- rocminfo
- rocprofiler-register
- hipBLAS
- ROCR-Runtime
jobs:
- job: hipBLASLt
timeoutInMinutes: 100
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -58,7 +60,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
# CI case: download latest default branch build
# CI case: download latest default branch build
- ${{ if eq(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
@@ -72,17 +74,42 @@ jobs:
dependencySource: tag-builds
- script: sudo ln -s $(Agent.BuildDirectory)/rocm /opt/rocm
displayName: ROCm symbolic link
# Build and install gtest, lapack, hipBLAS-common
# $(Pipeline.Workspace)/deps is a temporary folder for the build process
# $(Pipeline.Workspace)/s/deps is part of the hipBLASLt repo
- script: mkdir $(Pipeline.Workspace)/deps
# hipBLASLt already has a CMake script for external deps, so we can just run that
# https://github.com/ROCm/hipBLASLt/blob/develop/deps/CMakeLists.txt
- script: cmake $(Pipeline.Workspace)/s/deps
displayName: Configure hipBLASLt external dependencies
workingDirectory: $(Pipeline.Workspace)/deps
- script: make
displayName: Build hipBLASLt external dependencies
workingDirectory: $(Pipeline.Workspace)/deps
- script: sudo make install
displayName: Install hipBLASLt external dependencies
workingDirectory: $(Pipeline.Workspace)/deps
# Set link to redirect llvm folder
- task: Bash@3
displayName: Symlink to rocm/lib/llvm
inputs:
targetType: inline
script: ln -s $(Agent.BuildDirectory)/rocm/llvm $(Agent.BuildDirectory)/rocm/lib/llvm
- script: sudo chmod 777 /mnt
displayName: 'Set permissions for /mnt'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
cmakeBuildDir: /mnt/build
cmakeSourceDir: $(Pipeline.Workspace)/s
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DAMDGPU_TARGETS=gfx90a
-DAMDGPU_TARGETS=gfx942
-DTensile_LOGIC=
-DTensile_CPU_THREADS=
-DTensile_CODE_OBJECT_VERSION=default
-DTensile_LIBRARY_FORMAT=msgpack
-DCMAKE_PREFIX_PATH="$(Agent.BuildDirectory)/rocm"
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -57,6 +57,6 @@ jobs:
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_TEST=ON
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -66,7 +66,7 @@ jobs:
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DUSE_HIP_CLANG=ON
-DHIP_COMPILER=clang
-DBUILD_CLIENTS_TESTS=ON

View File

@@ -61,6 +61,6 @@ jobs:
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -74,7 +74,6 @@ jobs:
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DBUILD_CLIENTS_TESTS=ON
-DUSE_CUDA=OFF
-GNinja

View File

@@ -65,3 +65,13 @@ jobs:
-DBUILD_CLIENTS_SAMPLES=OFF
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: hipSPARSE
publish: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-prepare-package.yml
parameters:
sourceDir: $(Build.SourcesDirectory)/build/clients
contentsString: matrices/**
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: testMatrices

View File

@@ -75,7 +75,7 @@ jobs:
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DAMDGPU_TARGETS=all
-DAMDGPU_TARGETS=gfx942
-DTensile_LOGIC=
-DTensile_CPU_THREADS=
-DTensile_CODE_OBJECT_VERSION=default

View File

@@ -58,6 +58,6 @@ jobs:
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DHIPTENSOR_BUILD_TESTS=ON
-DAMDGPU_TARGETS=gfx90a
-DAMDGPU_TARGETS=gfx942
multithreadFlag: -- -j32
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -72,6 +72,6 @@ jobs:
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_TESTS=ON
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/rocm/share/rocm/cmake/
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -0,0 +1,138 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: aptPackages
type: object
default:
- python3-pip
- python3-protobuf
- cmake
- ninja-build
- libprotobuf-dev
- libprotoc-dev
- protobuf-compiler
- liblmdb-dev
- pkg-config
- ffmpeg
- libavcodec-dev
- libavformat-dev
- libavutil-dev
- libswscale-dev
- libturbojpeg-dev
- libjpeg-turbo-official=3.0.2-20240124
- libopencv-dev
- name: pipModules
type: object
default:
- numpy
- opencv-python
- torch
- pillow
- name: rocmDependencies
type: object
default:
- rocm-cmake
- llvm-project
- ROCR-Runtime
- clr
- rocDecode
- half
- rpp
- MIVisionX
- aomp
jobs:
- job: rocAL
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: ${{ variables.BASE_BUILD_POOL }}
workspace:
clean: all
steps:
- task: Bash@3
displayName: 'Register libjpeg-turbo packages'
inputs:
targetType: inline
script: |
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget -q -O- https://packagecloud.io/dcommander/libjpeg-turbo/gpgkey | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/libjpeg-turbo.gpg > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/libjpeg-turbo.gpg] https://packagecloud.io/dcommander/libjpeg-turbo/any/ any main" | sudo tee /etc/apt/sources.list.d/libjpeg-turbo.list
sudo apt update
apt-cache show libjpeg-turbo-official | grep Version
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- task: Bash@3
displayName: 'Clone PyBind11'
inputs:
targetType: inline
script: git clone --depth 1 -b v2.11.1 https://github.com/pybind/pybind11
workingDirectory: '$(Build.SourcesDirectory)'
- task: Bash@3
displayName: 'Clone RapidJSON'
inputs:
targetType: inline
script: git clone --depth 1 https://github.com/Tencent/rapidjson.git
workingDirectory: '$(Build.SourcesDirectory)'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: PyBind11
cmakeBuildDir: '$(Build.SourcesDirectory)/pybind11/build'
customInstallPath: false
installEnabled: false
extraBuildFlags: >-
-DDOWNLOAD_CATCH=ON
-DDOWNLOAD_EIGEN=ON
-GNinja
- task: Bash@3
displayName: 'Install PyBind11'
inputs:
targetType: inline
script: sudo cmake --build . --target install
workingDirectory: '$(Build.SourcesDirectory)/pybind11/build'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: RapidJSON
cmakeBuildDir: '$(Build.SourcesDirectory)/rapidjson/build'
customInstallPath: false
installEnabled: false
extraBuildFlags: >-
-GNinja
- task: Bash@3
displayName: 'Install RapidJSON'
inputs:
targetType: inline
script: sudo cmake --build . --target install
workingDirectory: '$(Build.SourcesDirectory)/rapidjson/build'
# CI case: download latest default branch build
- ${{ if eq(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
# manual build case: triggered by ROCm/ROCm repo
- ${{ if ne(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: tag-builds
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;/opt/libjpeg-turbo
-DCMAKE_INSTALL_PREFIX_PYTHON=$Python3_STDARCH
-DCMAKE_BUILD_TYPE=Release
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -67,7 +67,7 @@ jobs:
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/rocm/share/rocm/cmake/
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_CLIENTS_BENCHMARKS=OFF
-DBUILD_CLIENTS_SAMPLES=OFF

View File

@@ -108,7 +108,7 @@ jobs:
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/bin/hipcc
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/bin/hipcc
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DTensile_CODE_OBJECT_VERSION=default
-DTensile_LOGIC=asm_full
-DTensile_SEPARATE_ARCHITECTURES=ON

View File

@@ -64,7 +64,7 @@ jobs:
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DUSE_HIP_CLANG=ON
-DHIP_COMPILER=clang
-DBUILD_CLIENTS_TESTS=ON

View File

@@ -10,6 +10,13 @@ parameters:
default:
- cmake
- ninja-build
- git
- python3-pip
- name: rocmDependencies
type: object
default:
- llvm-project
- rocm-cmake
jobs:
- job: rocMLIR
@@ -17,8 +24,6 @@ jobs:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.MEDIUM_BUILD_POOL }}
container:
image: ${{ variables.DOCKER_IMAGE_NAME }}:${{ variables.LATEST_DOCKER_VERSION }}
workspace:
clean: all
steps:
@@ -29,13 +34,25 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
# CI case: download latest default branch build
- ${{ if eq(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
# manual build case: triggered by ROCm/ROCm repo
- ${{ if ne(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: tag-builds
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/amdclang
-DCMAKE_PREFIX_PATH=/opt/rocm
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/clang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/clang
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_FAT_LIBROCKCOMPILER=1
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -59,7 +59,7 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_BENCHMARK=ON
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DBUILD_TEST=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -59,6 +59,6 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_TEST=ON
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -82,7 +82,7 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Pipeline.Workspace)/deps-install
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_CLIENTS_BENCHMARKS=OFF
-DBUILD_CLIENTS_SAMPLES=OFF

View File

@@ -68,10 +68,20 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DBUILD_CLIENTS_SAMPLES=OFF
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_CLIENTS_BENCHMARKS=OFF
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip;$(Agent.BuildDirectory)/rocm/hip/cmake
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: rocSPARSE
publish: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-prepare-package.yml
parameters:
sourceDir: $(Build.SourcesDirectory)/build/clients
contentsString: matrices/**
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: testMatrices

View File

@@ -60,7 +60,7 @@ jobs:
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-DBUILD_TEST=ON
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -65,7 +65,7 @@ jobs:
-DCMAKE_BUILD_TYPE=Release
-DROCWMMA_BUILD_TESTS=ON
-DROCWMMA_BUILD_SAMPLES=OFF
-DGPU_TARGETS=gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
# gfx1030 not supported in documentation
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -5,6 +5,30 @@ parameters:
- name: checkoutRef
type: string
default: ''
- name: aptPackages
type: object
default:
- libglfw3-dev
- name: rocmDependencies
type: object
default:
- AMDMIGraphX
- clr
- hipBLAS
- hipCUB
- HIPIFY
- hipRAND
- hipSOLVER
- hipSPARSE
- llvm-project
- rocBLAS
- rocPRIM
- rocprofiler-register
- ROCR-Runtime
- rocRAND
- rocSOLVER
- rocSPARSE
- rocThrust
jobs:
- job: rocm_examples
@@ -20,5 +44,28 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
# CI case: download latest default branch build
- ${{ if eq(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
# manual build case: triggered by ROCm/ROCm repo
- ${{ if ne(parameters.checkoutRef, '') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: tag-builds
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
# https://github.com/ROCm/HIP/issues/2203
extraBuildFlags: >-
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_ROOT=$(Agent.BuildDirectory)/rocm
-DCMAKE_HIP_ARCHITECTURES=gfx942
-DCMAKE_EXE_LINKER_FLAGS=-fgpu-rdc
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -47,7 +47,7 @@ jobs:
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: HIP_ROCCLR_HOME
- name: HIP_ROCCLR_HOME
value: $(Agent.BuildDirectory)/rocm
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
@@ -68,7 +68,7 @@ jobs:
displayName: 'Download aqlprofile'
inputs:
targetType: inline
script: wget -nv https://repo.radeon.com/rocm/apt/6.1/pool/main/h/hsa-amd-aqlprofile/hsa-amd-aqlprofile_1.0.0.60100.60100-82~22.04_amd64.deb
script: wget -nv https://repo.radeon.com/rocm/misc/aqlprofile/ubuntu-22.04/hsa-amd-aqlprofile_1.0.0.60200.60200-crdnnh.14213~22.04_amd64.deb
workingDirectory: '$(Pipeline.Workspace)'
- task: Bash@3
displayName: 'Extract aqlprofile'
@@ -76,7 +76,7 @@ jobs:
targetType: inline
script: |
mkdir hsa-amd-aqlprofile
dpkg-deb -R hsa-amd-aqlprofile_1.0.0.60100.60100-82~22.04_amd64.deb hsa-amd-aqlprofile
dpkg-deb -R hsa-amd-aqlprofile_1.0.0.60200.60200-crdnnh.14213~22.04_amd64.deb hsa-amd-aqlprofile
workingDirectory: '$(Pipeline.Workspace)'
- task: Bash@3
displayName: 'Move aqlprofile'
@@ -84,7 +84,7 @@ jobs:
targetType: inline
script: |
mkdir -p $(Agent.BuildDirectory)/rocm
cp -R hsa-amd-aqlprofile/opt/rocm-6.1.0/* $(Agent.BuildDirectory)/rocm
cp -R hsa-amd-aqlprofile/opt/rocm-6.2.0-14213/* $(Agent.BuildDirectory)/rocm
workingDirectory: '$(Pipeline.Workspace)'
# CI case: download latest default branch build
- ${{ if eq(parameters.checkoutRef, '') }}:
@@ -105,5 +105,5 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DENABLE_LDCONFIG=OFF
-DUSE_PROF_API=1
-DGPU_TARGETS=gfx1030;gfx1100
-DGPU_TARGETS=gfx942
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -65,6 +65,6 @@ jobs:
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DGPU_TARGETS=gfx1030;gfx1100
-DGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -60,6 +60,6 @@ jobs:
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DHALF_INCLUDE_DIRS=$(Agent.BuildDirectory)/rocm/include
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=gfx1030;gfx1100
-DAMDGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml

View File

@@ -0,0 +1,115 @@
parameters:
# currently excludes clr and rocm-examples
- name: rocmDependencies
type: object
default:
- AMDMIGraphX
- amdsmi
- aomp-extras
- aomp
- composable_kernel
- half
- HIP
- hipBLAS
- hipBLASLt
- hipCUB
- hipFFT
- hipfort
- HIPIFY
- hipRAND
- hipSOLVER
- hipSPARSE
- hipSPARSELt
- hipTensor
- llvm-project
- MIOpen
- MIVisionX
- rccl
- rdc
- rocAL
- rocALUTION
- rocBLAS
- ROCdbgapi
- rocDecode
- rocFFT
- ROCgdb
- rocm-cmake
- rocm-core
- rocminfo
- rocMLIR
- ROCmValidationSuite
- rocm_bandwidth_test
- rocm_smi_lib
- rocPRIM
- rocprofiler-register
- rocprofiler
- ROCR-Runtime
- rocRAND
- rocr_debug_agent
- rocSOLVER
- rocSPARSE
- ROCT-Thunk-Interface
- rocThrust
- roctracer
- rocWMMA
- rpp
trigger: none
pr: none
schedules:
- cron: '30 7 * * *'
displayName: Nightly build
branches:
include:
- develop
always: true
jobs:
- job: rocm_nightly
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.MEDIUM_BUILD_POOL }}
workspace:
clean: all
steps:
- task: DeleteFiles@1
displayName: 'Cleanup checkout space'
inputs:
SourceFolder: '$(Agent.BuildDirectory)/s'
Contents: '**/*'
- task: DeleteFiles@1
displayName: 'Cleanup Staging Area'
inputs:
SourceFolder: '$(Build.ArtifactStagingDirectory)'
Contents: '/**/*'
RemoveDotFiles: true
- script: sudo chmod 777 /mnt
displayName: 'Set permissions for /mnt'
- script: df -h
displayName: System disk space before ROCm
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
extractToMnt: true
skipLibraryLinking: true
- script: df -h
displayName: System disk space after ROCm
- script: du -sh /mnt/rocm
displayName: Uncompressed ROCm size
- task: ArchiveFiles@2
displayName: Compress rocm-nightly
inputs:
rootFolderOrFile: /mnt/rocm
includeRootFolder: false
archiveType: tar
tarCompression: gz
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.DefinitionName)_$(Build.BuildNumber)_ubuntu2204.tar.gz
- script: du -sh $(Build.ArtifactStagingDirectory)
displayName: Compressed ROCm size
- task: PublishPipelineArtifact@1
displayName: 'Public ROCm Nightly Artifact'
retryCountOnTaskFailure: 3
inputs:
targetPath: '$(Build.ArtifactStagingDirectory)'

View File

@@ -0,0 +1,29 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: checkoutRef
type: string
default: refs/tags/$(LATEST_RELEASE_TAG)
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
- repository: release_repo
type: github
endpoint: ROCm
name: ROCm/rocAL
ref: ${{ parameters.checkoutRef }}
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_COMPONENT_PATH }}/rocAL.yml
parameters:
checkoutRepo: release_repo
checkoutRef: ${{ parameters.checkoutRef }}

View File

@@ -0,0 +1,29 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: checkoutRef
type: string
default: refs/tags/$(LATEST_RELEASE_TAG)
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
- repository: release_repo
type: github
endpoint: ROCm
name: ROCm/rocm-examples
ref: ${{ parameters.checkoutRef }}
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_COMPONENT_PATH }}/rocm-examples.yml
parameters:
checkoutRepo: release_repo
checkoutRef: ${{ parameters.checkoutRef }}

View File

@@ -9,38 +9,63 @@ parameters:
- name: useDefaultBranch
type: boolean
default: true
- name: extractToMnt
type: boolean
default: false
- name: defaultBranchList
type: object
default:
amdsmi: develop
aomp: aomp-dev
aomp-extras: aomp-dev
AMDMIGraphX: develop
amdsmi: develop
aomp-extras: aomp-dev
aomp: aomp-dev
clr: develop
composable_kernel: develop
half: master
HIP: develop
hipBLAS: develop
hipBLASLt: develop
hipCUB: develop
hipFFT: develop
hipfort: develop
HIPIFY: amd-staging
hipRAND: develop
hipSOLVER: develop
hipSPARSE: develop
hipSPARSELt: develop
hipTensor: develop
llvm-project: amd-staging
MIOpen: develop
MIVisionX: develop
rccl: develop
rdc: develop
rocAL: develop
rocALUTION: develop
rocBLAS: develop
ROCdbgapi : amd-master
rocDecode: develop
rocFFT: develop
rocgdb: amd-staging
rocm-cmake: develop
rocm_smi_lib: develop
rocminfo: master
rocm-core: master
rocm-examples: develop
rocminfo: amd-staging
rocMLIR: develop
ROCmValidationSuite: master
rocm_bandwidth_test: master
rocm_smi_lib: develop
rocPRIM: develop
rocprofiler-register: amd-mainline
rocprofiler: amd-master
ROCR-Runtime: master
rocRAND: develop
rocr_debug_agent: amd-staging
rocSOLVER: develop
rocSPARSE: develop
ROCT-Thunk-Interface: master
rocThrust: develop
roctracer: amd-master
rocWMMA: develop
rpp: master
- name: componentsFailureOkay
type: object
@@ -70,7 +95,10 @@ steps:
displayName: Extract ${{ parameters.componentName }}
inputs:
archiveFilePatterns: '$(Pipeline.Workspace)/d/**/*.tar.gz'
destinationFolder: '$(Agent.BuildDirectory)/rocm'
${{ if parameters.extractToMnt }}:
destinationFolder: '/mnt/rocm'
${{ else }}:
destinationFolder: '$(Agent.BuildDirectory)/rocm'
cleanDestinationFolder: false
overwriteExistingFiles: true
- task: DeleteFiles@1

View File

@@ -11,6 +11,9 @@ parameters:
- name: cmakeBuildDir
type: string
default: 'build'
- name: cmakeSourceDir
type: string
default: '..'
- name: cmakeTarget
type: string
default: 'install'
@@ -20,6 +23,9 @@ parameters:
- name: installDir
type: string
default: '$(Build.BinariesDirectory)'
- name: customInstallPath
type: boolean
default: true
- name: installEnabled
type: boolean
default: true
@@ -31,7 +37,12 @@ steps:
displayName: '${{parameters.componentName }} CMake Flags'
inputs:
workingDirectory: ${{ parameters.cmakeBuildDir }}
cmakeArgs: -DCMAKE_INSTALL_PREFIX=${{ parameters.installDir }} ${{ parameters.extraBuildFlags }} ..
${{ if eq(parameters.customInstallPath, true) }}:
cmakeArgs: -DCMAKE_INSTALL_PREFIX=${{ parameters.installDir }} ${{ parameters.extraBuildFlags }} ${{ parameters.cmakeSourceDir }}
${{ else }}:
cmakeArgs: ${{ parameters.extraBuildFlags }} ..
- script: df -h
displayName: Disk space before build
# equivalent to running make $cmakeTargetDir from $cmakeBuildDir
# i.e., cd $cmakeBuildDir; make $cmakeTargetDir
- task: CMake@1
@@ -40,6 +51,8 @@ steps:
workingDirectory: ${{ parameters.cmakeBuildDir }}
cmakeArgs: '--build ${{ parameters.cmakeTargetDir }} ${{ parameters.multithreadFlag }}'
retryCountOnTaskFailure: 10
- script: df -h
displayName: Disk space after build
# equivalent to running make $cmakeTarget from $cmakeBuildDir
# e.g., make install
- ${{ if eq(parameters.installEnabled, true) }}:

View File

@@ -11,6 +11,9 @@ parameters:
- staging
- tag-builds
- fixed
- name: extractToMnt
type: boolean
default: false
# required values for fixed selection
- name: fixedPipelineIdentifier
type: string
@@ -23,70 +26,112 @@ parameters:
- name: stagingPipelineIdentifiers
type: object
default:
amdsmi: $(amdsmi-pipeline-id)
aomp: $(aomp-pipeline-id)
aomp-extras: $(aomp-extras-pipeline-id)
AMDMIGraphX: $(amdmigraphx-pipeline-id)
amdsmi: $(amdsmi-pipeline-id)
aomp-extras: $(aomp-extras-pipeline-id)
aomp: $(aomp-pipeline-id)
clr: $(clr-pipeline-id)
composable_kernel: $(composable-kernel-pipeline-id)
half: $(half-pipeline-id)
HIP: $(hip-pipeline-id)
hipBLAS: $(hipblas-pipeline-id)
hipBLASLt: $(hipblaslt-pipeline-id)
hipCUB: $(hipcub-pipeline-id)
hipFFT: $(hipfft-pipeline-id)
hipfort: $(hipfort-pipeline-id)
HIPIFY: $(hipify-pipeline-id)
hipRAND: $(hiprand-pipeline-id)
hipSOLVER: $(hipsolver-pipeline-id)
hipSPARSE: $(hipsparse-pipeline-id)
hipSPARSELt: $(hipsparselt-pipeline-id)
hipTensor: $(hiptensor-pipeline-id)
llvm-project: $(llvm-project-pipeline-id)
MIOpen: $(miopen-pipeline-id)
MIVisionX: $(mivisionx-pipeline-id)
rccl: $(rccl-pipeline-id)
rdc: $(rdc-pipeline-id)
rocAL: $(rocal-pipeline-id)
rocALUTION: $(rocalution-pipeline-id)
rocBLAS: $(rocblas-pipeline-id)
ROCdbgapi : $(rocdbgapi-pipeline-id)
rocDecode: $(rocdecode-pipeline-id)
rocFFT: $(rocfft-pipeline-id)
ROCgdb: $(rocgdb-pipeline-id)
rocm-cmake: $(rocm-cmake-pipeline-id)
rocm-core: $(rocm-core-pipeline-id)
rocm_smi_lib: $(rocm-smi-lib-pipeline-id)
rocm-examples: $(rocm-examples-pipeline-id)
rocminfo: $(rocminfo-pipeline-id)
rocMLIR: $(rocmlir-pipeline-id)
ROCmValidationSuite: $(rocmvalidationsuite-pipeline-id)
rocm_bandwidth_test: $(rocm-bandwidth-test-pipeline-id)
rocm_smi_lib: $(rocm-smi-lib-pipeline-id)
rocPRIM: $(rocprim-pipeline-id)
rocprofiler-register: $(rocprofiler-register-pipeline-id)
rocprofiler: $(rocprofiler-pipeline-id)
ROCR-Runtime: $(rocr-runtime-pipeline-id)
rocRAND: $(rocrand-pipeline-id)
rocr_debug_agent: $(rocr-debug-agent-pipeline-id)
rocSOLVER: $(rocsolver-pipeline-id)
rocSPARSE: $(rocsparse-pipeline-id)
ROCT-Thunk-Interface: $(roct-thunk-interface-pipeline-id)
rocThrust: $(rocthrust-pipeline-id)
roctracer: $(roctracer-pipeline-id)
rocWMMA: $(rocwmma-pipeline-id)
rpp: $(rpp-pipeline-id)
- name: taggedPipelineIdentifiers
type: object
default:
amdsmi: $(amdsmi-tagged-pipeline-id)
aomp: $(aomp-tagged-pipeline-id)
aomp-extras: $(aomp-extras-tagged-pipeline-id)
AMDMIGraphX: $(amdmigraphx-tagged-pipeline-id)
amdsmi: $(amdsmi-tagged-pipeline-id)
aomp-extras: $(aomp-extras-tagged-pipeline-id)
aomp: $(aomp-tagged-pipeline-id)
clr: $(clr-tagged-pipeline-id)
composable_kernel: $(composable-kernel-tagged-pipeline-id)
half: $(half-tagged-pipeline-id)
HIP: $(hip-tagged-pipeline-id)
hipBLAS: $(hipblas-tagged-pipeline-id)
hipBLASLt: $(hipblaslt-tagged-pipeline-id)
hipCUB: $(hipcub-tagged-pipeline-id)
hipFFT: $(hipfft-tagged-pipeline-id)
hipfort: $(hipfort-tagged-pipeline-id)
HIPIFY: $(hipify-tagged-pipeline-id)
hipRAND: $(hiprand-tagged-pipeline-id)
hipSOLVER: $(hipsolver-tagged-pipeline-id)
hipSPARSE: $(hipsparse-tagged-pipeline-id)
hipSPARSELt: $(hipsparselt-tagged-pipeline-id)
hipTensor: $(hiptensor-tagged-pipeline-id)
llvm-project: $(llvm-project-tagged-pipeline-id)
MIOpen: $(miopen-tagged-pipeline-id)
MIVisionX: $(mivisionx-tagged-pipeline-id)
rccl: $(rccl-tagged-pipeline-id)
rdc: $(rdc-tagged-pipeline-id)
rocAL: $(rocal-tagged-pipeline-id)
rocALUTION: $(rocalution-tagged-pipeline-id)
rocBLAS: $(rocblas-tagged-pipeline-id)
ROCdbgapi : $(rocdbgapi-tagged-pipeline-id)
rocDecode: $(rocdecode-tagged-pipeline-id)
rocFFT: $(rocfft-tagged-pipeline-id)
ROCgdb: $(rocgdb-tagged-pipeline-id)
rocm-cmake: $(rocm-cmake-tagged-pipeline-id)
rocm-core: $(rocm-core-tagged-pipeline-id)
rocm_smi_lib: $(rocm-smi-lib-tagged-pipeline-id)
rocm-examples: $(rocm-examples-tagged-pipeline-id)
rocminfo: $(rocminfo-tagged-pipeline-id)
rocMLIR: $(rocmlir-tagged-pipeline-id)
ROCmValidationSuite: $(rocmvalidationsuite-tagged-pipeline-id)
rocm_bandwidth_test: $(rocm-bandwidth-test-tagged-pipeline-id)
rocm_smi_lib: $(rocm-smi-lib-tagged-pipeline-id)
rocPRIM: $(rocprim-tagged-pipeline-id)
rocprofiler-register: $(rocprofiler-register-tagged-pipeline-id)
rocprofiler: $(rocprofiler-tagged-pipeline-id)
ROCR-Runtime: $(rocr-runtime-tagged-pipeline-id)
rocRAND: $(rocrand-tagged-pipeline-id)
rocr_debug_agent: $(rocr-debug-agent-tagged-pipeline-id)
rocSOLVER: $(rocsolver-tagged-pipeline-id)
rocSPARSE: $(rocsparse-tagged-pipeline-id)
ROCT-Thunk-Interface: $(roct-thunk-interface-tagged-pipeline-id)
rocThrust: $(rocthrust-tagged-pipeline-id)
roctracer: $(roctracer-tagged-pipeline-id)
rocWMMA: $(rocwmma-tagged-pipeline-id)
rpp: $(rpp-tagged-pipeline-id)
# set to true if you're calling this template file multiple files in same pipeline
# only leave last call false to optimize sequence
@@ -102,31 +147,45 @@ steps:
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.stagingPipelineIdentifiers[dependency] }}
extractToMnt: ${{ parameters.extractToMnt }}
- ${{ if eq(parameters.dependencySource, 'tag-builds') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.taggedPipelineIdentifiers[dependency] }}
extractToMnt: ${{ parameters.extractToMnt }}
# fixed case only accepts one component at a time, so no array input
- ${{ if eq(parameters.dependencySource, 'fixed') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ parameters.fixedComponentName }}
pipelineId: ${{ parameters.fixedPipelineIdentifier }}
extractToMnt: ${{ parameters.extractToMnt }}
- task: Bash@3
displayName: 'list downloaded ROCm files'
inputs:
targetType: inline
script: ls -1R $(Agent.BuildDirectory)/rocm
${{ if eq(parameters.extractToMnt, true) }}:
script: ls -1R /mnt/rocm
${{ else }}:
script: ls -1R $(Agent.BuildDirectory)/rocm
- ${{ if eq(parameters.skipLibraryLinking, false) }}:
- task: Bash@3
displayName: 'link ROCm shared libraries'
inputs:
targetType: inline
# OS ignores if the ROCm lib folder shows up more than once
script: |
echo $(Agent.BuildDirectory)/rocm/lib | sudo tee -a /etc/ld.so.conf
echo $(Agent.BuildDirectory)/rocm/llvm/lib | sudo tee -a /etc/ld.so.conf
sudo cat /etc/ld.so.conf
sudo ldconfig -v
ldconfig -p
${{ if eq(parameters.extractToMnt, true) }}:
script: |
echo /mnt/rocm/lib | sudo tee -a /etc/ld.so.conf
echo /mnt/rocm/llvm/lib | sudo tee -a /etc/ld.so.conf
sudo cat /etc/ld.so.conf
sudo ldconfig -v
ldconfig -p
${{ else }}:
script: |
echo $(Agent.BuildDirectory)/rocm/lib | sudo tee -a /etc/ld.so.conf
echo $(Agent.BuildDirectory)/rocm/llvm/lib | sudo tee -a /etc/ld.so.conf
sudo cat /etc/ld.so.conf
sudo ldconfig -v
ldconfig -p

View File

@@ -21,6 +21,8 @@ variables:
value: rocm-ci_ultra_build_pool
- name: ON_PREM_BUILD_POOL
value: rocm-ci_build_pool
- name: LARGE_DISK_BUILD_POOL
value: rocm-ci_larger_base_disk_pool
- name: LATEST_RELEASE_TAG
value: rocm-6.1.0
- name: DOCKER_IMAGE_NAME

171
.gitmodules vendored
View File

@@ -1,171 +0,0 @@
[submodule "libs/ROCK-Kernel-Driver"]
path = libs/ROCK-Kernel-Driver
url = ../ROCK-Kernel-Driver
[submodule "libs/ROCT-Thunk-Interface"]
path = libs/ROCT-Thunk-Interface
url = ../ROCT-Thunk-Interface
[submodule "libs/ROCR-Runtime"]
path = libs/ROCR-Runtime
url = ../ROCR-Runtime
[submodule "libs/amdsmi"]
path = libs/amdsmi
url = ../amdsmi
[submodule "libs/rocm_smi_lib"]
path = libs/rocm_smi_lib
url = ../rocm_smi_lib
[submodule "libs/rocm-core"]
path = libs/rocm-core
url = ../rocm-core
[submodule "libs/rocm-cmake"]
path = libs/rocm-cmake
url = ../rocm-cmake
[submodule "libs/rocminfo"]
path = libs/rocminfo
url = ../rocminfo
[submodule "libs/rocm_bandwidth_test"]
path = libs/rocm_bandwidth_test
url = ../rocm_bandwidth_test
[submodule "libs/rocprofiler"]
path = libs/rocprofiler
url = ../rocprofiler
[submodule "libs/roctracer"]
path = libs/roctracer
url = ../roctracer
[submodule "libs/clang-ocl"]
path = libs/clang-ocl
url = ../clang-ocl
[submodule "libs/rdc"]
path = libs/rdc
url = ../rdc
[submodule "libs/HIP"]
path = libs/HIP
url = ../HIP
[submodule "libs/HIP-Examples"]
path = libs/HIP-Examples
url = ../HIP-Examples
[submodule "libs/clr"]
path = libs/clr
url = ../clr
[submodule "libs/hipother"]
path = libs/hipother
url = ../hipother
[submodule "libs/HIPIFY"]
path = libs/HIPIFY
url = ../HIPIFY
[submodule "libs/HIPCC"]
path = libs/HIPCC
url = ../HIPCC
[submodule "libs/llvm-project"]
path = libs/llvm-project
url = ../llvm-project
[submodule "libs/ROCm-Device-Libs"]
path = libs/ROCm-Device-Libs
url = ../ROCm-Device-Libs
[submodule "libs/ROCm-CompilerSupport"]
path = libs/ROCm-CompilerSupport
url = ../ROCm-CompilerSupport
[submodule "libs/half"]
path = libs/half
url = ../half
[submodule "libs/ROCgdb"]
path = libs/ROCgdb
url = ../ROCgdb
[submodule "libs/ROCdbgapi"]
path = libs/ROCdbgapi
url = ../ROCdbgapi
[submodule "libs/rocr_debug_agent"]
path = libs/rocr_debug_agent
url = ../rocr_debug_agent
[submodule "libs/rocBLAS"]
path = libs/rocBLAS
url = ../rocBLAS
[submodule "libs/Tensile"]
path = libs/Tensile
url = ../Tensile
[submodule "libs/hipTensor"]
path = libs/hipTensor
url = ../hipTensor
[submodule "libs/hipBLAS"]
path = libs/hipBLAS
url = ../hipBLAS
[submodule "libs/hipBLASLt"]
path = libs/hipBLASLt
url = ../hipBLASLt
[submodule "libs/rocFFT"]
path = libs/rocFFT
url = ../rocFFT
[submodule "libs/hipFFT"]
path = libs/hipFFT
url = ../hipFFT
[submodule "libs/rocRAND"]
path = libs/rocRAND
url = ../rocRAND
[submodule "libs/hipRAND"]
path = libs/hipRAND
url = ../hipRAND
[submodule "libs/rocSPARSE"]
path = libs/rocSPARSE
url = ../rocSPARSE
[submodule "libs/hipSPARSELt"]
path = libs/hipSPARSELt
url = ../hipSPARSELt
[submodule "libs/rocSOLVER"]
path = libs/rocSOLVER
url = ../rocSOLVER
[submodule "libs/hipSOLVER"]
path = libs/hipSOLVER
url = ../hipSOLVER
[submodule "libs/hipSPARSE"]
path = libs/hipSPARSE
url = ../hipSPARSE
[submodule "libs/rocALUTION"]
path = libs/rocALUTION
url = ../rocALUTION
[submodule "libs/rocThrust"]
path = libs/rocThrust
url = ../rocThrust
[submodule "libs/hipCUB"]
path = libs/hipCUB
url = ../hipCUB
[submodule "libs/rocPRIM"]
path = libs/rocPRIM
url = ../rocPRIM
[submodule "libs/rocWMMA"]
path = libs/rocWMMA
url = ../rocWMMA
[submodule "libs/rccl"]
path = libs/rccl
url = ../rccl
[submodule "libs/MIOpen"]
path = libs/MIOpen
url = ../MIOpen
[submodule "libs/composable_kernel"]
path = libs/composable_kernel
url = ../composable_kernel
[submodule "libs/MIVisionX"]
path = libs/MIVisionX
url = ../MIVisionX
[submodule "libs/rpp"]
path = libs/rpp
url = ../rpp
[submodule "libs/hipfort"]
path = libs/hipfort
url = ../hipfort
[submodule "libs/AMDMIGraphX"]
path = libs/AMDMIGraphX
url = ../AMDMIGraphX
[submodule "libs/ROCmValidationSuite"]
path = libs/ROCmValidationSuite
url = ../ROCmValidationSuite
[submodule "libs/openmp-extras/aomp"]
path = libs/openmp-extras/aomp
url = ../aomp
[submodule "libs/openmp-extras/aomp-extras"]
path = libs/openmp-extras/aomp-extras
url = ../aomp-extras
[submodule "libs/openmp-extras/flang"]
path = libs/openmp-extras/flang
url = ../flang
[submodule "libs/rocDecode"]
path = libs/rocDecode
url = ../rocDecode

View File

@@ -3,20 +3,19 @@
version: 2
sphinx:
configuration: docs/conf.py
formats: [htmlzip]
python:
install:
- requirements: docs/sphinx/requirements.txt
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "gfortran" # For pre-processing fortran sources
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
sphinx:
configuration: docs/conf.py
formats: []

View File

@@ -2,6 +2,7 @@ AAC
ABI
ACE
ACEs
ACS
AccVGPR
AccVGPRs
ALU
@@ -12,6 +13,7 @@ AMDMIGraphX
AMI
AOCC
AOMP
APBDIS
APIC
APIs
APU
@@ -24,11 +26,13 @@ ATI
AddressSanitizer
AlexNet
Arb
BARs
BLAS
BMC
BitCode
Blit
Bluefield
Bootloader
CCD
CDNA
CIFAR
@@ -43,6 +47,7 @@ CPF
CPP
CPU
CPUs
Cron
CSC
CSE
CSV
@@ -62,7 +67,10 @@ CommonMark
Concretized
Conda
ConnectX
DDR
DF
DGEMM
DIMM
DKMS
DL
DMA
@@ -91,7 +99,9 @@ FFmpeg
FHS
FMA
FP
FX
Filesystem
FindDb
Flang
Fortran
Fuyu
@@ -124,6 +134,7 @@ GitHub
Gitpod
HBM
HCA
HGX
HIPCC
HIPExtension
HIPIFY
@@ -133,12 +144,14 @@ HPE
HPL
HSA
HWE
HWS
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IFWI
IMDb
IOMMU
IOP
@@ -148,6 +161,7 @@ IRQ
ISA
ISV
ISVs
ITL
ImageNet
InfiniBand
Inlines
@@ -159,6 +173,7 @@ JSON
Jupyter
KFD
KiB
KV
KVM
Keras
Khronos
@@ -193,6 +208,7 @@ MVFFR
Makefile
Makefiles
Matplotlib
Megatrends
Megatron
Mellanox
Mellanox's
@@ -208,6 +224,7 @@ NIC
NICs
NLI
NLP
NPKit
NPS
NSP
NUMA
@@ -237,18 +254,22 @@ OpenCV
OpenFabrics
OpenGL
OpenMP
OpenMPI
OpenSSL
OpenVX
PCC
PCI
PCIe
PEFT
PIL
PILImage
POR
PRNG
PRs
PaLM
Pageable
PeerDirect
PerfDb
Perfetto
PipelineParallel
PnP
@@ -287,6 +308,7 @@ SBIOS
SCA
SDK
SDMA
SDPA
SDRAM
SENDMSG
SGPR
@@ -308,10 +330,12 @@ SRAMECC
SVD
SWE
SerDes
ShareGPT
Shlens
Skylake
Softmax
Spack
SplitK
Supermicro
Szegedy
TCA
@@ -322,8 +346,12 @@ TCP
TCR
TF
TFLOPS
TP
TPU
TPUs
TSME
Tagram
TensileLite
TensorBoard
TensorFlow
TensorParallel
@@ -344,6 +372,7 @@ USM
UTCL
UTIL
Uncached
Unittests
Unhandled
VALU
VBIOS
@@ -432,6 +461,7 @@ cuLIB
cuRAND
cuSOLVER
cuSPARSE
cTDP
dataset
datasets
dataspace
@@ -466,6 +496,7 @@ executables
ffmpeg
filesystem
fortran
fp
galb
gcc
gdb
@@ -479,6 +510,7 @@ gzip
heterogenous
hipBLAS
hipBLASLt
hipBLASLt's
hipCUB
hipFFT
hipLIB
@@ -495,6 +527,8 @@ hipfort
hipify
hipsolver
hipsparse
hotspotting
hpc
hpp
hsa
hsakmt
@@ -502,6 +536,7 @@ hyperparameter
ib_core
inband
incrementing
inductor
inferencing
inflight
init
@@ -559,6 +594,8 @@ prebuilt
precompiled
prefetch
prefetchable
prefill
prefills
preprocess
preprocessed
preprocessing
@@ -631,6 +668,7 @@ subexpression
subfolder
subfolders
supercomputing
td
tensorfloat
th
tokenization
@@ -682,7 +720,8 @@ writebacks
wrreq
wzo
xargs
xGMI
xz
yaml
ysvmadyb
zypper
zypper

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
Copyright (c) 2023 - 2024 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

View File

@@ -21,7 +21,19 @@ source software compilers, debuggers, and libraries. ROCm is fully integrated in
## Getting the ROCm Source Code
AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains submodules that point to the correct versions of each of the ROCm components. They can be found in the `/libs` directory of the ROCm repository.
AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains a repo manifest file called [default.xml](./default.xml). You can use this manifest file to download the source code for ROCm software.
### Installing the repo tool
The repo tool from Google allows you to manage multiple git repositories simultaneously. Run the following commands to install the repo tool:
```bash
mkdir -p ~/bin/
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo
```
**Note:** The ```~/bin/``` folder is used as an example. You can specify a different folder to install the repo tool into if you desire.
### Installing git-lfs
@@ -33,12 +45,17 @@ sudo apt-get install git-lfs
### Downloading the ROCm source code
The following example shows how to download the ROCm source from this repository.
The following example shows how to use the repo tool to download the ROCm source code. If you choose a directory other than ~/bin/ to install the repo tool, you must use that chosen directory in the code as shown below:
```bash
git clone https://github.com/ROCm/ROCm -b amd/dgaliffi/submodules-6-1-2 --recurse-submodules
mkdir -p ~/ROCm/
cd ~/ROCm/
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.0.x
~/bin/repo sync
```
**Note:** Using this sample code will cause the repo tool to download the open source code associated with the specified ROCm release. Ensure that you have ssh-keys configured on your machine for your GitHub ID prior to the download as explained at [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh).
## Building the ROCm source code
Each ROCm component repository contains directions for building that component, such as the rocSPARSE documentation [Installation and Building for Linux](https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/install/Linux_Install_Guide.html). Refer to the specific component documentation for instructions on building the repository.
@@ -60,7 +77,8 @@ The Build time will reduce significantly if we limit the GPU Architecture/s agai
mkdir -p ~/WORKSPACE/ # Or any folder name other than WORKSPACE
cd ~/WORKSPACE/
export ROCM_VERSION=6.1.0 # or 6.1.1 6.1.2
git clone https://github.com/ROCm/ROCm -b amd/dgaliffi/submodules-${ROCM_VERSION} --recurse-submodules
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.1.x -m tools/rocm-build/rocm-${ROCM_VERSION}.xml
~/bin/repo sync
# --------------------------------------
# Step 2: Prepare build environment
@@ -68,9 +86,9 @@ git clone https://github.com/ROCm/ROCm -b amd/dgaliffi/submodules-${ROCM_VERSION
# Option 1: Start a docker container
# Pulling required base docker images:
# Ubuntu20.04 built from ROCm/rocm-build/docker/ubuntu20/Dockerfile
# Ubuntu20.04 built from ROCm/tools/rocm-build/docker/ubuntu20/Dockerfile
docker pull rocm/rocm-build-ubuntu-20.04:6.1
# Ubuntu22.04 built from ROCm/rocm-build/docker/ubuntu22/Dockerfile
# Ubuntu22.04 built from ROCm/tools/rocm-build/docker/ubuntu22/Dockerfile
docker pull rocm/rocm-build-ubuntu-22.04:6.1
# Start docker container and mount the source code folder:
@@ -89,10 +107,10 @@ docker run -ti \
# Option 2: Install required packages into the host machine
# For ubuntu20.04 system
cd ROCm/rocm-build/docker/ubuntu20
cd ROCm/tools/rocm-build/docker/ubuntu20
bash install-prerequisites.sh
# For ubuntu22.04 system
cd ROCm/rocm-build/docker/ubuntu22
cd ROCm/tools/rocm-build/docker/ubuntu22
bash install-prerequisities.sh
# --------------------------------------
@@ -108,13 +126,13 @@ export GPU_ARCHS="gfx940;gfx941;gfx942" # Example
# Pick and run build commands in the docker container:
# Build rocm-dev packages
make -f ROCm/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} rocm-dev
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} rocm-dev
# Build all ROCm packages
make -f ROCm/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} all
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} all
# list all ROCm components to find required components
make -f ROCm/rocm-build/ROCm.mk list_components
make -f ROCm/tools/rocm-build/ROCm.mk list_components
# Build a single ROCm packages
make -f ROCm/rocm-build/ROCm.mk T_rocblas
make -f ROCm/tools/rocm-build/ROCm.mk T_rocblas
# Find built packages in ubuntu20.04:
out/ubuntu-20.04/20.04/deb/
@@ -133,10 +151,16 @@ out/ubuntu-22.04/22.04/logs/rocblas.inprogress # Example
out/ubuntu-22.04/22.04/logs/rocblas # Example
```
Note: [Overview for ROCm.mk](rocm-build/README.md)
Note: [Overview for ROCm.mk](tools/rocm-build/README.md)
## ROCm documentation
This repository contains the [manifest file](https://gerrit.googlesource.com/git-repo/+/HEAD/docs/manifest-format.md)
for ROCm releases, changelogs, and release information.
The `default.xml` file contains information for all repositories and the associated commit used to build
the current ROCm release; `default.xml` uses the [Manifest Format repository](https://gerrit.googlesource.com/git-repo/).
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
`develop` branch of our repositories contains content for the next ROCm release.

View File

@@ -1,4 +1,4 @@
# ROCm 6.1.2 release notes
# ROCm 6.1.5 release notes
<!-- Do not edit this file! This file is autogenerated with -->
<!-- tools/autotag/tag_script.py -->
@@ -11,136 +11,369 @@
<!-- spellcheck-disable -->
ROCm 6.1.2 includes enhancements to SMI tools and improvements to some libraries.
The release notes provide a summary of notable changes since the previous ROCm release.
### OS support
- [Release highlights](#release-highlights)
ROCm 6.1.2 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).
- [Operating system support](#operating-system-support)
### AMD SMI
- [ROCm components versioning](#rocm-components)
AMD SMI for ROCm 6.1.2
- [ROCm known issues](#rocm-known-issues)
#### Additions
* Added process isolation and clean shader APIs and CLI commands.
* `amdsmi_get_gpu_process_isolation()`
* `amdsmi_set_gpu_process_isolation()`
* `amdsmi_set_gpu_clear_sram_data()`
* Added the `MIN_POWER` metric to output provided by `amd-smi static --limit`.
#### Optimizations
* Updated the `amd-smi monitor --pcie` output to prevent delays with the `monitor` command.
#### Changes
* Updated `amismi_get_power_cap_info` to return values in uW instead of W.
* Updated Python library return types for `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info`.
* Updated the output of `amd-smi metric --ecc-blocks` to show counters available from blocks.
#### Fixes
* `amdsmi_get_gpu_board_info()` no longer returns junk character strings.
* `amd-smi metric --power` now correctly details power output for RDNA3, RDNA2, and MI1x devices.
* Fixed the `amdsmitstReadWrite.TestPowerCapReadWrite` test for RDNA3, RDNA2, and MI100 devices.
* Fixed an issue with the `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info` Python interface calls.
#### Removals
* Removed the `amdsmi_get_gpu_process_info` API from the Python library. It was removed from the C library in an earlier release.
- [ROCm upcoming changes](#rocm-upcoming-changes)
```{note}
See the AMD SMI [detailed changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.1.x/CHANGELOG.md) with code samples for more information.
If youre using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/docs-6.1.3/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.
```
### ROCm SMI
## Release highlights
ROCm SMI for ROCm 6.1.2
The following is the notable improvement in ROCm 6.1.5.
#### Additions
### Fixed compatibility issue with third-party profiling tools using ROCprofiler-SDK backend
* Added the ring hang event to the `amdsmi_evt_notification_type_t` enum.
[rocprofiler-register](https://github.com/ROCm/rocprofiler-register) library has resolved the profiling tools compatibility issue where applications potentially failed with the error message `rocprofiler_configure not found. Tried to dlopen`. This prevents a failure when profiling ROCm 6.1 applications using third-party profiling tools upgraded to use [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/) in ROCm 6.2.0 or later.
#### Fixes
## Operating system support
* Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on [GitHub](https://github.com/ROCm/ROCm/issues/3112).
* Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-series hardware.
ROCm 6.1.5 no longer supports CentOS 7.9. All other operating system support from ROCm 6.1.2 remains unchanged in this release.
## Library changes in ROCm 6.1.2
ROCm 6.1.5 requires the [native package manager](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.1.5/how-to/native-install/index.html).
| Library | Version |
|---------|---------|
| AMDMIGraphX | [2.9](https://github.com/ROCm/AMDMIGraphX/releases/tag/rocm-6.1.2) |
| composable_kernel | [0.2.0](https://github.com/ROCm/composable_kernel/releases/tag/rocm-6.1.2) |
| hipBLAS | [2.1.0](https://github.com/ROCm/hipBLAS/releases/tag/rocm-6.1.2) |
| hipBLASLt | [0.7.0](https://github.com/ROCm/hipBLASLt/releases/tag/rocm-6.1.2) |
| hipCUB | [3.1.0](https://github.com/ROCm/hipCUB/releases/tag/rocm-6.1.2) |
| hipFFT | [1.0.14](https://github.com/ROCm/hipFFT/releases/tag/rocm-6.1.2) |
| hipRAND | [2.10.17](https://github.com/ROCm/hipRAND/releases/tag/rocm-6.1.2) |
| hipSOLVER | [2.1.1](https://github.com/ROCm/hipSOLVER/releases/tag/rocm-6.1.2) |
| hipSPARSE | [3.0.1](https://github.com/ROCm/hipSPARSE/releases/tag/rocm-6.1.2) |
| hipSPARSELt | [0.2.0](https://github.com/ROCm/hipSPARSELt/releases/tag/rocm-6.1.2) |
| hipTensor | [1.2.0](https://github.com/ROCm/hipTensor/releases/tag/rocm-6.1.2) |
| MIOpen | [3.1.0](https://github.com/ROCm/MIOpen/releases/tag/rocm-6.1.2) |
| MIVisionX | [2.5.0](https://github.com/ROCm/MIVisionX/releases/tag/rocm-6.1.2) |
| rccl | [2.18.6](https://github.com/ROCm/rccl/releases/tag/rocm-6.1.2) |
| rocALUTION | [3.1.1](https://github.com/ROCm/rocALUTION/releases/tag/rocm-6.1.2) |
| rocBLAS | 4.1.0 ⇒ [4.1.2](https://github.com/ROCm/rocBLAS/releases/tag/rocm-6.1.2) |
| rocDecode | 0.5.0 ⇒ [0.6.0](https://github.com/ROCm/rocDecode/releases/tag/rocm-6.1.2) |
| rocFFT | [1.0.27](https://github.com/ROCm/rocFFT/releases/tag/rocm-6.1.2) |
| rocm-cmake | [0.12.0](https://github.com/ROCm/rocm-cmake/releases/tag/rocm-6.1.2) |
| rocPRIM | [3.1.0](https://github.com/ROCm/rocPRIM/releases/tag/rocm-6.1.2) |
| rocRAND | [3.0.1](https://github.com/ROCm/rocRAND/releases/tag/rocm-6.1.2) |
| rocSOLVER | [3.25.0](https://github.com/ROCm/rocSOLVER/releases/tag/rocm-6.1.2) |
| rocSPARSE | [3.1.2](https://github.com/ROCm/rocSPARSE/releases/tag/rocm-6.1.2) |
| rocThrust | [3.0.1](https://github.com/ROCm/rocThrust/releases/tag/rocm-6.1.2) |
| rocWMMA | [1.4.0](https://github.com/ROCm/rocWMMA/releases/tag/rocm-6.1.2) |
| rpp | [1.5.0](https://github.com/ROCm/rpp/releases/tag/rocm-6.1.2) |
| Tensile | [4.40.0](https://github.com/ROCm/Tensile/releases/tag/rocm-6.1.2) |
See the [Compatibility
matrix](../../docs/compatibility/compatibility-matrix.rst)
for more information about operating system compatibility.
### RCCL
## ROCm components
RCCL 2.18.6 for ROCm 6.1.2
The following table lists the versions of ROCm components for ROCm 6.1.5.
Click {fab}`github` to go to the component's source code on GitHub.
#### Changes
<div class="pst-scrollable-table-container">
<table id="rocm-rn-components" class="table">
<thead>
<tr>
<th>Category</th>
<th>Group</th>
<th>Name</th>
<th>Version</th>
<th></th>
</tr>
</thead>
<colgroup>
<col span="1">
<col span="1">
</colgroup>
<tbody class="rocm-components-libs rocm-components-ml">
<tr>
<th rowspan="6">Libraries</th>
<th rowspan="6">Machine learning and computer vision</th>
<td><a href="https://rocm.docs.amd.com/projects/composable_kernel/en/docs-6.1.5">Composable Kernel</a>
</td>
<td>1.1.0</td>
<td><a href="https://github.com/ROCm/composable_kernel/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/AMDMIGraphX/en/docs-6.1.5">MIGraphX</a></td>
<td>2.9</td>
<td><a href="https://github.com/ROCm/AMDMIGraphX/releases/tag/rocm-6.1.5"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIOpen/en/docs-6.1.5">MIOpen</a></td>
<td>3.1.0</td>
<td><a href="https://github.com/ROCm/MIOpen/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIVisionX/en/docs-6.1.5">MIVisionX</a></td>
<td>2.5.0</td>
<td><a href="https://github.com/ROCm/MIVisionX/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocDecode/en/docs-6.1.5">rocDecode</a></td>
<td>0.6.0</td>
<td><a href="https://github.com/ROCm/rocDecode/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rpp/en/docs-6.1.5">RPP</a></td>
<td>1.5.0</td>
<td><a href="https://github.com/ROCm/rpp/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-communication">
<tr>
<th rowspan="1"></th>
<th rowspan="1">Communication</th>
<td><a href="https://rocm.docs.amd.com/projects/rccl/en/docs-6.1.5">RCCL</a></td>
<td>2.18.6</td>
<td><a href="https://github.com/ROCm/rccl/releases/tag/rocm-6.1.5"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-math tbody-reverse-zebra">
<tr>
<th rowspan="16"></th>
<th rowspan="16">Math</th>
<td><a href="https://rocm.docs.amd.com/projects/hipBLAS/en/docs-6.1.5">hipBLAS</a></td>
<td>2.1.0</td>
<td><a href="https://github.com/ROCm/hipBLAS/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.1.5">hipBLASLt</a></td>
<td>0.7.0</td>
<td><a href="https://github.com/ROCm/hipBLASLt/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipFFT/en/docs-6.1.5">hipFFT</a></td>
<td>1.0.14</td>
<td><a href="https://github.com/ROCm/hipFFT/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipfort/en/docs-6.1.5">hipfort</a></td>
<td>0.4.0</td>
<td><a href="https://github.com/ROCm/hipfort/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipRAND/en/docs-6.1.5">hipRAND</a></td>
<td>2.10.16</td>
<td><a href="https://github.com/ROCm/hipRAND/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSOLVER/en/docs-6.1.5">hipSOLVER</a></td>
<td>2.1.1</td>
<td><a href="https://github.com/ROCm/hipSOLVER/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSE/en/docs-6.1.5">hipSPARSE</a></td>
<td>3.0.1</td>
<td><a href="https://github.com/ROCm/hipSPARSE/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSELt/en/docs-6.1.5">hipSPARSELt</a></td>
<td>0.2.0</td>
<td><a href="https://github.com/ROCm/hipSPARSELt/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocALUTION/en/docs-6.1.5">rocALUTION</a></td>
<td>3.1.1</td>
<td><a href="https://github.com/ROCm/rocALUTION/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.1.5">rocBLAS</a></td>
<td>4.1.2</td>
<td><a href="https://github.com/ROCm/rocBLAS/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocFFT/en/docs-6.1.5">rocFFT</a></td>
<td>1.0.27</td>
<td><a href="https://github.com/ROCm/rocFFT/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocRAND/en/docs-6.1.5">rocRAND</a></td>
<td>3.0.1</td>
<td><a href="https://github.com/ROCm/rocRAND/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSOLVER/en/docs-6.1.5">rocSOLVER</a></td>
<td>3.25.0</td>
<td><a href="https://github.com/ROCm/rocSOLVER/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSPARSE/en/docs-6.1.5">rocSPARSE</a></td>
<td>3.1.2</td>
<td><a href="https://github.com/ROCm/rocSPARSE/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocWMMA/en/docs-6.1.5">rocWMMA</a></td>
<td>1.4.0</td>
<td><a href="https://github.com/ROCm/rocWMMA/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://github.com/ROCm/tensile/">Tensile</a></td>
<td>4.40.0</td>
<td><a href="https://github.com/ROCm/tensile/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-primitives tbody-reverse-zebra">
<tr>
<th rowspan="4"></th>
<th rowspan="4">Primitives</th>
<td><a href="https://rocm.docs.amd.com/projects/hipCUB/en/docs-6.1.5">hipCUB</a></td>
<td>3.1.0</td>
<td><a href="https://github.com/ROCm/hipCUB/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipTensor/en/docs-6.1.5">hipTensor</a></td>
<td>1.2.0</td>
<td><a href="https://github.com/ROCm/hipTensor/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocPRIM/en/docs-6.1.5">rocPRIM</a></td>
<td>3.1.0</td>
<td><a href="https://github.com/ROCm/rocPRIM/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocThrust/en/docs-6.1.5">rocThrust</a></td>
<td>3.0.1</td>
<td><a href="https://github.com/ROCm/rocThrust/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-system tbody-reverse-zebra">
<tr>
<th rowspan="5">Tools</th>
<th rowspan="5">System management</th>
<td><a href="https://rocm.docs.amd.com/projects/amdsmi/en/docs-6.1.5">AMD SMI</a></td>
<td>24.5.1</td>
<td><a href="https://github.com/ROCm/amdsmi/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocminfo/en/docs-6.1.5">rocminfo</a></td>
<td>1.0.0</td>
<td><a href="https://github.com/ROCm/rocminfo/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rdc/en/docs-6.1.5">ROCm Data Center Tool</a></td>
<td>0.3.0</td>
<td><a href="https://github.com/ROCm/rdc/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocm_smi_lib/en/docs-6.1.5">ROCm SMI</a></td>
<td>7.2.0</td>
<td><a href="https://github.com/ROCm/rocm_smi_lib/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/docs-6.1.5">ROCm Validation Suite</a></td>
<td>1.0.0</td>
<td><a href="https://github.com/ROCm/ROCmValidationSuite/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-perf">
<tr>
<th rowspan="3"></th>
<th rowspan="3">Performance</th>
<td><a href="https://rocm.docs.amd.com/projects/rocm_bandwidth_test/en/docs-6.1.5">ROCm Bandwidth
Test</a></td>
<td>1.4.0</td>
<td><a href="https://github.com/ROCm/rocm_bandwidth_test/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.1.5/">ROCProfiler</a></td>
<td>2.0.0</td>
<td><a href="https://github.com/ROCm/ROCProfiler/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr >
<td><a href="https://rocm.docs.amd.com/projects/roctracer/en/docs-6.1.5/">ROCTracer</a></td>
<td>4.1.0</td>
<td><a href="https://github.com/ROCm/ROCTracer/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-dev tbody-reverse-zebra">
<tr>
<th rowspan="5"></th>
<th rowspan="5">Development</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.1.5/">HIPIFY</a></td>
<td>17.0.0</td>
<td><a href="https://github.com/ROCm/HIPIFY/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCdbgapi/en/docs-6.1.5">ROCdbgapi</a></td>
<td>0.71.0</td>
<td><a href="https://github.com/ROCm/ROCdbgapi/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmCMakeBuildTools/en/docs-6.1.5/">ROCm CMake</a></td>
<td>0.12.0</td>
<td><a href="https://github.com/ROCm/rocm-cmake/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCgdb/en/docs-6.1.5">ROCm Debugger (ROCgdb)</a>
</td>
<td>14.1</td>
<td><a href="https://github.com/ROCm/ROCgdb/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocr_debug_agent/en/docs-6.1.5">ROCr Debug Agent</a>
</td>
<td>2.0.3</td>
<td><a href="https://github.com/ROCm/rocr_debug_agent/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-compilers">
<tr>
<th rowspan="2" colspan="2">Compilers</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPCC/en/docs-6.1.5">HIPCC</a></td>
<td>1.0.0</td>
<td><a href="https://github.com/ROCm/llvm-project/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://github.com/ROCm/llvm-project/">llvm-project</a></td>
<td>17.0.0</td>
<td><a href="https://github.com/ROCm/llvm-project/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-runtimes">
<tr>
<th rowspan="2" colspan="2">Runtimes</th>
<td><a href="https://rocm.docs.amd.com/projects/HIP/en/docs-6.1.5">HIP</a></td>
<td>6.1</td>
<td><a href="https://github.com/ROCm/HIP/releases/tag/rocm-6.1.5"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCR-Runtime/en/docs-6.1.5">ROCr Runtime</a></td>
<td>1.13.0</td>
<td><a href="https://github.com/ROCm/ROCR-Runtime"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
</table>
</div>
* Reduced `NCCL_TOPO_MAX_NODES` to limit stack usage and avoid stack overflow.
## ROCm known issues
### rocBLAS
ROCm known issues are tracked on [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue).
rocBLAS 4.1.2 for ROCm 6.1.2
#### Optimizations
* Tuned BBS TN and TT operations on the CDNA3 architecture.
#### Fixes
* Fixed an issue related to obtaining solutions for BF16 TT operations.
### rocDecode
rocDecode 0.6.0 for ROCm 6.1.2
#### Additions
* Added support for FFmpeg v5.x.
#### Optimizations
* Updated error checking in the `rocDecode-setup.py` script.
#### Changes
* Updated core dependencies.
* Updated to support the use of public LibVA headers.
#### Fixes
* Fixed some package dependencies.
## Upcoming changes
## ROCm upcoming changes
* A future release will enable the use of HIPCC compiled binaries `hipcc.bin` and `hipconfig.bin` by default. No action is needed by users; you may continue calling high-level Perl scripts `hipcc` and `hipconfig`. `hipcc.bin` and `hipconfig.bin` will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke `hipcc.pl` and `hipconfig.pl`, set the `HIP_USE_PERL_SCRIPTS` environment variable to `1`.
* A subsequent release will remove high-level HIPCC Perl scripts from `hipcc` and `hipconfig`. This release will remove the `HIP_USE_PERL_SCRIPTS` environment variable. It will rename `hipcc.bin` and `hipconfig.bin` to `hipcc` and `hipconfig` respectively. No action is needed by the users. To revert to the previous behavior, invoke `hipcc.pl` and `hipconfig.pl` explicitly.

View File

@@ -77,8 +77,7 @@ Obtain the value of `gpu-arch` by running the following command:
[//]: # (dated link below, needs updating)
See the complete list of compiler command-line references
[here](https://github.com/ROCm/llvm-project/blob/amd-stg-open/clang/docs/CommandGuide/clang.rst).
See the complete list of [compiler command-line references](https://github.com/ROCm/llvm-project/blob/amd-staging/openmp/docs/CommandLineArgumentReference.rst).
### Using `rocprof` with OpenMP

View File

@@ -17,7 +17,7 @@ following section.
## ROCm component licenses
ROCm is released by Advanced Micro Devices, Inc. and is licensed per component separately.
ROCm is released by Advanced Micro Devices, Inc. (AMD) and is licensed per component separately.
The following table is a list of ROCm components with links to their respective license
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
@@ -25,66 +25,71 @@ additional licenses. Please review individual repositories for more information.
<!-- spellcheck-disable -->
| Component | License |
|:---------------------|:-------------------------|
| [AMDMIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [HIPCC](https://github.com/ROCm/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm/HIPCC/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm/HIPIFY/) | [MIT](https://github.com/ROCm/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIP](https://github.com/ROCm/HIP/) | [MIT](https://github.com/ROCm/HIP/blob/develop/LICENSE.txt) |
| [MIOpenGEMM](https://github.com/ROCm/MIOpenGEMM/) | [MIT](https://github.com/ROCm/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/master/LICENSE.txt) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/ROCm/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCclr](https://github.com/ROCm/ROCclr/) | [MIT](https://github.com/ROCm/ROCclr/blob/develop/LICENSE.txt) |
| [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/ROCm/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/ROCm/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/ROCm/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCmValidationSuite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| [HIPCC](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc) | [MIT](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/hipcc/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm/HIPIFY/) | [MIT](https://github.com/ROCm/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [AMDMIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/develop/LICENSE.txt) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/develop/LICENSE.txt) |
| [AMD Common Language Runtime (CLR)](https://github.com/ROCm/clr) | [MIT](https://github.com/ROCm/clr/blob/develop/LICENCE) |
| [ROCm-Core](https://github.com/ROCm/rocm-core) | [MIT](https://github.com/ROCm/rocm-core/blob/master/copyright) |
| [hipamd](https://github.com/ROCm/clr/tree/develop/hipamd) | [MIT](https://github.com/ROCm/clr/blob/develop/hipamd/LICENSE.txt) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/clr/tree/develop/opencl) | [MIT](https://github.com/ROCm/clr/blob/develop/opencl/LICENSE.txt) |
| [Tensile](https://github.com/ROCm/Tensile/) | [MIT](https://github.com/ROCm/Tensile/blob/develop/LICENSE.md) |
| [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
| [aomp](https://github.com/ROCm/aomp/) | [Apache 2.0](https://github.com/ROCm/aomp/blob/aomp-dev/LICENSE) |
| [atmi](https://github.com/ROCm/atmi/) | [MIT](https://github.com/ROCm/atmi/blob/master/LICENSE.txt) |
| [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/amd-staging/LICENSE.TXT) |
| [llvm-project/flang](https://github.com/ROCm/llvm-project/tree/amd-staging/flang) | [Apache 2.0](https://github.com/ROCm/llvm-project/blob/amd-staging/flang/LICENSE.TXT) |
| [Code Object Manager (Comgr)](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/comgr) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/LICENSE.TXT) |
| [clang-ocl](https://github.com/ROCm/clang-ocl/) | [MIT](https://github.com/ROCm/clang-ocl/blob/master/LICENSE) |
| [flang](https://github.com/ROCm/flang/) | [Apache 2.0](https://github.com/ROCm/flang/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/master/LICENSE.txt) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCR-Runtime](https://github.com/ROCm/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCR Debug Agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/amd-staging/LICENSE.txt) |
| [Composable Kernel](https://github.com/ROCm/composable_kernel) | [MIT](https://github.com/ROCm/composable_kernel/blob/develop/LICENSE) |
| [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/rocm/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCm/hipBLAS/) | [MIT](https://github.com/ROCm/hipBLAS/blob/develop/LICENSE.md) |
| [hipBLASLt](https://github.com/ROCm/hipBLASLt/) | [MIT](https://github.com/ROCm/hipBLASLt/blob/develop/LICENSE.md) |
| [hipCUB](https://github.com/ROCm/hipCUB/) | [Custom](https://github.com/ROCm/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCm/hipFFT/) | [MIT](https://github.com/ROCm/hipFFT/blob/develop/LICENSE.md) |
| [hipFORT](https://github.com/ROCm/hipfort/) | [MIT](https://github.com/ROCm/hipfort/blob/develop/LICENSE) |
| [hipRAND](https://github.com/ROCm/hipRAND/) | [MIT](https://github.com/ROCm/hipRAND/blob/develop/LICENSE.txt) |
| [hipSOLVER](https://github.com/ROCm/hipSOLVER/) | [MIT](https://github.com/ROCm/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCm/hipSPARSELt/) | [MIT](https://github.com/ROCm/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCm/hipSPARSE/) | [MIT](https://github.com/ROCm/hipSPARSE/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCm/hipSPARSELt/) | [MIT](https://github.com/ROCm/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCm/hipTensor) | [MIT](https://github.com/ROCm/hipTensor/blob/develop/LICENSE) |
| [hipamd](https://github.com/ROCm/hipamd/) | [MIT](https://github.com/ROCm/hipamd/blob/develop/LICENSE.txt) |
| [hipfort](https://github.com/ROCm/hipfort/) | [MIT](https://github.com/ROCm/hipfort/blob/master/LICENSE) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/main/LICENSE.TXT) |
| [rccl](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [rdc](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/master/LICENSE) |
| [rocAL](https://github.com/ROCm/rocAL) | [MIT](https://github.com/ROCm/rocAL/blob/develop/LICENSE.txt) |
| [rocALUTION](https://github.com/ROCm/rocALUTION/) | [MIT](https://github.com/ROCm/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCm/rocBLAS/) | [MIT](https://github.com/ROCm/rocBLAS/blob/develop/LICENSE.md) |
| [rocDecode](https://github.com/ROCm/rocDecode) | [MIT](https://github.com/ROCm/rocDecode/blob/develop/LICENSE) |
| [rocFFT](https://github.com/ROCm/rocFFT/) | [MIT](https://github.com/ROCm/rocFFT/blob/develop/LICENSE.md) |
| [rocPRIM](https://github.com/ROCm/rocPRIM/) | [MIT](https://github.com/ROCm/rocPRIM/blob/develop/LICENSE.txt) |
| [ROCm Performance Primitives (RPP)](https://github.com/ROCm/rpp) | [MIT](https://github.com/ROCm/rpp/blob/develop/LICENSE) |
| [rocRAND](https://github.com/ROCm/rocRAND/) | [MIT](https://github.com/ROCm/rocRAND/blob/develop/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCm/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCm/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCm/rocSPARSE/) | [MIT](https://github.com/ROCm/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCm/rocThrust/) | [Apache 2.0](https://github.com/ROCm/rocThrust/blob/develop/LICENSE) |
| [rocWMMA](https://github.com/ROCm/rocWMMA/) | [MIT](https://github.com/ROCm/rocWMMA/blob/develop/LICENSE.md) |
| [rocm-cmake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
| [rocm_bandwidth_test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/ROCm/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-master/LICENSE) |
| [rocr_debug_agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/master/LICENSE.txt) |
| [roctracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula)
| [ROCm Communication Collectives Library (RCCL)](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/develop/LICENSE) |
| [ROCm CMake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
| [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-staging/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [ROCm SMI Lib](https://github.com/ROCm/rocm_smi_lib/) | [MIT](https://github.com/ROCm/rocm_smi_lib/blob/develop/License.txt) |
| [AMD SMI](https://github.com/ROCm/amdsmi) | [MIT](https://github.com/ROCm/amdsmi/blob/develop/LICENSE) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/amd-staging/License.txt) |
| [ROCProfiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-master/LICENSE) |
| [ROCTracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [TransferBench](https://github.com/ROCm/TransferBench) | [MIT](https://github.com/ROCm/TransferBench/blob/develop/LICENSE.md) |
| [ROCmValidationSuite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| hsa-amd-aqlprofile | [AMD Software EULA](https://www.amd.com/en/legal/eula/amd-software-eula.html)
Open sourced ROCm components are released via public GitHub
repositories, packages on https://repo.radeon.com and other distribution channels.
Proprietary products are only available on https://repo.radeon.com. Currently, only
one component of ROCm, rocm-llvm-alt is governed by a proprietary license.
repositories, packages on [https://repo.radeon.com](https://repo.radeon.com) and other distribution channels.
Proprietary products are only available on [https://repo.radeon.com](https://repo.radeon.com). Currently, only
one component of ROCm, `rocm-llvm-alt` is governed by a proprietary license.
Proprietary components are organized in a proprietary subdirectory in the package
repositories to distinguish from open sourced packages.
@@ -92,7 +97,7 @@ repositories to distinguish from open sourced packages.
The following additional terms and conditions apply to your use of ROCm technical documentation.
```
©2023 Advanced Micro Devices, Inc. All rights reserved.
©2023 - 2024 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
@@ -125,8 +130,8 @@ companies.
:::{attention}
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary and is
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
subject to the license agreement enclosed in the directory for the binary available
in `/opt/rocm/share/doc/hsa-amd-aqlprofile/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
@@ -134,9 +139,8 @@ AOCC CPU Optimizations.
:::
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/`
following location: `/opt/rocm/share/doc/<component-name>/` or in the locations
specified in the preceding table.
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`
For example, you can fetch the licensing information of the `amd_comgr`
component (Code Object Manager) from the `/opt/rocm/share/doc/amd_comgr/LICENSE.txt` file.

View File

@@ -8,119 +8,121 @@ Compatibility matrix
Use this matrix to view the ROCm compatibility across successive major and minor releases.
.. container:: format-big-table
.. csv-table::
:header: "ROCm Version", "6.1.0", "6.0.0"
:header: "ROCm Version", "6.1.5", "6.1.2", "6.0.0"
:stub-columns: 1
:doc:`Operating Systems <rocm-install-on-linux:reference/system-requirements>`, "Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3"
,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,"RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,CentOS 7.9,CentOS 7.9
,,
:doc:`GFX Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3
,CDNA2,CDNA2
,CDNA,CDNA
,RDNA3,RDNA3
,RDNA2,RDNA2
,,
:doc:`GFX Card <rocm-install-on-linux:reference/system-requirements>`,gfx1100,gfx1100
,gfx1030,gfx1030
,gfx942 [#]_, gfx942 [#]_
,gfx90a,gfx90a
,gfx908,gfx908
,,
ECOSYSTEM SUPPORT:,,
:doc:`PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`,"2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`,"2.15, 2.14, 2.13","2.14, 2.13, 2.12"
:doc:`JAX <rocm-install-on-linux:how-to/3rd-party/jax-install>`,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.14.1
,,
3RD PARTY COMMUNICATION LIBS:,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.14.1,>=1.14.1
,,
3RD PARTY ALGORITHM LIBS:,,
Thrust,2.1.0,2.0.1
CUB,2.1.0,2.0.1
,,
ML & COMPUTER VISION LIBS:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.9.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.1.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,2.5.0,2.5.0
:doc:`rocDecode <rocdecode:index>`,0.5.0,N/A
:doc:`ROCm Performance Primitives (RPP) <rpp:index>`,1.5.0,1.4.0
,,
COMMUNICATION:,,
:doc:`RCCL <rccl:index>`,2.18.6,2.18.3
,,
MATH LIBS:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.1.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.7.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.14,1.0.13
:doc:`hipFORT <hipfort:index>`,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.1.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.0.1,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.1.1,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.1.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.27,1.0.23
:doc:`rocRAND <rocrand:index>`,3.0.1,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.25.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.1.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.4.0,1.3.0
`Tensile <https://github.com/ROCm/Tensile>`_,4.40.0,4.39.0
,,
PRIMITIVES:,,
:doc:`hipCUB <hipcub:index>`,3.1.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.2.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.1.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.0.1,3.0.0
,,
SUPPORT LIBS:,,
`hipother <https://github.com/ROCm/hipother>`_,6.1.40091,6.0.32830
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.12.0,0.11.0
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.1.0,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,20240125.3.30,20231016.2.245
,,
TOOLS:,,
:doc:`AMD SMI <amdsmi:index>`,24.4.1,23.4.2
:doc:`HIPIFY <hipify:index>`,17.0.0,17.0.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.71.0,0.71.0
`ROCm Debug Agent (ROCdebug-agent) <https://github.com/ROCm/rocr_debug_agent>`_,2.0.3,2.0.3
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60100,2.0.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.3.0,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60100,4.1.0
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,14.1.0,13.2.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.0.0,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.1.0,rocm-6.0.0
:doc:`TransferBench <transferbench:index>`,1.48,1.46
,,
COMPILERS:,,
`AOMP <https://github.com/ROCm/aomp>`_,17.60.0,17.60.0
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,0.5.0,0.5.0
`Flang <https://github.com/ROCm/flang>`_,17.0.0.24103,17.0.0.23483
`llvm-project <https://github.com/ROCm/llvm-project>`_,17.0.0.24103,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,17.0.0.24103,17.0.0.23483
,,
RUNTIMES:,,
:doc:`HIP <hip:index>`,6.1.40091,6.0.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0
:doc:`ROCR-Runtime <rocr-runtime:index>`,1.13.0,1.12.0
:doc:`Operating Systems <rocm-install-on-linux:reference/system-requirements>`,"Ubuntu 22.04.5 [#Ubuntu220405]_, 22.04.4, 22.04.3","Ubuntu 22.04.5 [#Ubuntu220405]_, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3"
,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.4 [#red-hat94]_, 9.3, 9.2","RHEL 9.4 [#red-hat94]_, 9.3, 9.2","RHEL 9.3, 9.2"
,"RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,CentOS 7.9,CentOS 7.9
,Oracle Linux 8.9 [#oracle89]_,Oracle Linux 8.9 [#oracle89]_,
,.. _architecture-support-compatibility-matrix,,
:doc:`GFX Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA
,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix,,
:doc:`GFX Card <rocm-install-on-linux:reference/system-requirements>`,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030
, gfx942 [#mi300_612]_, gfx942 [#mi300_612]_, gfx942 [#mi300_600]_
,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908
,,,
ECOSYSTEM SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>`,"2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.14.1
,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.14.1,>=1.14.1,>=1.14.1
,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix:,,
Thrust,2.1.0,2.1.0,2.0.1
CUB,2.1.0,2.1.0,2.0.1
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.9.0,2.9.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.1.0,3.1.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,2.5.0,2.5.0,2.5.0
:doc:`rocDecode <rocdecode:index>`,0.6.0,0.6.0,N/A
:doc:`RPP <rpp:index>`,1.5.0,1.5.0,1.4.0
,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
:doc:`RCCL <rccl:index>`,2.18.6,2.18.6,2.18.3
,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.1.0,2.1.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.7.0,0.7.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.14,1.0.14,1.0.13
:doc:`hipFORT <hipfort:index>`,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.1.1,2.1.1,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.0.1,3.0.1,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.0,0.2.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.1.1,3.1.1,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.1.2,4.1.2,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.27,1.0.27,1.0.23
:doc:`rocRAND <rocrand:index>`,3.0.1,3.0.1,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.25.0,3.25.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.1.2,3.1.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.4.0,1.4.0,1.3.0
`Tensile <https://github.com/ROCm/Tensile>`_,4.40.0,4.40.0,4.39.0
,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix:,,
:doc:`hipCUB <hipcub:index>`,3.1.0,3.1.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.2.0,1.2.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.1.0,3.1.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.0.1,3.0.1,3.0.0
,,,
SUPPORT LIBS,,,
`hipother <https://github.com/ROCm/hipother>`_,6.1.40093,6.1.40093,6.1.32830
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.12.0,0.12.0,0.11.0
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.1.5,6.1.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,20240125.5.08,20240125.5.08,20231016.2.245
,,,
TOOLS,.. _tools-support-compatibility-matrix:,,
:doc:`AMD SMI <amdsmi:index>`,24.5.1,24.5.1,23.4.2
:doc:`HIPIFY <hipify:index>`,17.0.0.24193,17.0.0.24193,17.0.0.23483
:doc:`ROCdbgapi <rocdbgapi:index>`,0.71.0,0.71.0,0.71.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60105,2.0.60102,2.0.60000
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.3.0,0.3.0,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60105,4.1.60102,4.1.60000
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,14.1.0,14.1.0,13.2.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.2.0,7.2.0,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.1.5,rocm-6.1.2,rocm-6.0.0
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.3,2.0.3,2.0.3
,,,
COMPILERS,.. _compilers-support-compatibility-matrix:,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,0.5.0,0.5.0,0.5.0
:doc:`hipCC <hipcc:index>`,1.0.0,1.0.0,1.0.0
`Flang <https://github.com/ROCm/flang>`_,17.0.0.24193,17.0.0.24193,17.0.0.23483
:doc:`llvm-project <llvm-project:index>`,17.0.0.24193,17.0.0.24193,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,17.0.0.24193,17.0.0.24193,17.0.0.23483
,,,
RUNTIMES,.. _runtime-support-compatibility-matrix:,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.1.40093,6.1.40093,6.1.32830
:doc:`HIP <hip:index>`,6.1.40093,6.1.40093,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0
:doc:`ROCR-Runtime <rocr-runtime:index>`,1.13.0,1.13.0,1.12.0
.. rubric:: Footnotes
.. [#] **For ROCm 6.1** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.3 & 8.9 and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#] **For ROCm 6.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9 and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#Ubuntu220405] Preview support of Ubuntu 22.04.5 only.
.. [#red-hat94] **For ROCm 6.1** - RHEL 9.4 is supported only on AMD Instinct MI300A.
.. [#oracle89] **For ROCm 6.1.1** - Oracle Linux is supported only on AMD Instinct MI300X.
.. [#mi300_612] **For ROCm 6.1** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_600] **For ROCm 6.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9 and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.

View File

@@ -416,7 +416,7 @@ description, refer to the corresponding library data type support page.
- -/✅
- -/✅
*
- hipRAND (:doc:`details <hiprand:data-type-support>`)
- hipRAND (:doc:`details <hiprand:api-reference/data-type-support>`)
- -/✅
- -/✅
- -/✅
@@ -428,7 +428,7 @@ description, refer to the corresponding library data type support page.
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details <hipcub:data-type-support>`)
- hipCUB (:doc:`details <hipcub:api-reference/data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
@@ -474,7 +474,7 @@ description, refer to the corresponding library data type support page.
- -/✅
- -/✅
*
- hipRAND (:doc:`details <hiprand:data-type-support>`)
- hipRAND (:doc:`details <hiprand:api-reference/data-type-support>`)
- -/❌
- -/❌
- -/✅
@@ -492,7 +492,7 @@ description, refer to the corresponding library data type support page.
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details <hipcub:data-type-support>`)
- hipCUB (:doc:`details <hipcub:api-reference/data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅

View File

@@ -33,8 +33,8 @@ Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
subdivided into four SIMD units that process SIMD instructions of 16 data
elements per instruction (for the FP64 data type). This enables the CU to
process 64 work items (a so-called “wavefront”) at a peak clock frequency of 1.7
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 45.3
TFLOPS for vector instructions. The MI250 compute units also provide specialized
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 22.6
TFLOPS for vector instructions. This equates to 45.3 TFLOPS for vector instructions for both GCDs together. The MI250 compute units also provide specialized
execution units (also called matrix cores), which are geared toward executing
matrix operations like matrix-matrix multiplications. For FP64, the peak
performance of these units amounts to 90.5 TFLOPS.

View File

@@ -10,7 +10,7 @@ GPU computational elements of the processor along with the lower levels of the c
The following image depicts the structure of a single XCD in the AMD Instinct MI300 accelerator series.
```{figure} ../../data/conceptual/gpu-arch/image007.png
```{figure} ../../data/shared/xcd-sys-arch.png
---
name: mi300-xcd
align: center
@@ -103,7 +103,7 @@ MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, w
## Node-level architecture
```{figure} ../../data/conceptual/gpu-arch/image009.png
```{figure} ../../data/shared/mi300-node-level-arch.png
---
name: mi300-node

View File

@@ -51,7 +51,7 @@ In HIP, pinned memory allocations are coherent by default (`hipHostMallocDefault
There are additional pinned memory flags (e.g. `hipHostMallocMapped` and `hipHostMallocPortable`).
On MI200 these options do not impact performance.
<!-- TODO: link to programming_manual#memory-allocation-flags -->
For more information, see the section *memory allocation flags* in the HIP Programming Guide: {doc}`hip:user_guide/programming_manual`.
For more information, see the section *memory allocation flags* in the HIP Programming Guide: {doc}`hip:how-to/programming_manual`.
:::
Much like how a process can be locked to a CPU core by setting affinity, a pinned memory allocator does this with the memory storage system.

View File

@@ -1,47 +0,0 @@
.. meta::
:description: Setting the number of CUs
:keywords: AMD, ROCm, cu, number of cus
.. _env-variables-reference:
*************************************************************
Setting the number of CUs
*************************************************************
When using GPUs to accelerate compute workloads, it sometimes becomes necessary
to configure the hardware's usage of Compute Units (CU). This is a more advanced
option, so please read this page before experimentation.
The GPU driver provides two environment variables to set the number of CUs used. The
first one is ``HSA_CU_MASK`` and the second one is ``ROC_GLOBAL_CU_MASK``. The main
difference is that ``ROC_GLOBAL_CU_MASK`` sets the CU mask on queues created by the HIP
or the OpenCL runtimes. While ``HSA_CU_MASK`` sets the mask on a lower level of queue
creation in the driver, this mask will also be set for queues being profiled.
The environment variables have the following syntax:
::
ID = [0-9][0-9]* ex. base 10 numbers
ID_list = (ID | ID-ID)[, (ID | ID-ID)]* ex. 0,2-4,7
GPU_list = ID_list ex. 0,2-4,7
CU_list = 0x[0-F]* | ID_list ex. 0x337F OR 0,2-4,7
CU_Set = GPU_list : CU_list ex. 0,2-4,7:0-15,32-47 OR 0,2-4,7:0x337F
HSA_CU_MASK = CU_Set [; CU_Set]* ex. 0,2-4,7:0-15,32-47; 3-9:0x337F
The GPU indices are taken post ``ROCR_VISIBLE_DEVICES`` reordering. For GPUs listed,
the listed or masked CUs will be enabled, the rest disabled. Unlisted GPUs will not
be affected, their CUs will all be enabled.
The parsing of the variable is stopped when a syntax error occurs. The erroneous set
and the ones following will be ignored. Repeating GPU or CU IDs are a syntax error.
Specifying a mask with no usable CUs (CU_list is 0x0) is a syntax error. For excluding
GPU devices use ``ROCR_VISIBLE_DEVICES``.
These environment variables only affect ROCm software, not graphics applications.
It's important to know that not all CU configurations are valid on all devices. For
instance, on devices where two CUs can be combined into a WGP (for kernels running in
WGP mode), it is not valid to disable only a single CU in a WGP. `This paper
<https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ can provide more information
about what to expect, when disabling CUs.

View File

@@ -424,4 +424,8 @@ Shadow byte legend (one shadow byte represents 8 application bytes):
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
* It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
* It can also be the case that a memory fault is reported for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
* There is currently a bug which can result in memory faults being reported when running instrumented device code which makes use of `malloc`, `free`, `new`, or `delete`.
* There is currently a bug which can result in undefined symbols being reported at compile time when instrumented device code makes use of `new` and `delete`.

View File

@@ -5,25 +5,9 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import shutil
import jinja2
import os
# Environment to process Jinja templates.
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
# Jinja templates to render out.
templates = []
# Render templates and output files without the last extension.
# For example: 'install.md.jinja' becomes 'install.md'.
for template in templates:
rendered = jinja_env.get_template(template).render()
with open(os.path.splitext(template)[0], 'w') as file:
file.write(rendered)
shutil.copy2('../RELEASE.md','./about/release-notes.md')
# Keep capitalization due to similar linking on GitHub's markdown preview.
shutil.copy2('../CHANGELOG.md','./about/changelog.md')
shutil.copy2("../RELEASE.md", "./about/release-notes.md")
latex_engine = "xelatex"
latex_elements = {
@@ -37,58 +21,71 @@ latex_elements = {
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.1.2"
release = "6.1.2"
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = "6.1.5"
release = "6.1.5"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux"], "date": "2025-03-04"},
{"file": "compatibility/compatibility-matrix", "os": ["linux"]},
{"file": "how-to/deep-learning-rocm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/install", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/train-a-model", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/deploy-your-model", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/hugging-face-models", "os": ["linux"]},
{"file": "how-to/rocm-for-hpc/index", "os": ["linux"]},
{"file": "how-to/llm-fine-tuning-optimization/index", "os": ["linux"]},
{"file": "how-to/llm-fine-tuning-optimization/overview", "os": ["linux"]},
{
"file":"about/release-notes",
"os":["linux", "windows"],
"date":"2024-06-04"
"file": "how-to/llm-fine-tuning-optimization/fine-tuning-and-inference",
"os": ["linux"],
},
{
"file":"about/changelog",
"os":["linux", "windows"],
"date":"2024-06-04"
"file": "how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference",
"os": ["linux"],
},
{"file":"install/windows/install-quick", "os":["windows"]},
{"file":"install/linux/install-quick", "os":["linux"]},
{"file":"install/linux/install", "os":["linux"]},
{"file":"install/linux/install-options", "os":["linux"]},
{"file":"install/linux/prerequisites", "os":["linux"]},
{"file":"install/docker", "os":["linux"]},
{"file":"install/magma-install", "os":["linux"]},
{"file":"install/pytorch-install", "os":["linux"]},
{"file":"install/tensorflow-install", "os":["linux"]},
{"file":"install/windows/install", "os":["windows"]},
{"file":"install/windows/prerequisites", "os":["windows"]},
{"file":"install/windows/cli/index", "os":["windows"]},
{"file":"install/windows/gui/index", "os":["windows"]},
{"file":"about/compatibility/docker-image-support-matrix", "os":["linux"]},
{"file":"about/compatibility/user-kernel-space-compat-matrix", "os":["linux"]},
{"file":"reference/library-index", "os":["linux"]},
{"file":"how-to/deep-learning-rocm", "os":["linux"]},
{"file":"how-to/gpu-enabled-mpi", "os":["linux"]},
{"file":"how-to/system-debugging", "os":["linux"]},
{"file":"how-to/tuning-guides", "os":["linux", "windows"]},
{"file":"rocm-a-z", "os":["linux", "windows"]},
{
"file": "how-to/llm-fine-tuning-optimization/multi-gpu-fine-tuning-and-inference",
"os": ["linux"],
},
{
"file": "how-to/llm-fine-tuning-optimization/llm-inference-frameworks",
"os": ["linux"],
},
{
"file": "how-to/llm-fine-tuning-optimization/model-acceleration-libraries",
"os": ["linux"],
},
{"file": "how-to/llm-fine-tuning-optimization/model-quantization", "os": ["linux"]},
{
"file": "how-to/llm-fine-tuning-optimization/optimizing-with-composable-kernel",
"os": ["linux"],
},
{
"file": "how-to/llm-fine-tuning-optimization/optimizing-triton-kernel",
"os": ["linux"],
},
{
"file": "how-to/llm-fine-tuning-optimization/profiling-and-debugging",
"os": ["linux"],
},
{"file": "how-to/system-optimization/index", "os": ["linux"]},
{"file": "how-to/system-optimization/mi300x", "os": ["linux"]},
{"file": "how-to/system-optimization/mi200", "os": ["linux"]},
{"file": "how-to/system-optimization/mi100", "os": ["linux"]},
{"file": "how-to/system-optimization/w6000-v620", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/index", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/system", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/workload", "os": ["linux"]},
{"file": "how-to/system-debugging", "os": ["linux"]},
{"file": "how-to/gpu-enabled-mpi", "os": ["linux"]},
]
exclude_patterns = ['temp']
external_toc_path = "./sphinx/_toc.yml"
extensions = ["rocm_docs", "sphinx_reredirects"]
@@ -99,14 +96,12 @@ html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}
html_static_path = ["sphinx/static/css"]
html_css_files = ["rocm_custom.css"]
html_css_files = ["rocm_custom.css", "rocm_rn.css"]
html_title = "ROCm Documentation"
html_theme_options = {
"link_main_doc": False
}
html_theme_options = {"link_main_doc": False}
redirects = {
"reference/openmp/openmp": "../../about/compatibility/openmp.html"
}
redirects = {"reference/openmp/openmp": "../../about/compatibility/openmp.html"}
numfig = False

View File

@@ -12,8 +12,7 @@ There are four standard ways to provide feedback on this repository.
All contributions to ROCm documentation should arrive via the
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
targeting the develop branch of the repository. If you are unable to contribute
via the GitHub Flow, feel free to email us at [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
targeting the develop branch of the repository.
For more in-depth information on creating a pull request (PR), see
[Contributing](./contributing.md).
@@ -30,7 +29,3 @@ and follow along on via public announcements.
Issues on existing or absent documentation can be filed in
[GitHub Issues](https://github.com/ROCm/ROCm/issues).
## Email
Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).

Binary file not shown.

Before

Width:  |  Height:  |  Size: 108 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 187 KiB

View File

Before

Width:  |  Height:  |  Size: 153 KiB

After

Width:  |  Height:  |  Size: 153 KiB

View File

Before

Width:  |  Height:  |  Size: 219 KiB

After

Width:  |  Height:  |  Size: 219 KiB

View File

Before

Width:  |  Height:  |  Size: 80 KiB

After

Width:  |  Height:  |  Size: 80 KiB

View File

Before

Width:  |  Height:  |  Size: 73 KiB

After

Width:  |  Height:  |  Size: 73 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 88 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 53 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 8.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 124 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 244 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 310 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 342 KiB

View File

Before

Width:  |  Height:  |  Size: 45 KiB

After

Width:  |  Height:  |  Size: 45 KiB

View File

Before

Width:  |  Height:  |  Size: 83 KiB

After

Width:  |  Height:  |  Size: 83 KiB

View File

Before

Width:  |  Height:  |  Size: 288 KiB

After

Width:  |  Height:  |  Size: 288 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 200 KiB

View File

@@ -8,48 +8,18 @@ Installing deep learning frameworks for ROCm
ROCm provides a comprehensive ecosystem for deep learning development, including
:ref:`libraries <artificial-intelligence-apis>` for optimized deep learning operations and ROCm-aware versions of popular
deep learning frameworks and libraries such as PyTorch, TensorFlow, JAX, and MAGMA. ROCm works closely with these
deep learning frameworks and libraries such as PyTorch, TensorFlow, and JAX. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.
The following guides cover installation processes for ROCm-aware deep learning frameworks.
.. grid::
.. grid-item::
:columns: 3
:doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
.. grid-item::
:columns: 3
:doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
:doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
.. grid-item::
:columns: 3
:doc:`MAGMA for ROCm <rocm-install-on-linux:how-to/3rd-party/magma-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
* :doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
The following chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
.. image:: ../data/how-to/framework_install_2024_05_23.png
.. image:: ../data/how-to/framework_install_2024_07_04.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
:align: center

View File

@@ -28,18 +28,9 @@ graphs, tensor parallel multi-GPU, GPTQ, AWQ, and token speculation.
Installing vLLM
---------------
1. To install vLLM, run the following commands.
.. code-block:: shell
# Install from source
git clone https://github.com/ROCm/vllm.git
cd vllm
PYTORCH_ROCM_ARCH=gfx942 python setup.py install #MI300 series
.. _fine-tuning-llms-vllm-rocm-docker-image:
2. Run the following commands to build a Docker image ``vllm-rocm``.
1. Run the following commands to build a Docker image ``vllm-rocm``.
.. code-block:: shell
@@ -52,7 +43,7 @@ Installing vLLM
.. tab-item:: vLLM on a single-accelerator system
:sync: single
3. To use vLLM as an API server to serve reference requests, first start a container using the :ref:`vllm-rocm
2. To use vLLM as an API server to serve reference requests, first start a container using the :ref:`vllm-rocm
Docker image <fine-tuning-llms-vllm-rocm-docker-image>`.
.. code-block:: shell
@@ -69,7 +60,7 @@ Installing vLLM
vllm-rocm \
bash
4. Inside the container, start the API server to run on a single accelerator on port 8000 using the following command.
3. Inside the container, start the API server to run on a single accelerator on port 8000 using the following command.
.. code-block:: shell
@@ -81,6 +72,57 @@ Installing vLLM
:alt: vLLM API server log message
:align: center
4. To test, send it a curl request containing a prompt.
.. code-block:: shell
curl http://localhost:8000/generate -H "Content-Type: application/json" -d '{"prompt": "What is AMD Instinct?", "max_tokens": 80, "temperature": 0.0 }'
You should receive a response like the following.
.. code-block:: text
{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
.. tab-item:: vLLM on a multi-accelerator system
:sync: multi
2. To use vLLM as an API server to serve reference requests, first start a container using the :ref:`vllm-rocm
Docker image <fine-tuning-llms-vllm-rocm-docker-image>`.
.. code-block:: shell
docker run -it \
--network=host \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v <path/to/model>:/app/model \
vllm-rocm \
bash
3. To run API server on multiple GPUs, use the ``-tp`` or ``--tensor-parallel-size`` parameter. For example, to use two
GPUs, start the API server using the following command.
.. code-block:: shell
python -m vllm.entrypoints.api_server --model /app/model --dtype float16 -tp 2 --port 8000 &
4. To run multiple instances of API Servers, specify different ports for each server, and use ``ROCR_VISIBLE_DEVICES`` to
isolate each instance to a different accelerator.
For example, to run two API servers, one on port 8000 using GPU 0 and 1, one on port 8001 using GPU 2 and 3, use a
a command like the following.
.. code-block:: shell
ROCR_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server --model /data/llama-2-7b-chat-hf --dtype float16 tp 2 --port 8000 &
ROCR_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --model /data/llama-2-7b-chat-hf --dtype float16 tp 2--port 8001 &
5. To test, send it a curl request containing a prompt.
.. code-block:: shell
@@ -92,57 +134,8 @@ Installing vLLM
.. code-block:: text
{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
.. tab-item:: vLLM on a multi-accelerator system
:sync: multi
3. To use vLLM as an API server to serve reference requests, first start a container using the :ref:`vllm-rocm
Docker image <fine-tuning-llms-vllm-rocm-docker-image>`.
.. code-block:: shell
docker run -it \
--network=host \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v <path/to/model>:/app/model \
vllm-rocm \
bash
4. To run API server on multiple GPUs, use the ``-tp`` or ``--tensor-parallel-size`` parameter. For example, to use two
GPUs, start the API server using the following command.
.. code-block:: shell
python -m vllm.entrypoints.api_server --model /app/model --dtype float16 -tp 2 --port 8000 &
5. To run multiple instances of API Servers, specify different ports for each server, and use ``ROCR_VISIBLE_DEVICES`` to
isolate each instance to a different accelerator.
For example, to run two API servers, one on port 8000 using GPU 0 and 1, one on port 8001 using GPU 2 and 3, use a
a command like the following.
.. code-block:: shell
ROCR_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server --model /data/llama-2-7b-chat-hf --dtype float16 tp 2 --port 8000 &
ROCR_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --model /data/llama-2-7b-chat-hf --dtype float16 tp 2--port 8001 &
6. To test, send it a curl request containing a prompt.
.. code-block:: shell
curl http://localhost:8000/generate -H "Content-Type: application/json" -d '{"prompt": "What is AMD Instinct?", "max_tokens": 80, "temperature": 0.0 }'
You should receive a response like the following.
.. code-block:: text
{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
.. _fine-tuning-llms-tgi:
@@ -163,27 +156,29 @@ speculation.
Install TGI
-----------
1. To install the TGI Docker image, run the following commands.
1. Launch the TGI Docker container in the host machine.
.. code-block:: shell
# Install from Dockerfile
git clone https://github.com/huggingface/text-generation-inference.git -b mi300-compat
cd text-generation-inference
docker build . -f Dockerfile.rocm
docker run --name tgi --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined
--device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g
--net host -v $PWD:/data
--entrypoint "/bin/bash"
--env HUGGINGFACE_HUB_CACHE=/data
ghcr.io/huggingface/text-generation-inference:latest-rocm
.. tab-set::
.. tab-item:: TGI on a single-accelerator system
:sync: single
2. Launch a model using TGI server on a single accelerator.
2. Inside the container, launch a model using TGI server on a single accelerator.
.. code-block:: shell
export ROCM_USE_FLASH_ATTN_V2_TRITON=True
text-generation-launcher --model-id NousResearch/Meta-Llama-3-70B --dtype float16 --port 8000 &
3. To test, send it a curl request containing a prompt.
.. code-block:: shell
@@ -191,26 +186,26 @@ Install TGI
curl http://localhost:8000/generate_stream -X POST -d '{"inputs":"What is AMD Instinct?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
You should receive a response like the following.
.. code-block:: shell
data:{"index":20,"token":{"id":304,"text":" in","logprob":-1.2822266,"special":false},"generated_text":" AMD Instinct is a new family of data center GPUs designed to accelerate the most demanding workloads in","details":null}
.. tab-item:: TGI on a multi-accelerator system
2. Launch a model using TGI server on multiple accelerators (4 in this case).
2. Inside the container, launch a model using TGI server on multiple accelerators (4 in this case).
.. code-block:: shell
export ROCM_USE_FLASH_ATTN_V2_TRITON=True
text-generation-launcher --model-id NousResearch/Meta-Llama-3-8B --dtype float16 --port 8000 --num-shard 4 &
3. To test, send it a curl request containing a prompt.
.. code-block:: shell
curl http://localhost:8000/generate_stream -X POST -d '{"inputs":"What is AMD Instinct?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
You should receive a response like the following.
.. code-block:: shell

View File

@@ -8,6 +8,8 @@ Model acceleration libraries
This section discusses model acceleration techniques and libraries to improve memory efficiency and performance.
.. _acceleration-flash-attention:
Flash Attention 2
=================

View File

@@ -154,11 +154,12 @@ kernels by configuring the ``exllama_config`` parameter as the following.
.. code-block:: python
from transformers import AutoModelForCausalLM, GPTQConfig
pretrained_model_dir = "meta-llama/Llama-2-7b"
gptq_config = GPTQConfig(bits=4, exllama_config={"version":2})
#pretrained_model_dir = "meta-llama/Llama-2-7b"
base_model_name = "NousResearch/Llama-2-7b-hf"
gptq_config = GPTQConfig(bits=4, dataset="c4", exllama_config={"version":2})
quantized_model = AutoModelForCausalLM.from_pretrained(
base_model_name,
device_map="auto",
base_model_name,
device_map="auto",
quantization_config=gptq_config)
bitsandbytes

View File

@@ -6,383 +6,24 @@
Optimizing Triton kernels
*************************
This section introduces the general steps for `Triton <https://openai.com/index/triton/>`_ kernel optimization. Broadly,
Triton kernel optimization is similar to HIP and CUDA kernel optimization.
This section introduces the general steps for
`Triton <https://openai.com/index/triton/>`_ kernel optimization. Broadly,
Triton kernel optimization is similar to :doc:`HIP <hip:how-to/performance_guidelines>`
and CUDA kernel optimization.
.. _fine-tuning-llms-triton-memory-access-efficiency:
Refer to the
:ref:`Triton kernel performance optimization <mi300x-triton-kernel-performance-optimization>`
section of the :doc:`/how-to/tuning-guides/mi300x/workload` guide
for detailed information.
Memory access efficiency
========================
Triton kernel performance optimization includes the following topics.
The accelerator or GPU contains global memory, local data share (LDS), and registers. Global memory has high access
latency, but is large. LDS access has much lower latency, but is smaller. Register access is the fastest yet smallest
among the three.
* :ref:`mi300x-autotunable-kernel-config`
So, the data in global memory should be loaded and stored as few times as possible. If different threads in a block
need to access the same data, these data should be first transferred from global memory to LDS, then accessed by
different threads in a workgroup.
* :ref:`mi300x-mlir-analysis`
.. _fine-tuning-llms-triton-hardware-resource-utilization:
* :ref:`mi300x-assembly-analysis`
Hardware resource utilization
=============================
* :ref:`mi300x-torchinductor-tuning`
Each accelerator or GPU has multiple Compute Units (CUs) and various CUs do computation in parallel. So, how many CUs
can a compute kernel can allocate its task to? For the :doc:`AMD MI300X accelerator <../../reference/gpu-arch-specs>`, the
grid should have at least 1024 thread blocks or workgroups.
.. figure:: ../../data/how-to/llm-fine-tuning-optimization/compute-unit.png
Schematic representation of a CU in the CDNA2 or CDNA3 architecture.
To increase hardware utilization and maximize parallelism, it is necessary to design algorithms that can exploit more
parallelism. One approach to achieving this is by using larger split-K techniques for General Matrix Multiply (GEMM)
operations, which can further distribute the computation across more CUs, thereby enhancing performance.
.. tip::
You can query hardware resources with the command ``rocminfo`` (in the ``/opt/rocm/bin`` directory). For instance,
query the number of CUs, number of SIMD, and wavefront size using the following commands.
.. code-block:: shell
rocminfo | grep "Compute Unit"
rocminfo | grep "SIMD"
rocminfo | grep "Wavefront Size"
On an MI300X device, there are 304 CUs, 4 SIMD per CU, and the wavefront size (warp size) is 64. See :doc:`Hardware
specifications <../../reference/gpu-arch-specs>` for a full list of AMD accelerators and GPUs.
.. _fine-tuning-llms-triton-ir-analysis:
IR analysis
===========
In Triton, there are several layouts including *blocked*, *shared*, *sliced*, and *MFMA*.
From the Triton GPU IR (intermediate representation), you can know in which memory each computation is
performed. The following is a snippet of IR from the Flash Attention decode ``int4`` key-value program. It is to
de-quantize the ``int4`` key-value from the ``int4`` data type to ``fp16``.
.. code-block::
%190 = tt.load %189 {cache = 1 : i32, evict = 1 : i32, isVolatile =
false} : tensor<1x64xi32, #blocked6> loc(#loc159)
%266 = arith.andi %190, %cst_28 : tensor<1x64xi32, #blocked6>
loc(#loc250)
%267 = arith.trunci %266 : tensor<1x64xi32, #blocked6> to
tensor<1x64xi16, #blocked6> loc(#loc251)
%268 = tt.bitcast %267 : tensor<1x64xi16, #blocked6> -> tensor<1x64xf16,
#blocked6> loc(#loc252)
%269 = triton_gpu.convert_layout %268 : (tensor<1x64xf16, #blocked6>) ->
tensor<1x64xf16, #shared1> loc(#loc252)
%270 = tt.trans %269 : (tensor<1x64xf16, #shared1>) -> tensor<64x1xf16,
#shared2> loc(#loc194)
%276 = triton_gpu.convert_layout %270 : (tensor<64x1xf16, #shared2>) ->
tensor<64x1xf16, #blocked5> loc(#loc254)
%293 = arith.mulf %276, %cst_30 : tensor<64x1xf16, #blocked5>
loc(#loc254)
%295 = arith.mulf %292, %294 : tensor<64x32xf16, #blocked5> loc(#loc264)
%297 = arith.addf %295, %296 : tensor<64x32xf16, #blocked5> loc(#loc255)
%298 = triton_gpu.convert_layout %297 : (tensor<64x32xf16, #blocked5>)
-> tensor<64x32xf16, #shared1> loc(#loc255)
%299 = tt.trans %298 : (tensor<64x32xf16, #shared1>) ->
tensor<32x64xf16, #shared2> loc(#loc196)
%300 = triton_gpu.convert_layout %299 : (tensor<32x64xf16, #shared2>) ->
tensor<32x64xf16, #triton_gpu.dot_op<{opIdx = 1, parent = #mfma, kWidth
= 4}>> loc(#loc197)
From the IR, you can see ``i32`` data is loaded from global memory to registers. With a few element-wise operations in
registers, then it is stored in shared memory for the transpose operation, which needs data movement across different
threads. With the transpose done, it is loaded from LDS to register again, and with a few more element-wise operations,
they are stored in LDS again. The last step is to load from LDS to registers and convert to the dot-operand layout.
From the IR, you can see that it uses the LDS twice: one for the transpose, and the other to convert the blocked layout
to a dot-operand layout.
Assembly analysis
=================
In the ISA, ensure ``global_load_dwordx4`` is used, especially when the
load happens in a loop.
In most cases, the LDS load and store should use ``_b128`` as well to
minimize the number of LDS access instructions. Note that upstream (or backend) might not have ``_b128`` LDS read/write,
so it uses ``_b64``. For most cases, no matter if you use fork or upstream,
the LDS access should have ``_b64`` vector width.
The AMD ISA has the ``s_waitcnt`` instruction to synchronize the dependency
of memory access and computations. The ``s_waitcnt`` instruction can
have two signals, typically in the context of Triton:
* ``lgkmcnt(n):`` `lgkm` stands for LDS, GDS, Constant and Message.
In this context, it is often related to LDS access. The number ``n`` here means the number of such accesses that can
be left out to continue. For example, 0 means all ``lgkm`` access must finish before continuing, and 1 means only 1
``lgkm`` access can be still running asynchronously before proceeding.
* ``vmcnt(n):`` `vm` means vector memory.
This happens when vector memory is accessed, for example, when global load moves from global memory to vector memory.
Again, the number ``n`` here means the number of accesses that can be left out to continue.
Generally recommended guidelines are as follows.
* Vectorize memory access as much as possible.
* Ensure synchronization is done efficiently.
* Overlap of instructions to hide latency, but it requires thoughtful
analysis of the algorithms.
* If you find inefficiencies, you can trace it back to LLVM IR, TTGIR
and even TTIR to see where the problem comes from. If you find it
during compiler optimization, activate the MLIR dump and check which
optimization pass caused the problem.
.. _fine-tuning-llms-triton-kernel-occupancy:
Kernel occupancy
================
1. Get the VGPR count, search for ``.vgpr_count`` in the ISA (for example, ``N``).
2. Get the allocated LDS following the steps (for example, L for the kernel).
a. ``export MLIR_ENABLE_DUMP=1``
b. ``rm -rf ~/.triton/cache``
c. ``python kernel.py | | grep "triton_gpu.shared = " | tail -n 1``
d. You should see something like ``triton_gpu.shared = 65536``, indicating 65536 bytes of LDS are allocated for the
kernel.
3. Get number of waves per workgroup using the following steps (for example, ``nW``).
a. ``export MLIR_ENABLE_DUMP=1``
b. ``rm -rf ~/.triton/cache``
c. ``python kernel.py | | grep "triton_gpu.num-warps " | tail -n 1``
d. You should see something like ``“triton_gpu.num-warps" = 8``, indicating 8 waves per workgroup.
4. Compute occupancy limited by VGPR based on N according to the following table. For example, waves per EU as
``occ_vgpr``.
.. _fine-tuning-llms-occupancy-vgpr-table:
.. figure:: ../../data/how-to/llm-fine-tuning-optimization/occupancy-vgpr.png
:alt: Occupancy related to VGPR usage in an Instinct MI300X accelerator.
:align: center
5. Compute occupancy limited by LDS based on L by: ``occ_lds = floor(65536 / L)``.
6. Then the occupancy is ``occ = min(floor(occ_vgpr * 4 / nW), occ_lds) * nW / 4``
a. ``occ_vgpr \* 4`` gives the total number of waves on all 4 execution units (SIMDs)
per CU.
b. ``floor(occ_vgpr * 4 / nW)`` gives the occupancy of workgroups per CU
regrading VGPR usage.
c. The true ``occ`` is the minimum of the two.
.. _fine-tuning-llms-triton-kernel-configs-env-vars:
Auto-tunable kernel configurations and environment variables
============================================================
This section relates to the amount of :ref:`memory access <fine-tuning-llms-triton-memory-access-efficiency>` and
computation assigned to each CU. It is related to the usage of LDS, registers and the scheduling of different tasks on
a CU.
The following is a list of kernel arguments used for tuning.
``num_stages=n``
Adjusts the number of pipeline stages for different types of kernels. On AMD accelerators, set ``num_stages``
according to the following rules:
* For kernels with a single GEMM, set to ``0``.
* For kernels with two GEMMs fused (Flash Attention, or any other kernel
that fuses 2 GEMMs), set to ``1``.
* For kernels that fuse a single GEMM with another non-GEMM operator
(for example ReLU activation), set to ``0``.
* For kernels that have no GEMMs, set to ``1``.
``waves_per_eu=n``
Helps to manage Vector General Purpose Registers (VGPR) usage to achieve desired occupancy levels. This argument
hints to the compiler to reduce VGPR to achieve ``n`` occupancy. See
:ref:`Kernel occupancy <fine-tuning-llms-triton-kernel-occupancy>` for more information about how to compute
occupancy.
This argument is useful if:
* The occupancy of the kernel is limited by VGPR usage.
* The current VGPR usage is only a few above a boundary in
:ref:`Occupancy related to VGPR usage in an Instinct MI300X accelerator <fine-tuning-llms-occupancy-vgpr-table>`.
For example, according to the table, the available VGPR is 512 per Execution Unit (EU), and VGPU is allocated at the
unit of 16. If the current VGPR usage is 170, the actual requested VGPR will be 176, so the
occupancy is only 2 waves per CU since :math:`176 \times 3 > 512`. So, if you set
``waves_per_eu`` to 3, the LLVM backend tries to bring VGPR usage down so
that it might fit 3 waves per EU.
``BLOCK_M``, ``BLOCK_N``, ``BLOCK_K``
Tile sizes to be tuned to balance the memory-to-computation ratio. You want tile sizes large enough to
maximize the efficiency of memory-to-computation ratio, but small enough to parallelize the greatest number of
workgroups at the grid level.
``matrix_instr_nonkdim``
Experimental feature for Flash Attention-like kernels that determines the size of the Matrix Fused Multiply-Add
(MFMA) instruction used.
- ``Matrix_instr_nonkdim = 16``: ``mfma_16x16`` is used.
- ``Matrix_instr_nonkdim = 32``: ``mfma_32x32`` is used.
For GEMM kernels on an AMD MI300X accelerator, ``mfma_16x16`` typically outperforms ``mfma_32x32``, even for large
tile/GEMM sizes.
The following is an environment variable used for tuning.
``OPTIMIZE_EPILOGUE``
Setting this variable to ``1`` can improve performance by removing the ``convert_layout`` operation in the epilogue.
It should be turned on (set to ``1``) in most cases. Setting ``OPTIMIZE_EPILOGUE=1`` stores the MFMA instruction
results in the MFMA layout directly; this comes at the cost of reduced global store efficiency, but the impact on
kernel execution time is usually minimal.
By default (``0``), the results of MFMA instruction are converted to blocked layout, which leads to ``global_store``
with maximum vector length, that is ``global_store_dwordx4``.
This is done implicitly with LDS as the intermediate buffer to achieve
data exchange between threads. Padding is used in LDS to avoid bank
conflicts. This usually leads to extra LDS usage, which might reduce
occupancy.
.. note::
This variable is not turned on by default because it only
works with ``tt.store`` but not ``tt.atomic_add``, which is used in split-k and
stream-k GEMM kernels. In the future, it might be enabled with
``tt.atomic_add`` and turned on by default.
See :ref:`IR analysis <fine-tuning-llms-triton-ir-analysis>`.
TorchInductor with Triton tuning knobs
===========================================
The following are suggestions for optimizing matrix multiplication (GEMM) and convolution (``conv``) operations in PyTorch
using ``inductor``, a part of the PyTorch compilation framework. The goal is to leverage Triton to achieve better
performance.
Learn more about TorchInductor environment variables and usage in
`PyTorch documentation <https://pytorch.org/docs/2.3/torch.compiler_inductor_profiling.html>`_.
To enable a ``gemm``/``conv`` lowering to Triton, it requires use of ``inductor``s ``max_autotune`` mode. This benchmarks a
static list of Triton configurations (``conv`` configurations for max auto-tune + ``matmul`` configurations for max
auto-tune) and uses the fastest for each shape. Note that the Triton is not used if regular :doc:`MIOpen <miopen:index>`
or :doc:`rocBLAS <rocblas:index>` is faster for a specific operation.
* Set ``torch._inductor.config.max_autotune = True`` or ``TORCHINDUCTOR_MAX_AUTOTUNE=1``.
* Or, for more fine-grained control:
``torch._inductor.config.max_autotune.pointwise = True``
To enable tuning for ``pointwise``/``reduction`` ops.
``torch._inductor.config.max_autotune_gemm = True``
To enable tuning or lowering of ``mm``/``conv``\s.
``torch._inductor.max_autotune_gemm_backends/TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS``
To select the candidate backends for ``mm`` auto-tuning. Defaults to
``TRITON,ATEN,NV``. This also includes the ``CUTLASS`` tuning option. Limiting this to
``TRITON`` might improve performance by enabling more fused ``mm`` kernels
instead of going to rocBLAS.
* For ``mm`` tuning, tuning ``coordinate_descent`` might improve performance.
``torch._inductor.config.coordinate_descent_tuning = True`` or ``TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1``
* Inference can see large improvements on AMD GPUs by utilizing
``torch._inductor.config.freezing=True`` or the ``TORCHINDUCTOR_FREEZING=1`` variable, which
in-lines weights as constants and enables constant folding optimizations.
* Enabling ``inductor``s cpp_wrapper might improve overhead. This generates
C++ code which launches Triton binaries directly with
``hipModuleLaunchKernel`` and relies on `hipification`.
* For NHWC convolutions workloads
``torch._inductor.config.layout_optimization=True`` or ``TORCHINDUCTOR_LAYOUT_OPTIMIZATION=``
can help be enforcing channels_last format throughout the graph avoiding
any additional transposes added by ``inductor``. Note that
``PYTORCH_MIOPEN_SUGGEST_NHWC=1`` is recommended if using this.
* Extracting the Triton kernel ``TORCH_COMPILE_DEBUG`` creates a
``torch_compile_debug/`` directory at current path, in the ``output_code.py``
the code-strings for the Triton kernels that are defined. Manual work is
then required to strip out the kernel and create kernel
compilation and launch via Triton.
* For advanced ``matmul`` or ``conv`` configuration tuning, the ``inductor-gemm-tuner`` can
help. This implements the Triton ``conv``/``mm`` implementations used upstream
and allows specification of inputs and configuration tuning search space if new
tunings are found that can be added to the auto-tune list.
Other guidelines
================
* Performance-critical HIP provides an environment variable, ``export HIP_FORCE_DEV_KERNARG=1``,
that can put HIP kernel arguments directly to
device memory to reduce the latency of accessing kernel arguments. It
can reduce 2 to 3 μs for some kernels. Setting this variable for the FA
decode containing ``splitK`` and reduced kernels can reduce the total time
by around 6 μs in the benchmark test.
* Set the clock to deterministic. Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock speed to
1900MHz instead of the default 2100MHz. This can reduce the chance of clock speed decrease due to chip high temperature
by setting a lower cap. You can restore this setting to its default value with ``rocm-smi -r``.
* Set Non-Uniform Memory Access (NUMA) auto-balance. Run the command ``cat /proc/sys/kernel/numa_balancing`` to check the
current setting. An output of ``0`` indicates this setting is available. If output is ``1``, run the command
``sudo sh -c \\'echo 0 > /proc/sys/kernel/numa_balancing`` to set this.
For these settings, the ``env_check.sh`` script automates the setting, resetting, and checking of the such
environments. Find the script at `<https://github.com/ROCm/triton/blob/rocm_env/scripts/amd/env_check.sh>`__.
.. _fine-tuning-llms-triton-tunableop:
TunableOp
---------
`TunableOp <https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/tunable/README.md>`_
is a feature used to define and optimize kernels that can have tunable parameters. This is useful in
optimizing the performance of custom kernels by exploring different parameter configurations to find the most efficient
setup. See more about PyTorch TunableOp :ref:`Model acceleration libraries <fine-tuning-llms-pytorch-tunableop>`.
You can easily manipulate the behavior TunableOp through environment variables, though you could use the C++ interface
``at::cuda::tunable::getTuningContext()``. A Python interface to the ``TuningContext`` does not yet exist.
The default value is ``0``, which means only 1 iteration is attempted. Remember: theres an overhead to tuning. To try
and minimize the overhead, only a limited number of iterations of a given operation are attempted. If you set this to
``10``, each solution for a given operation can run as many iterations as possible within 10ms. There is a hard-coded
upper limit of 100 iterations attempted per solution. This is a tuning parameter; if you want the tunings to be chosen
based on an average over multiple iterations, increase the allowed tuning duration.
* :ref:`mi300x-compute-kernel-occ`

View File

@@ -6,7 +6,7 @@
# Optimizing with Composable Kernel
The AMD ROCm&trade; Composable Kernel (CK) library provides a programming model for writing performance-critical kernels for machine learning workloads. It generates a general-purpose kernel during the compilation phase through a C++ template, enabling developers to achieve operation fusions on different data precisions.
The AMD ROCm Composable Kernel (CK) library provides a programming model for writing performance-critical kernels for machine learning workloads. It generates a general-purpose kernel during the compilation phase through a C++ template, enabling developers to achieve operation fusions on different data precisions.
This article gives a high-level overview of CK General Matrix Multiplication (GEMM) kernel based on the design example of `03_gemm_bias_relu`. It also outlines the steps to construct the kernel and run it. Moreover, the article provides a detailed implementation of running SmoothQuant quantized INT8 models on AMD Instinct MI300X accelerators using CK.

View File

@@ -6,212 +6,24 @@
Profiling and debugging
***********************
This section discusses profiling and debugging tools and some of their common usage patterns with ROCm applications.
This section provides an index for further documentation on profiling and
debugging tools and their common usage patterns.
PyTorch Profiler
================
See :ref:`AMD Instinct MI300X™ workload optimization <mi300x-profiling-start>`
for a conceptual summary of the workload profiling workflow for ROCm applications
on AMD hardware -- including fine-tuning LLMs.
`PyTorch Profiler <https://pytorch.org/docs/stable/profiler.html>`_ can be invoked inside Python scripts, letting you
collect CPU and GPU performance metrics while the script is running. See the `PyTorch Profiler tutorial
<https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html>`_ for more information.
There, you'll find information on higher-level and kernel-level profiling tools
as well as other profiling and debugging suggestions.
You can then visualize and view these metrics using an open-source profile visualization tool like
`Perfetto UI <https://ui.perfetto.dev>`_.
* :ref:`PyTorch Profiler <mi300x-pytorch-profiler>`
#. Use the following snippet to invoke PyTorch Profiler in your code.
* :ref:`ROCm profiling tools <mi300x-profiling-tools>`
.. code-block:: python
* :ref:`ROCProfiler <mi300x-rocprof>`
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity
model = models.resnet18().cuda()
inputs = torch.randn(2000, 3, 224, 224).cuda()
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
with record_function("model_inference"):
model(inputs)
prof.export_chrome_trace("resnet18_profile.json")
* :ref:`Omniperf <mi300x-omniperf>`
#. Profile results in ``resnet18_profile.json`` can be viewed by the Perfetto visualization tool. Go to
`<https://ui.perfetto.dev>`__ and import the file. In your Perfetto visualization, you'll see that the upper section
shows transactions denoting the CPU activities that launch GPU kernels while the lower section shows the actual GPU
activities where it processes the ``resnet18`` inferences layer by layer.
.. figure:: ../../data/how-to/llm-fine-tuning-optimization/perfetto-trace.svg
Perfetto trace visualization example.
ROCm profiling tools
====================
Heterogenous systems, where programs run on both CPUs and GPUs, introduce additional complexities. Understanding the
critical path and kernel execution is all the more important; so, performance tuning is a necessary component in the
benchmarking process.
With AMD's profiling tools, developers are able to gain important insight into how efficiently their application is
using hardware resources and effectively diagnose potential bottlenecks contributing to poor performance. Developers
working with AMD Instinct accelerators have multiple tools depending on their specific profiling needs; these are:
* :ref:`ROCProfiler <fine-tuning-llms-profiling-rocprof>`
* :ref:`Omniperf <fine-tuning-llms-profiling-omniperf>`
* :ref:`Omnitrace <fine-tuning-llms-profiling-omnitrace>`
.. _fine-tuning-llms-profiling-rocprof:
ROCProfiler
-----------
:doc:`ROCProfiler <rocprofiler:index>` is primarily a low-level API for accessing and extracting GPU hardware performance
metrics, commonly called *performance counters*. These counters quantify the performance of the underlying architecture
showcasing which pieces of the computational pipeline and memory hierarchy are being utilized.
Your ROCm installation contains a script or executable command called ``rocprof`` which provides the ability to list all
available hardware counters for your specific accelerator or GPU, and run applications while collecting counters during
their execution.
This ``rocprof`` utility also depends on the :doc:`ROCTracer and ROC-TX libraries <roctracer:index>`, giving it the
ability to collect timeline traces of the accelerator software stack as well as user-annotated code regions.
.. note::
``rocprof`` is a CLI-only utility so input and output takes the format of ``.txt`` and CSV files. These
formats provide a raw view of the data and puts the onus on the user to parse and analyze. Therefore, ``rocprof``
gives the user full access and control of raw performance profiling data, but requires extra effort to analyze the
collected data.
.. _fine-tuning-llms-profiling-omniperf:
Omniperf
--------
`Omniperf <https://rocm.github.io/omniperf>`_ is a system performance profiler for high-performance computing (HPC) and
machine learning (ML) workloads using Instinct accelerators. Under the hood, Omniperf uses
:ref:`ROCProfiler <fine-tuning-llms-profiling-rocprof>` to collect hardware performance counters. The Omniperf tool performs
system profiling based on all approved hardware counters for Instinct
accelerator architectures. It provides high level performance analysis features including System Speed-of-Light, IP
block Speed-of-Light, Memory Chart Analysis, Roofline Analysis, Baseline Comparisons, and more.
Omniperf takes the guesswork out of profiling by removing the need to provide text input files with lists of counters
to collect and analyze raw CSV output files as is the case with ROC-profiler. Instead, Omniperf automates the collection
of all available hardware counters in one command and provides a graphical interface to help users understand and
analyze bottlenecks and stressors for their computational workloads on AMD Instinct accelerators.
.. note::
Omniperf collects hardware counters in multiple passes, and will therefore re-run the application during each pass
to collect different sets of metrics.
.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omniperf-analysis.png
Omniperf memory chat analysis panel.
In brief, Omniperf provides details about hardware activity for a particular GPU kernel. It also supports both
a web-based GUI or command-line analyzer, depending on your preference.
.. _fine-tuning-llms-profiling-omnitrace:
Omnitrace
---------
`Omnitrace <https://rocm.github.io/omnitrace>`_ is a comprehensive profiling and tracing tool for parallel applications,
including HPC and ML packages, written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU and
GPU. It is capable of gathering the performance information of functions through any combination of binary
instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks.
Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level
summary profiles with ``mean/min/max/stddev`` statistics. Beyond runtime
information, Omnitrace supports the collection of system-level metrics such as CPU frequency, GPU temperature, and GPU
utilization. Process and thread level metrics such as memory usage, page faults, context switches, and numerous other
hardware counters are also included.
.. tip::
When analyzing the performance of an application, it is best not to assume you know where the performance
bottlenecks are and why they are happening. Omnitrace is the ideal tool for characterizing where optimization would
have the greatest impact on the end-to-end execution of the application and to discover what else is happening on the
system during a performance bottleneck.
.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omnitrace-timeline.png
Omnitrace timeline trace example.
For details usage and examples of using these tools, refer to the
`Introduction to profiling tools for AMD hardware <https://rocm.blogs.amd.com/software-tools-optimization/profilers/README.html>`_
developer blog.
Debugging with ROCm Debug Agent
===============================
ROCm Debug Agent (:doc:`ROCdebug-agent <rocr_debug_agent:index>`) is a library that can be loaded by the ROCm platform
runtime (:doc:`ROCr <rocr-runtime:index>`) to provide the following functionalities for all AMD accelerators and GPUs
supported by the ROCm Debugger API (:doc:`ROCdbgapi <rocdbgapi:index>`).
* Print the state of all AMD accelerator or GPU wavefronts that caused a queue error; for example, causing a memory
violation, executing an ``s_trap2``, or executing an illegal instruction.
* Print the state of all AMD accelerator or GPU wavefronts by sending a ``SIGQUIT`` signal to the process in question;
for example, by pressing ``Ctrl + \`` while the process is executing.
Debugging memory access faults
------------------------------
Identifying a faulting kernel is often enough to triage a memory access fault. To that end, the
`ROCm Debug Agent <https://github.com/ROCm/rocr_debug_agent/>`_ can trap a memory access fault and provide a dump of all
active wavefronts that caused the error as well as the name of the kernel. The
`AMD ROCm Debug Agent Library README <https://github.com/ROCm/rocr_debug_agent/blob/master/README.md>`_ provides full
instructions, but in brief:
* Compiling with ``-ggdb -O0`` is recommended but not required.
* ``HSA_TOOLS_LIB=/opt/rocm/lib/librocm-debug-agent.so.2 HSA_ENABLE_DEBUG=1 ./my_program``
When the debug agent traps the fault, it will produce an extremely
verbose output of all wavefront registers and memory content.
Importantly, it also prints something like:
.. code-block:: shell
Disassembly for function vector_add_assert_trap(int*, int*, int*):
code object:
file:////rocm-debug-agent/build/test/rocm-debug-agent-test#offset=14309&size=31336
loaded at: [0x7fd4f100c000-0x7fd4f100e070]
The kernel name and the code object file should be listed. In the
example above, the kernel name is ``vector_add_assert_trap``, but this might
also look like:
.. code-block:: shell
Disassembly for function memory:///path/to/codeobject#offset=1234&size=567:
In this case, it is an in-memory kernel that was generated at runtime.
Using the following environment variable, the debug agent will save all code objects to the current directory (use
``--save-code-objects=[DIR]`` to place them in another location). The code objects will be renamed from the URI format
with special characters replaced by ``_``.
.. code-block:: shell
ROCM_DEBUG_AGENT_OPTIONS="--all --save-code-objects"
Use the ``llvm-objdump`` command to disassemble the indicated in-memory
code object that has now been saved to disk. The name of the kernel is
often found inside the disassembled code object.
.. code-block:: shell
llvm-objdump --disassemble-all path/to/code-object.co
Consider turning off memory caching strategies both within the ROCm
stack and PyTorch where possible. This will give the debug agent the
best chance at finding the memory fault where it originates. Otherwise,
it could be masked by writing past the end of a cached block within a
larger allocation.
.. code-block:: shell
PYTORCH_NO_HIP_MEMORY_CACHING=1
HSA_DISABLE_FRAGMENT_ALLOCATOR=1
* :ref:`Omnitrace <mi300x-omnitrace>`
* :ref:`ROCr Debug Agent <mi300x-rocr-debug-agent>`

View File

@@ -45,7 +45,7 @@ Setting up the base implementation environment
.. code-block:: shell
rocm-smi -showproductname
rocm-smi --showproductname
Your output should look like this:

View File

@@ -21,3 +21,6 @@ In this guide, you'll learn about:
- :doc:`Running models from Hugging Face <hugging-face-models>`
- :doc:`Deploying your model <deploy-your-model>`
To learn about ROCm for HPC applications and scientific computing, see
:doc:`../rocm-for-hpc/index`.

View File

@@ -137,4 +137,4 @@ The following developer blogs showcase examples of how to fine-tune a model on a
* Recipes for fine-tuning Llama2 and 3 with ``llama-recipes``
* `meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover
single/multi-node GPUs <https://github.com/meta-llama/llama-recipes/tree/main/recipes/finetuning>`_
single/multi-node GPUs <https://github.com/meta-llama/llama-recipes/tree/main/recipes/quickstart/finetuning>`_

View File

@@ -0,0 +1,231 @@
.. meta::
:description: How to use ROCm for HPC
:keywords: ROCm, AI, high performance computing, HPC
******************
Using ROCm for HPC
******************
The ROCm open-source software stack is optimized to extract high-performance
computing (HPC) workload performance from AMD Instinct™ accelerators
while maintaining compatibility with industry software frameworks.
ROCm enhances support and access for developers by providing streamlined and
improved tools that significantly increase productivity. Being open-source, ROCm
fosters innovation, differentiation, and collaboration within the developer
community, making it a powerful and accessible solution for leveraging the full
potential of AMD accelerators' capabilities in diverse computational
applications.
* For more information, see :doc:`What is ROCm? <../../what-is-rocm>`.
* For guidance on installing ROCm, see :doc:`rocm-install-on-linux:index`. See
the :doc:`../../compatibility/compatibility-matrix` for details on hardware
and operating system support.
Some of the most popular HPC frameworks are part of the ROCm platform, including
those to help parallelize operations across multiple accelerators and servers,
handle memory hierarchies, and solve linear systems.
.. image:: ../../data/how-to/rocm-for-hpc/hpc-stack-2024_6_20.png
:align: center
:alt: Software and hardware ecosystem surrounding ROCm and AMD Instinct for HPC
The following catalog of GPU-accelerated solutions includes a vast set of
platform-compatible HPC applications, including those for astrophysics, climate
and weather, computational chemistry, computational fluid dynamics, earth
science, genomics, geophysics, molecular dynamics, and physics computing.
Refer to the resources in the following table for instructions on building,
running, and deploying these applications on ROCm-capable systems with AMD
Instinct accelerators. Each build container provides parameters to specify
different source code branches, release versions of ROCm, OpenMPI, UCX, and
Ubuntu versions.
.. _hpc-apps:
..
Reduce font size of HPC app descriptions slightly.
.. raw:: html
<style>
#hpc-apps-table tr td:last-child {
font-size: 0.9rem;
}
</style>
.. container::
:name: hpc-apps-table
.. list-table::
:header-rows: 1
:stub-columns: 1
:widths: 2 2 5
* - Application domain
- HPC application
- Description
* - Physics
- `Chroma <https://github.com/amd/InfinityHub-CI/tree/main/chroma/>`_
- The Chroma package supports data-parallel programming constructs for lattice
field theory and in particular lattice QCD. It uses the SciDAC QDP++ data-parallel
programming (in C++) that presents a single high-level code image to the user,
but can generate highly optimized code for many architectural systems including
single node workstations, multi and many-core nodes, clusters of nodes via
QMP, and classic vector computers.
* -
- `Grid <https://github.com/amd/InfinityHub-CI/tree/main/grid/>`_
- Grid is a library for lattice QCD calculations that employs a high-level data parallel
approach while using a number of techniques to target multiple types of parallelism.
The library currently supports MPI, OpenMP and short vector parallelism. The SIMD
instructions sets covered include SSE, AVX, AVX2, FMA4, IMCI and AVX512. Recent
releases expanded this support to include GPU offloading.
* -
- `MILC <https://github.com/amd/InfinityHub-CI/tree/main/milc/>`_
- The MILC Code is a set of research codes developed by MIMD Lattice Computation
(MILC) collaboration for doing simulations of four dimensional SU(3) lattice gauge
theory on MIMD parallel machines scaling from single-processor workstations
to HPC systems. The MILC Code is publicly available for research purposes.
Publications of work done using this code or derivatives of this code should
acknowledge this use.
* -
- `PIConGPU <https://github.com/amd/InfinityHub-CI/tree/main/picongpu>`_
- PIConGPU (Particle-in-cell on Graphics Processing Units) is an Open Source
simulations framework for plasma and laser-plasma physics used to develop
advanced particle accelerators for radiation therapy of cancer, high energy
physics and photon science.
* - Astrophysics
- `Cholla <https://github.com/amd/InfinityHub-CI/tree/main/cholla/>`_
- An astrophysical simulation code developed for the extreme environments
encountered in astrophysical systems.
* - Geophysics
- `SPECFEM3D Cartesian <https://github.com/amd/InfinityHub-CI/tree/main/specfem3d>`_
- SPECFEM3D Cartesian simulates acoustic (fluid), elastic (solid), coupled
acoustic/elastic, poroelastic or seismic wave propagation in any type of
conforming mesh of hexahedra (structured or not.) It can, for instance,
model seismic waves propagating in sedimentary basins or any other
regional geological model following earthquakes. It can also be used
for non-destructive testing or for ocean acoustics.
* - Molecular dynamics
- `GROMACS with HIP (AMD implementation) <https://github.com/amd/InfinityHub-CI/tree/main/gromacs>`_
- GROMACS is a versatile package to perform molecular dynamics, i.e.
simulate the Newtonian equations of motion for systems with hundreds
to millions of particles. This AMD container is based on a released
version of GROMACS modified by AMD. This container only supports up
to a 8 GPU configuration
* -
- `LAMMPS <https://github.com/amd/InfinityHub-CI/tree/main/lammps>`_
- LAMMPS is a classical molecular dynamics code with a focus on materials
modeling. It's an acronym for Large-scale Atomic/Molecular Massively
Parallel Simulator.
* - Computational fluid dynamics
- `NEKO <https://github.com/amd/InfinityHub-CI/tree/main/neko>`_
- Neko is a portable framework for high-order spectral element flow simulations.
Written in modern Fortran, Neko adopts an object-oriented approach, allowing
multi-tier abstractions of the solver stack and facilitating various hardware
backends ranging from general-purpose processors, CUDA and HIP enabled
accelerators to SX-Aurora vector processors.
* -
- `nekRS <https://github.com/amd/InfinityHub-CI/tree/main/nekrs>`_
- nekRS is an open-source Navier Stokes solver based on the spectral element
method targeting classical processors and accelerators like GPUs.
* - Computational chemistry
- `QUDA <https://github.com/amd/InfinityHub-CI/tree/main/quda>`_
- Library designed for efficient lattice QCD computations on
accelerators. It includes optimized Dirac operators and a variety of
fermion solvers and conjugate gradient (CG) implementations, enhancing
performance and accuracy in lattice QCD simulations.
* - Electronic structure
- `CP2K <https://github.com/amd/InfinityHub-CI/tree/main/cp2k>`_
- CP2K is a quantum chemistry and solid state physics software package that can
perform atomistic simulations of solid state, liquid, molecular, periodic, material,
crystal, and biological systems. This AMD container, based on a released version
of CP2K, is an AMD beta version with ongoing optimizations.
* - Quantum Monte Carlo Simulation
- `QMCPACK <https://github.com/amd/InfinityHub-CI/tree/main/qmcpack>`_
- QMCPACK is an open-source production-level many-body ab initio Quantum
Monte Carlo code for computing the electronic structure of atoms, molecules, 2D
nanomaterials and solids. The solid-state capabilities include metallic systems
as well as insulators. QMCPACK is expected to run well on workstations through
to the latest generation supercomputers. Besides high performance, particular
emphasis is placed on code quality and reproducibility.
* - Climate and weather
- `MPAS <https://github.com/amd/InfinityHub-CI/tree/main/mpas>`_
- The Model for Prediction Across Scales (MPAS) is a collaborative project for
developing atmosphere, ocean, and other earth-system simulation components
for use in climate, regional climate, and weather studies.
* - Benchmark
- `rocHPL <https://github.com/amd/InfinityHub-CI/tree/main/rochpl>`_
- HPL, or High-Performance Linpack, is a benchmark which solves a uniformly
random system of linear equations and reports floating-point execution rate.
This documentation supports the implementation of the HPL benchmark on
top of AMD's ROCm platform.
* -
- `rocHPL-MxP <https://github.com/amd/InfinityHub-CI/tree/main/hpl-mxp>`_
- Benchmark that highlights the convergence of HPC and AI workloads by
solving a system of linear equations using novel, mixed-precision
algorithms.
* -
- `HPCG <https://github.com/amd/InfinityHub-CI/tree/main/hpcg>`_
- HPCG, or the High Performance Conjugate Gradient Benchmark complements
the High Performance LINPACK (HPL) benchmark. The computational and data
access patterns of HPCG are designed to closely match a broad set of important
applications not represented by HPL, and to incentivize computer system
designers to invest in capabilities that will benefit the collective performance
of these applications.
* - Tools and libraries
- `ROCm with GPU-aware MPI container <https://github.com/amd/InfinityHub-CI/tree/main/base-gpu-mpi-rocm-docker>`_
- Base container for GPU-aware MPI with ROCm for HPC applications. This
project provides a boilerplate for building and running a Docker
container with ROCm supporting GPU-aware MPI implementations using
OpenMPI or UCX.
* -
- `Kokkos <https://github.com/amd/InfinityHub-CI/tree/main/kokkos>`_
- Kokkos is a programming model in C++ for writing performance portable
applications for use across HPC platforms. It provides abstractions for both
parallel execution of code and data management. Kokkos is designed to target
complex node architectures with N-level memory hierarchies and multiple types
of execution resources.
* -
- `PyFR <https://github.com/amd/InfinityHub-CI/tree/main/pyfr>`_
- PyFR is an open-source Python based framework for solving advection-diffusion
type problems on streaming architectures using the Flux Reconstruction approach of
Huynh. The framework is designed to solve a range of governing systems on mixed
unstructured grids containing various element types. It is also designed to target a
range of hardware platforms via use of an in-built domain specific language derived
from the Mako templating engine.
* -
- `RAJA <https://github.com/amd/InfinityHub-CI/tree/main/raja>`_
- RAJA is a library of C++ software abstractions, primarily developed at Lawrence
Livermore National Laboratory (LLNL), that enables architecture and programming
model portability for HPC applications.
* -
- `Trilinos <https://github.com/amd/InfinityHub-CI/tree/main/trilinos>`_
- The Trilinos Project is an effort to develop algorithms and enabling technologies
within an object-oriented software framework for the solution of large-scale,
complex multi-physics engineering and scientific problems.
To learn about ROCm for AI applications, see :doc:`../rocm-for-ai/index`.

View File

@@ -0,0 +1,42 @@
.. meta::
:description: Setting the number of CUs
:keywords: CU, CUs, number of CUs, compute units
.. _settings-cus-reference:
*************************************************************
Setting the number of compute units
*************************************************************
The GPU driver provides two environment variables to set the number of CUs used:
- ``HSA_CU_MASK``
- ``ROC_GLOBAL_CU_MASK``
The ``ROC_GLOBAL_CU_MASK`` variable sets the CU mask on queues created by HIP or OpenCL runtimes. The ``HSA_CU_MASK`` variable sets the mask on a lower level of queue creation in the driver. It also sets the mask on the queues being profiled.
.. tip::
When using GPUs to accelerate compute workloads, it sometimes becomes necessary to configure the hardware's usage of compute units (CU). This is a more advanced option, so please read this page before experimentation.
The environment variables have the following syntax:
::
ID = [0-9][0-9]* ex. base 10 numbers
ID_list = (ID | ID-ID)[, (ID | ID-ID)]* ex. 0,2-4,7
GPU_list = ID_list ex. 0,2-4,7
CU_list = 0x[0-F]* | ID_list ex. 0x337F OR 0,2-4,7
CU_Set = GPU_list : CU_list ex. 0,2-4,7:0-15,32-47 OR 0,2-4,7:0x337F
HSA_CU_MASK = CU_Set [; CU_Set]* ex. 0,2-4,7:0-15,32-47; 3-9:0x337F
The GPU indices are taken post ``ROCR_VISIBLE_DEVICES`` reordering. The listed or masked CUs are enabled for listed GPUs, and the others are disabled. Unlisted GPUs are not be affected, and their CUs are enabled.
The variable parsing stops when a syntax error occurs. The erroneous set and the following are ignored. Repeating GPU or CU IDs results in a syntax error. Specifying a mask with no usable CUs (CU_list is 0x0) results in a syntax error. To exclude GPU devices, use ``ROCR_VISIBLE_DEVICES``.
.. note::
These environment variables only affect ROCm software, not graphics applications.
Not all CU configurations are valid on all devices. For example, on devices where two CUs can be combined into a WGP (for kernels running in WGP mode), its not valid to disable only a single CU in a WGP. For more information about what to expect when disabling CUs, see the `Exploring AMD GPU Scheduling Details by Experimenting With “Worst Practices” <https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ paper.

View File

@@ -5,7 +5,7 @@
ROCm">
</head>
# System debugging guide
# System debugging
## ROCm language and system-level debug, flags, and environment variables
@@ -65,4 +65,4 @@ Debug messages when developing/debugging base ROCm driver. You could enable the
## PCIe-debug
For information on how to debug and profile HIP applications, see {doc}`hip:how_to_guides/debugging`
For information on how to debug and profile HIP applications, see {doc}`hip:how-to/debugging`

View File

@@ -0,0 +1,109 @@
.. meta::
:description: AMD hardware optimization for specific workloads
:keywords: high-performance computing, HPC, Instinct accelerators, Radeon,
tuning, tuning guide, AMD, ROCm
*******************
System optimization
*******************
This guide outlines system setup and tuning suggestions for AMD hardware to
optimize performance for specific types of workloads or use-cases.
High-performance computing workloads
====================================
High-performance computing (HPC) workloads have unique requirements. The default
hardware and BIOS configurations for OEM platforms may not provide optimal
performance for HPC workloads. To enable optimal HPC settings on a per-platform
and per-workload level, this chapter describes:
* BIOS settings that can impact performance
* Hardware configuration best practices
* Supported versions of operating systems
* Workload-specific recommendations for optimal BIOS and operating system
settings
There is also a discussion on the AMD Instinct™ software development
environment, including information on how to install and run the DGEMM, STREAM,
HPCG, and HPL benchmarks. This guide provides a good starting point but is
not tested exhaustively across all compilers.
Knowledge prerequisites to better understand this document and to perform tuning
for HPC applications include:
* Experience in configuring servers
* Administrative access to the server's Management Interface (BMC)
* Administrative access to the operating system
* Familiarity with the OEM server's BMC (strongly recommended)
* Familiarity with the OS specific tools for configuration, monitoring, and
troubleshooting (strongly recommended)
This document provides guidance on tuning systems with various AMD Instinct
accelerators for HPC workloads. The following sections don't comprise an
all-inclusive guide, and some items referred to may have similar, but different,
names in various OEM systems (for example, OEM-specific BIOS settings). This
following sections also provide suggestions on items that should be the initial
focus of additional, application-specific tuning.
While this guide is a good starting point, developers are encouraged to perform
their own performance testing for additional tuning.
.. list-table::
:header-rows: 1
:stub-columns: 1
* - System optimization guide
- Architecture reference
- White papers
* - :doc:`AMD Instinct MI300X <mi300x>`
- `AMD Instinct MI300 instruction set architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf>`_
- `CDNA 3 architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf>`_
* - :doc:`AMD Instinct MI200 <mi200>`
- `AMD Instinct MI200 instruction set architecture <https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf>`_
- `CDNA 2 architecture <https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf>`_
* - :doc:`AMD Instinct MI100 <mi100>`
- `AMD Instinct MI100 instruction set architecture <https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf>`_
- `CDNA architecture <https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf>`_
Workstation workloads
=====================
Workstation workloads, much like those for HPC, have a unique set of
requirements: a blend of both graphics and compute, certification, stability and
others.
The document covers specific software requirements and processes needed to use
these GPUs for Single Root I/O Virtualization (SR-IOV) and machine learning
tasks.
The main purpose of this document is to help users utilize the RDNA™ 2 GPUs to
their full potential.
.. list-table::
:header-rows: 1
:stub-columns: 1
* - System optimization guide
- Architecture reference
- White papers
* - :doc:`AMD Radeon PRO W6000 and V620 <w6000-v620>`
- `AMD RDNA 2 instruction set architecture <https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf>`_
- `RDNA 2 architecture <https://www.amd.com/system/files/documents/rdna2-explained-radeon-pro-W6000.pdf>`_

View File

@@ -1,11 +1,11 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="MI100 high-performance computing and tuning guide">
<meta name="keywords" content="MI100, high-performance computing, HPC, tuning, BIOS
<meta name="keywords" content="MI100, high-performance computing, HPC, BIOS
settings, NBIO, AMD, ROCm">
</head>
# MI100 high-performance computing and tuning guide
# AMD Instinct MI100 system optimization
## System settings

View File

@@ -1,11 +1,11 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="MI200 high-performance computing and tuning guide">
<meta name="keywords" content="MI200, high-performance computing, HPC, tuning, BIOS
<meta name="keywords" content="MI200, high-performance computing, HPC, BIOS
settings, NBIO, AMD, ROCm">
</head>
# MI200 high-performance computing and tuning guide
# AMD Instinct MI200 system optimization
## System settings

View File

@@ -0,0 +1,804 @@
.. meta::
:description: AMD Instinct MI300X system settings
:keywords: AMD, Instinct, MI300X, HPC, tuning, BIOS settings, NBIO, ROCm,
environment variable, performance, accelerator, GPU, EPYC, GRUB,
operating system
***************************************
AMD Instinct MI300X system optimization
***************************************
This document covers essential system settings and management practices required
to configure your system effectively. Ensuring that your system operates
correctly is the first step before delving into advanced performance tuning.
The main topics of discussion in this document are:
* :ref:`System settings <mi300x-system-settings>`
* :ref:`System BIOS settings <mi300x-bios-settings>`
* :ref:`GRUB settings <mi300x-grub-settings>`
* :ref:`Operating system settings <mi300x-os-settings>`
* :ref:`System management <mi300x-system-management>`
.. _mi300x-system-settings:
System settings
===============
This guide discusses system settings that are required to configure your system
for AMD Instinct™ MI300X accelerators. It is important to ensure a system is
functioning correctly before trying to improve its overall performance. In this
section, the settings discussed mostly ensure proper functionality of your
Instinct-based system. Some settings discussed are known to improve performance
for most applications running on a MI300X system. See
:doc:`/how-to/tuning-guides/mi300x/workload` for how to improve performance for
specific applications or workloads.
.. _mi300x-bios-settings:
System BIOS settings
--------------------
AMD EPYC 9004-based systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^
For maximum MI300X GPU performance on systems with AMD EPYC™ 9004-series
processors and AMI System BIOS, the following configuration
of system BIOS settings has been validated. These settings must be used for the
qualification process and should be set as default values in the system BIOS.
Analogous settings for other non-AMI System BIOS providers could be set
similarly. For systems with Intel processors, some settings may not apply or be
available as listed in the following table.
Each row in the table details a setting but the specific location within the
BIOS setup menus may be different, or the option may not be present.
.. list-table::
:header-rows: 1
* - BIOS setting location
- Parameter
- Value
- Comments
* - Advanced / PCI subsystem settings
- Above 4G decoding
- Enabled
- GPU large BAR support.
* - Advanced / PCI subsystem settings
- SR-IOV support
- Enabled
- Enable single root IO virtualization.
* - AMD CBS / GPU common options
- Global C-state control
- Auto
- Global C-states -- do not disable this menu item).
* - AMD CBS / GPU common options
- CCD/Core/Thread enablement
- Accept
- May be necessary to enable the SMT control menu.
* - AMD CBS / GPU common options / performance
- SMT control
- Disable
- Set to Auto if the primary application is not compute-bound.
* - AMD CBS / DF common options / memory addressing
- NUMA nodes per socket
- Auto
- Auto = NPS1. At this time, the other options for NUMA nodes per socket
should not be used.
* - AMD CBS / DF common options / memory addressing
- Memory interleaving
- Auto
- Depends on NUMA nodes (NPS) setting.
* - AMD CBS / DF common options / link
- 4-link xGMI max speed
- 32 Gbps
- Auto results in the speed being set to the lower of the max speed the
motherboard is designed to support and the max speed of the CPU in use.
* - AMD CBS / NBIO common options
- IOMMU
- Enabled
-
* - AMD CBS / NBIO common options
- PCIe ten bit tag support
- Auto
-
* - AMD CBS / NBIO common options / SMU common options
- Determinism control
- Manual
-
* - AMD CBS / NBIO common options / SMU common options
- Determinism slider
- Power
-
* - AMD CBS / NBIO common options / SMU common options
- cTDP control
- Manual
- Set cTDP to the maximum supported by the installed CPU.
* - AMD CBS / NBIO common options / SMU common options
- cTDP
- 400
- Value in watts.
* - AMD CBS / NBIO common options / SMU common options
- Package power limit control
- Manual
- Set package power limit to the maximum supported by the installed CPU.
* - AMD CBS / NBIO common options / SMU common options
- Package power limit
- 400
- Value in watts.
* - AMD CBS / NBIO common options / SMU common options
- xGMI link width control
- Manual
- Set package power limit to the maximum supported by the installed CPU.
* - AMD CBS / NBIO common options / SMU common options
- xGMI force width control
- Force
-
* - AMD CBS / NBIO common options / SMU common options
- xGMI force link width
- 2
- * 0: Force xGMI link width to x2
* 1: Force xGMI link width to x8
* 2: Force xGMI link width to x16
* - AMD CBS / NBIO common options / SMU common options
- xGMI max speed
- Auto
- Auto results in the speed being set to the lower of the max speed the
motherboard is designed to support and the max speed of the CPU in use.
* - AMD CBS / NBIO common options / SMU common options
- APBDIS
- 1
- Disable DF (data fabric) P-states
* - AMD CBS / NBIO common options / SMU common options
- DF C-states
- Auto
-
* - AMD CBS / NBIO common options / SMU common options
- Fixed SOC P-state
- P0
-
* - AMD CBS / security
- TSME
- Disabled
- Memory encryption
.. _mi300x-grub-settings:
GRUB settings
-------------
In any modern Linux distribution, the ``/etc/default/grub`` file is used to
configure GRUB. In this file, the string assigned to ``GRUB_CMDLINE_LINUX`` is
the command line parameters that Linux uses during boot.
Appending strings via Linux command line
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is recommended to append the following strings in ``GRUB_CMDLINE_LINUX``.
``pci=realloc=off``
With this setting Linux is able to unambiguously detect all GPUs of the
MI300X-based system because this setting disables the automatic reallocation
of PCI resources. It's used when Single Root I/O Virtualization (SR-IOV) Base
Address Registers (BARs) have not been allocated by the BIOS. This can help
avoid potential issues with certain hardware configurations.
``iommu=pt``
The ``iommu=pt`` setting enables IOMMU pass-through mode. When in pass-through
mode, the adapter does not need to use DMA translation to the memory, which can
improve performance.
IOMMU is a system specific IO mapping mechanism and can be used for DMA mapping
and isolation. This can be beneficial for virtualization and device assignment
to virtual machines. It is recommended to enable IOMMU support.
For a system that has AMD host CPUs add this to ``GRUB_CMDLINE_LINUX``:
.. code-block:: text
amd_iommu=on iommu=pt
Otherwise, if the system has Intel host CPUs add this instead to
``GRUB_CMDLINE_LINUX``:
.. code-block:: text
intel_iommu=on iommu=pt
Update GRUB
-----------
Update GRUB to use the modified configuration:
.. code-block:: shell
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
On some Debian systems, the ``grub2-mkconfig`` command may not be available. Instead,
check for the presence of ``grub-mkconfig``. Additionally, verify that you have the
correct version by using the following command:
.. code-block:: shell
grub-mkconfig -version
.. _mi300x-os-settings:
Operating system settings
-------------------------
CPU core states (C-states)
^^^^^^^^^^^^^^^^^^^^^^^^^^
There are several core states (C-states) that an AMD EPYC CPU can idle within:
* **C0**: active. This is the active state while running an application.
* **C1**: idle. This state consumes less power compared to C0, but can quickly
return to the active state (C0) with minimal latency.
* **C2**: idle and power-gated. This is a deeper sleep state and will have greater
latency when moving back to the active (C0) state as compared to when the CPU
is coming out of C1.
Disabling C2 is important for running with a high performance, low-latency
network. To disable the C2 state, install the ``cpupower`` tool using your Linux
distribution's package manager. ``cpupower`` is not a base package in most Linux
distributions. The specific package to be installed varies per Linux
distribution.
.. tab-set::
.. tab-item:: Ubuntu
:sync: ubuntu
.. code-block:: shell
sudo apt install linux-tools-common
.. tab-item:: RHEL
:sync: rhel
.. code-block:: shell
sudo yum install cpupowerutils
.. tab-item:: SLES
:sync: sles
.. code-block:: shell
sudo zypper install cpupower
Now, to disable power-gating on all cores run the following on Linux
systems, run the following command.
.. code-block:: shell
cpupower idle-set -d 2
`/proc` and `/sys` file system settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. _mi300x-disable-numa:
Disable NUMA auto-balancing
'''''''''''''''''''''''''''
The NUMA balancing feature allows the OS to scan memory and attempt to migrate
to a DIMM that is logically closer to the cores accessing it. This causes an
overhead because the OS is second-guessing your NUMA allocations but may be
useful if the NUMA locality access is very poor. Applications can therefore, in
general, benefit from disabling NUMA balancing; however, there are workloads where
doing so is detrimental to performance. Test this setting
by toggling the ``numa_balancing`` value and running the application; compare
the performance of one run with this set to ``0`` and another run with this to
``1``.
Run the command ``cat /proc/sys/kernel/numa_balancing`` to check the current
NUMA (Non-Uniform Memory Access) settings. Output ``0`` indicates this
setting is disabled. If no output or output is ``1``, run the command
``sudo sh -c \\'echo 0 > /proc/sys/kernel/numa_balancing`` to disable it.
For these settings, the ``env_check.sh`` script automates setting, resetting,
and checking your environments. Find the script at
`<https://github.com/ROCm/triton/blob/rocm_env/scripts/amd/env_check.sh>`__.
Run the script as follows to set or reset the settings:
``./env_check.sh [set/reset/check]``
.. tip::
Use ``./env_check.sh -h`` for help info.
Automate disabling NUMA auto-balance using Cron
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :ref:`mi300x-disable-numa` section describes the command to disable NUMA
auto-balance. To automate the command with Cron, edit the ``crontab``
configuration file for the root user:
.. code-block:: shell
sudo crontab -e
#. Add the following Cron entry to run the script at a specific interval:
.. code-block:: shell
@reboot sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
#. Save the file and exit the text editor.
#. Optionally, restart the system to apply changes by issuing ``sudo reboot``.
#. Verify your new configuration.
.. code-block::
cat /proc/sys/kernel/numa_balancing
The ``/proc/sys/kernel/numa_balancing`` file controls NUMA balancing in the
Linux kernel. If the value in this file is set to ``0``, the NUMA balancing
is disabled. If the value is set to ``1``, NUMA balancing is enabled.
.. note::
Disabling NUMA balancing should be done cautiously and for
specific reasons, such as performance optimization or addressing
particular issues. Always test the impact of disabling NUMA balancing in
a controlled environment before applying changes to a production system.
.. _mi300x-env-vars:
Environment variables
^^^^^^^^^^^^^^^^^^^^^
HIP provides an environment variable export ``HIP_FORCE_DEV_KERNARG=1`` that
can put arguments of HIP kernels directly to device memory to reduce the
latency of accessing those kernel arguments. It can improve performance by 2 to
3 µs for some kernels.
It is recommended to set the following environment variable:
.. code-block:: shell
export HIP_FORCE_DEV_KERNARG=1
.. note::
This is the default option as of ROCm 6.2.
IOMMU configuration -- systems with 256 CPU threads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For systems that have 256 logical CPU cores or more, setting the input-output
memory management unit (IOMMU) configuration to ``disabled`` can limit the
number of available logical cores to 255. The reason is that the Linux kernel
disables X2APIC in this case and falls back to Advanced Programmable Interrupt
Controller (APIC), which can only enumerate a maximum of 255 (logical) cores.
If SMT is enabled by setting ``CCD/Core/Thread Enablement > SMT Control`` to
``enable``, you can apply the following steps to the system to enable all
(logical) cores of the system:
#. In the server BIOS, set IOMMU to ``Enabled``.
#. When configuring the GRUB boot loader, add the following arguments for the Linux kernel: ``amd_iommu=on iommu=pt``.
#. Update GRUB.
#. Reboot the system.
#. Verify IOMMU passthrough mode by inspecting the kernel log via ``dmesg``:
.. code-block::
dmesg | grep iommu
.. code-block:: shell
[...]
[ 0.000000] Kernel command line: [...] amd_iommu=on iommu=pt
[...]
Once the system is properly configured, ROCm software can be
:doc:`installed <rocm-install-on-linux:index>`.
.. _mi300x-system-management:
System management
=================
To optimize system performance, it's essential to first understand the existing
system configuration parameters and settings. ROCm offers several CLI tools that
can provide system-level information, offering valuable insights for
optimizing user applications.
For a complete guide on how to install, manage, or uninstall ROCm on Linux, refer to
:doc:`rocm-install-on-linux:tutorial/quick-start`. For verifying that the
installation was successful, refer to the
:doc:`rocm-install-on-linux:how-to/native-install/post-install`.
Should verification fail, consult :doc:`/how-to/system-debugging`.
Hardware verification with ROCm
-------------------------------
The ROCm platform provides tools to query the system structure. These include
:ref:`ROCm SMI <mi300x-rocm-smi>` and :ref:`ROCm Bandwidth Test <mi300x-bandwidth-test>`.
.. _mi300x-rocm-smi:
ROCm SMI
^^^^^^^^
To query your GPU hardware, use the ``rocm-smi`` command. ROCm SMI lists
GPUs available to your system -- with their device ID and their respective
firmware (or VBIOS) versions.
The following screenshot shows that all 8 GPUs of MI300X are recognized by ROCm.
Performance of an application could be otherwise suboptimal if, for example, out
of the 8 GPUs only 5 of them are recognized.
.. image:: ../../data/how-to/tuning-guides/rocm-smi-showhw.png
:align: center
:alt: ``rocm-smi --showhw`` output
To see the system structure, the localization of the GPUs in the system, and the
fabric connections between the system components, use the command
``rocm-smi --showtopo``.
.. image:: ../../data/how-to/tuning-guides/rocm-smi-showtopo.png
:align: center
:alt: ``rocm-smi --showtopo`` output
The first block of the output shows the distance between the GPUs similar to
what the ``numactl`` command outputs for the NUMA domains of a system. The
weight is a qualitative measure for the “distance” data must travel to reach one
GPU from another one. While the values do not carry a special, or "physical"
meaning, the higher the value the more hops are needed to reach the destination
from the source GPU. This information has performance implication for a
GPU-based application that moves data among GPUs. You can choose a minimum
distance among GPUs to be used to make the application more performant.
The second block has a matrix named *Hops between two GPUs*, where:
* ``1`` means the two GPUs are directly connected with xGMI,
* ``2`` means both GPUs are linked to the same CPU socket and GPU communications
will go through the CPU, and
* ``3`` means both GPUs are linked to different CPU sockets so communications will
go through both CPU sockets. This number is one for all GPUs in this case
since they are all connected to each other through the Infinity Fabric links.
The third block outputs the link types between the GPUs. This can either be
``XGMI`` for AMD Infinity Fabric links or ``PCIE`` for PCIe Gen5 links.
The fourth block reveals the localization of a GPU with respect to the NUMA
organization of the shared memory of the AMD EPYC processors.
To query the compute capabilities of the GPU devices, use rocminfo command. It
lists specific details about the GPU devices, including but not limited to the
number of compute units, width of the SIMD pipelines, memory information, and
instruction set architecture (ISA). The following is the truncated output of the
command:
.. image:: ../../data/how-to/tuning-guides/rocminfo.png
:align: center
:alt: rocminfo.txt example
For a complete list of architecture (such as CDNA3) and LLVM target names
(such gfx942 for MI300X), refer to the
:doc:`Supported GPUs section of the System requirements for Linux page <rocm-install-on-linux:reference/system-requirements>`.
Deterministic clock
'''''''''''''''''''
Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock
speed up to 1900 MHz instead of the default 2100 MHz. This can reduce
the chance of a PCC event lowering the attainable GPU clocks. This
setting will not be required for new IFWI releases with the production
PRC feature. Restore this setting to its default value with the
``rocm-smi -r`` command.
.. _mi300x-bandwidth-test:
ROCm Bandwidth Test
^^^^^^^^^^^^^^^^^^^
The section Hardware verification with ROCm showed how the command
``rocm-smi --showtopo`` can be used to view the system structure and how the
GPUs are connected. For more details on the link bandwidth,
``rocm-bandwidth-test`` can run benchmarks to show the effective link bandwidth
between the components of the system.
You can install ROCm Bandwidth Test, which can test inter-device bandwidth,
using the following package manager commands:
.. tab-set::
.. tab-item:: Ubuntu
:sync: ubuntu
.. code-block:: shell
sudo apt install rocm-bandwidth-test
.. tab-item:: RHEL
:sync: rhel
.. code-block:: shell
sudo yum install rocm-bandwidth-test
.. tab-item:: SLES
:sync: sles
.. code-block:: shell
sudo zypper install rocm-bandwidth-test
Alternatively, you can download the source code from
`<https://github.com/ROCm/rocm_bandwidth_test>`__ and build from source.
The output will list the available compute devices (CPUs and GPUs), including
their device ID and PCIe ID. The following screenshot is an example of the
beginning part of the output of running ``rocm-bandwidth-test``. It shows the
devices present in the system.
.. image:: ../../data/how-to/tuning-guides/rocm-bandwidth-test.png
:align: center
:alt: rocm-bandwidth-test sample output
The output will also show a matrix that contains a ``1`` if a device can
communicate to another device (CPU and GPU) of the system and it will show the
NUMA distance -- similar to ``rocm-smi``.
Inter-device distance:
.. figure:: ../../data/how-to/tuning-guides/rbt-inter-device-access.png
:align: center
:alt: rocm-bandwidth-test inter-device distance
Inter-device distance
Inter-device NUMA distance:
.. figure:: ../../data/how-to/tuning-guides/rbt-inter-device-numa-distance.png
:align: center
:alt: rocm-bandwidth-test inter-device NUMA distance
Inter-device NUMA distance
The output also contains the measured bandwidth for unidirectional and
bidirectional transfers between the devices (CPU and GPU):
Unidirectional bandwidth:
.. figure:: ../../data/how-to/tuning-guides/rbt-unidirectional-bandwidth.png
:align: center
:alt: rocm-bandwidth-test unidirectional bandwidth
Unidirectional bandwidth
Bidirectional bandwidth
.. figure:: ../../data/how-to/tuning-guides/rbt-bidirectional-bandwidth.png
:align: center
:alt: rocm-bandwidth-test bidirectional bandwidth
Bidirectional bandwidth
Acronyms
========
AMI
American Megatrends International
APBDIS
Algorithmic Performance Boost Disable
ATS
Address Translation Services
BAR
Base Address Register
BIOS
Basic Input/Output System
CBS
Common BIOS Settings
CLI
Command Line Interface
CPU
Central Processing Unit
cTDP
Configurable Thermal Design Power
DDR5
Double Data Rate 5 DRAM
DF
Data Fabric
DIMM
Dual In-line Memory Module
DMA
Direct Memory Access
DPM
Dynamic Power Management
GPU
Graphics Processing Unit
GRUB
Grand Unified Bootloader
HPC
High Performance Computing
IOMMU
Input-Output Memory Management Unit
ISA
Instruction Set Architecture
LCLK
Link Clock Frequency
NBIO
North Bridge Input/Output
NUMA
Non-Uniform Memory Access
PCC
Power Consumption Control
PCI
Peripheral Component Interconnect
PCIe
PCI Express
POR
Power-On Reset
SIMD
Single Instruction, Multiple Data
SMT
Simultaneous Multi-threading
SMI
System Management Interface
SOC
System On Chip
SR-IOV
Single Root I/O Virtualization
TP
Tensor Parallelism
TSME
Transparent Secure Memory Encryption
X2APIC
Extended Advanced Programmable Interrupt Controller
xGMI
Inter-chip Global Memory Interconnect

View File

@@ -1,11 +1,11 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="RDNA2 workstation tuning guide">
<meta name="keywords" content="RDNA2, workstation tuning, BIOS settings, installation, AMD,
<meta name="keywords" content="RDNA2, workstation, BIOS settings, installation, AMD,
ROCm">
</head>
# RDNA2 workstation tuning guide
# AMD RDNA2 system optimization
## System settings

View File

@@ -1,108 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD hardware optimization for specific workloads">
<meta name="keywords" content="high-performance computing, HPC, Instinct accelerators,
Radeon, tuning, tuning guide, AMD, ROCm">
</head>
# System optimization
This guide outlines system setup and tuning suggestions for AMD hardware to optimize performance for specific types of
workloads or use-cases.
## High-performance computing
High-performance computing (HPC) workloads have unique requirements. The default
hardware and BIOS configurations for OEM platforms may not provide optimal
performance for HPC workloads. To enable optimal HPC settings on a per-platform
and per-workload level, this guide calls out:
* BIOS settings that can impact performance
* Hardware configuration best practices
* Supported versions of operating systems
* Workload-specific recommendations for optimal BIOS and operating system
settings
There is also a discussion on the AMD Instinct™ software development
environment, including information on how to install and run the DGEMM, STREAM,
HPCG, and HPL benchmarks. This guidance provides a good starting point but is
not exhaustively tested across all compilers.
Prerequisites to understanding this document and to performing tuning of HPC
applications include:
* Experience in configuring servers
* Administrative access to the server's Management Interface (BMC)
* Administrative access to the operating system
* Familiarity with the OEM server's BMC (strongly recommended)
* Familiarity with the OS specific tools for configuration, monitoring, and
troubleshooting (strongly recommended)
This document provides guidance on tuning systems with various AMD Instinct™
accelerators for HPC workloads. This document is not an all-inclusive guide, and
some items referred to may have similar, but different, names in various OEM
systems (for example, OEM-specific BIOS settings). This document also provides
suggestions on items that should be the initial focus of additional,
application-specific tuning.
This document is based on the AMD EPYC™ 7003-series processor family (former
codename "Milan").
While this guide is a good starting point, developers are encouraged to perform
their own performance testing for additional tuning.
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card}
**[AMD Instinct™ MI200](./tuning-guides/mi200)**
This chapter goes through how to configure your AMD Instinct™ MI200 accelerated
compute nodes to get the best performance out of them.
* [Instruction Set Architecture (ISA)](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
:::
:::{grid-item-card}
**[AMD Instinct™ MI100](./tuning-guides/mi100)**
This chapter briefly reviews hardware aspects of the AMD Instinct™ MI100
accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs.
* [ISA](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
* [White paper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
:::
:::::
## Workstation
Workstation workloads, much like high-performance computing, have a unique set of
requirements, a blend of both graphics and compute, certification, stability and
the list continues.
The document covers specific software requirements and processes needed to use
these GPUs for Single Root I/O Virtualization (SR-IOV) and machine learning
(ML).
The main purpose of this document is to help users utilize the RDNA 2 GPUs to
their full potential.
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card}
**[AMD Radeon™ PRO W6000 and V620](./tuning-guides/w6000-v620)**
This chapter describes the AMD GPUs with RDNA™ 2 architecture, namely AMD Radeon
PRO W6800 and AMD Radeon PRO V620
* [AMD RDNA2 ISA](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/system/files/documents/rdna2-explained-radeon-pro-W6000.pdf)
:::
:::::

View File

@@ -0,0 +1,13 @@
************************
AMD MI300X tuning guides
************************
The tuning guides in this section provide a comprehensive summary of the
necessary steps to properly configure your system for AMD Instinct™ MI300X
accelerators. They include detailed instructions on system settings and
application tuning suggestions to help you fully leverage the capabilities of
these accelerators, thereby achieving optimal performance.
* :doc:`/how-to/tuning-guides/mi300x/system`
* :doc:`/how-to/tuning-guides/mi300x/workload`

View File

@@ -0,0 +1,22 @@
***************************************
AMD Instinct MI300X system optimization
***************************************
The :doc:`/how-to/system-optimization/mi300x` guide discusses system settings that are
required to configure your system for AMD Instinct™ MI300X accelerators.
Some settings discussed are known to improve performance for most applications
running on an MI300X system.
Topics discussed therein include:
* :ref:`System BIOS settings <mi300x-bios-settings>`
* :ref:`GRUB settings <mi300x-grub-settings>`
* :ref:`Operating system settings <mi300x-os-settings>`
* :ref:`System management <mi300x-system-management>`
For a look into improving performance for specific applications or workloads,
see :doc:`/how-to/tuning-guides/mi300x/workload`.

File diff suppressed because it is too large Load Diff

View File

@@ -25,13 +25,11 @@ Our documentation is organized into the following categories:
:class-container: rocm-doc-grid
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-installation.jpg
:img-alt: Install documentation
:padding: 2
* Linux
* {doc}`Quick start guide<rocm-install-on-linux:tutorial/quick-start>`
* {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`
* {doc}`Package manager integration<rocm-install-on-linux:how-to/native-install/package-manager-integration>`
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
@@ -43,11 +41,9 @@ Our documentation is organized into the following categories:
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
:::
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-compatibility.jpg
:img-alt: Compatibility information
:padding: 2
@@ -60,12 +56,10 @@ Our documentation is organized into the following categories:
* {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
* [OpenMP](./about/compatibility/openmp.md)
* [Precision support](./compatibility/precision-support.rst)
* {doc}`ROCm on Radeon GPUs<radeon:index>`
:::
<!-- markdownlint-disable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-reference.jpg
:img-alt: Reference documentation
:padding: 2
@@ -79,35 +73,39 @@ Our documentation is organized into the following categories:
* [HIP runtime](#hip-runtime)
* [Tools](./reference/rocm-tools.md)
* [Development](#development-tools)
* [Performance analysis](#performance-analysis)
* [Performance analysis](#performance-tools)
* [System](#system-tools)
* [Hardware specifications](./reference/gpu-arch-specs.rst)
:::
<!-- markdownlint-enable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-howto.jpg
:img-alt: How-to documentation
:padding: 2
* [Using ROCm for AI](./how-to/rocm-for-ai/index.rst)
* [Using ROCm for HPC](./how-to/rocm-for-hpc/index.rst)
* [Fine-tuning LLMs and inference optimization](./how-to/llm-fine-tuning-optimization/index.rst)
* [System tuning for various architectures](./how-to/tuning-guides.md)
* [MI100](./how-to/tuning-guides/mi100.md)
* [MI200](./how-to/tuning-guides/mi200.md)
* [RDNA2](./how-to/tuning-guides/w6000-v620.md)
* [System optimization](./how-to/system-optimization/index.rst)
* [AMD Instinct MI300X](./how-to/system-optimization/mi300x.rst)
* [AMD Instinct MI200](./how-to/system-optimization/mi200.md)
* [AMD Instinct MI100](./how-to/system-optimization/mi100.md)
* [AMD Instinct RDNA2](./how-to/system-optimization/w6000-v620.md)
* [AMD Instinct MI300X tuning guides](./how-to/tuning-guides/mi300x/index.rst)
* [System tuning](./how-to/tuning-guides/mi300x/system.rst)
* [Workload tuning](./how-to/tuning-guides/mi300x/workload.rst)
* [System debugging](./how-to/system-debugging.md)
* [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
* [Using compiler features](./conceptual/compiler-topics.md)
* [Using AddressSanitizer](./conceptual/using-gpu-sanitizer.md)
* [Compiler disambiguation](./conceptual/compiler-disambiguation.md)
* [OpenMP support in ROCm](./about/compatibility/openmp.md)
* [System level debugging](./how-to/system-debugging.md)
* [Setting the number of CUs](./how-to/setting-cus)
* [GitHub examples](https://github.com/amd/rocm-examples)
:::
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-conceptual.jpg
:img-alt: Conceptual documentation
:padding: 2
@@ -117,7 +115,6 @@ Our documentation is organized into the following categories:
* [MI250](./conceptual/gpu-arch/mi250.md)
* [MI300](./conceptual/gpu-arch/mi300.md)
* [GPU memory](./conceptual/gpu-memory.md)
* [Setting the number of CUs](./conceptual/setting-cus)
* [File structure (Linux FHS)](./conceptual/file-reorg.md)
* [GPU isolation techniques](./conceptual/gpu-isolation.md)
* [Using CMake](./conceptual/cmake-packages.rst)

View File

@@ -233,6 +233,22 @@ The following tables provide an overview of the hardware specifications for AMD
- L0 Instruction Cache (KiB)
- VGPR File (KiB)
- SGPR File (KiB)
*
- Radeon PRO W7900 Dual Slot
- RDNA3
- gfx1100
- 48
- 96
- 32
- 128
- 96
- 6
- 256
- 32
- 16
- 32
- 384
- 20
*
- Radeon PRO W7900
- RDNA3

View File

@@ -22,8 +22,8 @@
* {doc}`HIPIFY <hipify:index>`
* {doc}`ROCdbgapi <rocdbgapi:index>`
* [ROCmCC](./rocmcc.md)
* [ROCm Debug Agent](https://github.com/ROCm/rocr_debug_agent)
* {doc}`ROCm debugger (ROCgdb) <rocgdb:index>`
* {doc}`ROCm Debugger (ROCgdb) <rocgdb:index>`
* {doc}`ROCr Debug Agent <rocr_debug_agent:index>`
:::
(performance-tools)=
@@ -53,7 +53,6 @@
* {doc}`ROCm Data Center Tool <rdc:index>`
* {doc}`ROCm SMI <rocm_smi_lib:index>`
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
* {doc}`TransferBench <transferbench:index>`
:::
::::

Some files were not shown because too many files have changed in this diff Show More