Compare commits

..

386 Commits

Author SHA1 Message Date
Peter Park
e229b10a06 linux post-install: clean up exports 2025-12-16 16:37:20 -05:00
peterjunpark
b2c4ea3c76 update version list link (#5761) 2025-12-11 17:04:01 -05:00
peterjunpark
c92c9d2b57 [docs/7.9.0] Update banner msg (#5704)
update
2025-11-26 14:36:10 -05:00
peterjunpark
1a2e4f1cc0 [docs/7.9.0] Fix rocm-cmake github link due to non-existent tag (#5684) 2025-11-20 13:57:27 -05:00
peterjunpark
e76255db98 [docs/7.9.0] Add xDiT diffusion inference doc (#5676) 2025-11-18 16:56:00 -05:00
Alex Xu
cdbcad6930 add preview announcement 2025-11-13 15:57:09 -05:00
Alex Xu
dc19d266b4 update rocm-docs-core to show preview banner 2025-11-12 13:53:45 -05:00
Alex Xu
8584fa3b25 update rocm-docs-core to 1.29.0
(cherry picked from commit 39de859bd1)
2025-11-10 14:12:28 -05:00
peterjunpark
163d46ca5c [docs/7.9.0] Fix xref in Ubuntu prerequsites and RST heading overline (#5569) 2025-10-24 16:11:28 -04:00
peterjunpark
56f93e72de [docs/7.9.0] Use "generic" rocm-docs-core theme (#5568)
* use "generic" rocm-docs-core theme to tweak header

* restore "nav_secondary_items"
2025-10-24 11:04:12 -04:00
peterjunpark
f004891485 [docs/7.9.0] Note rocprofiler-sdk is Instinct only. Reorg some files to match docs/7.0.x. (#5563)
* move versions.md and compat-matrix to match prod; note rocprofiler-sdk is instinct only

* update href in versions.md
2025-10-23 12:32:20 -04:00
peterjunpark
3c61d4fb05 [docs/7.9.0] Add build from source overview page / Point to therocm-7.9.0 in components list (#5546)
* Update links to components to point the `therock-7.9.0` ref

* Add build from source page

* lint: fix caps and update .wordlist.txt

* add link to "development manuals" list

* add links to TheRock's development guide and fix step 4

* wording and fmt

* fix spacing

* fix fmt

* Fix documentation linting errors

* fix spacing
2025-10-21 12:41:56 -04:00
peterjunpark
39783dcee0 [docs/7.9.0] Fix GPU marketing names in 7.9.0 release.md and compatibility matrix / Add SD3.5 ComfyUI example (#5552)
* Add SD3.5 example to comfyui doc

* Fix Ryzen AI Max (PRO) SKU names

* Add names in multi line format
2025-10-21 11:02:48 -04:00
peterjunpark
c965b7e98e Add custom version history (#5551) 2025-10-20 18:11:38 -04:00
peterjunpark
70acbcb27a Add "preview" to headings (#5545) 2025-10-20 12:37:22 -04:00
Peter Park
b0fc5959da update PLDM bundle version for MI355 2025-10-20 12:26:28 -04:00
Peter Park
70fc9d8fb8 remove extra files 2025-10-20 12:26:15 -04:00
Peter Park
a32210fa7e Add ROCm 7.9.0 documentation
Add release notes

Add install instructions

Add PyTorch + ComfyUI instructions

Add custom selector directives

Add JS and CSS for selector

Add custom icon directive and utils

Clean up conf.py
2025-10-20 12:17:50 -04:00
Pratik Basyal
78258e0f85 702 compatibility Footnote updated (#5502)
* Footnote updated

* Minor update

* Minor update

* Break added

* Line break added

* Line break

* Footnote updated

* Minor correction
2025-10-10 21:23:07 -04:00
amd-hsong
c79d9f74ef Merge pull request #5490 Re-enable device_merge_inplace unit test for rocPRIM 2025-10-10 15:03:23 -06:00
amd-hsivasun
fb1b78c6f0 [Ex CI] Added Component and Module Dependencies (#5489)
* [Ex CI] Added Component and Module Dependencies

* Add registerROCmPackages flag
2025-10-10 16:01:11 -04:00
peterjunpark
3a70d75f5e Fix documented AMD SMI version (ROCm 7.0.2) (#5496) 2025-10-10 15:09:20 -04:00
alexxu-amd
61e1f088a1 Merge pull request #5492 from ROCm/sync-dev-from-internal
Sync dev from internal for 7.0.2 GA
2025-10-10 11:17:32 -04:00
Pratik Basyal
1f6e5c5e04 Update compatibility-matrix.rst 2025-10-10 11:10:48 -04:00
Pratik Basyal
e8a0769842 Update RELEASE.md 2025-10-10 11:07:51 -04:00
Alex Xu
6f9579d052 Merge remote-tracking branch 'internal/develop' into sync-dev-from-internal 2025-10-10 11:02:33 -04:00
Pratik Basyal
245d53a021 Merge pull request #579 from prbasyal-amd/post-rc3-702-update
GPU resiliency highlight updated 702
2025-10-10 11:00:59 -04:00
Alex Xu
35dbbb22bc fix linting 2025-10-10 10:29:13 -04:00
alexxu-amd
03dc8cee00 Merge pull request #584 from ROCm/sync-dev-from-external
Sync dev from external
2025-10-10 10:14:56 -04:00
Alex Xu
323e5fd27a Merge remote-tracking branch 'external/develop' into sync-dev-from-external 2025-10-10 10:13:08 -04:00
alexxu-amd
b11fd7b492 Update versions.md (#583) 2025-10-10 09:31:24 -04:00
srayasam-amd
5e2efa05a6 7.0.2 GA update (#5491)
* 7.0.2 GA update

* Create rocm-7.0.2.xml
2025-10-10 18:47:48 +05:30
Hao Song
29a90f0271 [rocPRIM] Re-enable device_merge_inplace unit test for rocPRIM 2025-10-09 21:48:11 +00:00
randyh62
c06242bb89 Update RELEASE.md (#581)
* Update RELEASE.md

Remove support for rocBlas and hipBlasLt

* Update CHANGELOG.md

Removed from the Changelog as well.
2025-10-09 13:15:08 -07:00
peterjunpark
68e8453ca5 Update vLLM doc for 10/6 release and bump rocm-docs-core to 1.26.0 (#5481)
* archive previous doc version

* update model/docker data and doc templates

* Update "Reproducing the Docker image"

* fix: truncated commit hash doesn't work for some reason

* bump rocm-docs-core to 1.26.0

* fix numbering

fix

* update docker tag

* update .wordlist.txt
2025-10-08 16:23:40 -04:00
Pratik Basyal
503b8bcc86 Framework and changelog updated (#5483)
* Framework and chaneglog updated

* Wordlist updated
2025-10-08 15:05:11 -04:00
amd-hsivasun
e3d97d339a [Ex CI] Added rocJPEG and rocprofiler-sdk 2025-10-08 14:47:44 -04:00
alexxu-amd
978c58d196 Merge pull request #577 from ROCm/sync-develop-from-external
Sync develop from external
2025-10-08 14:25:03 -04:00
alexxu-amd
a366048b64 Merge branch 'develop' into sync-develop-from-external 2025-10-08 14:12:14 -04:00
Pratik Basyal
4c3e33c291 Compatibility matrix and changelog synced for ROCm 7.0.2 (#576)
* Compatibility matrix and changelog synced

* Indentation updated

* OS updated
2025-10-08 14:11:15 -04:00
Alex Xu
89758e67d8 Merge remote-tracking branch 'external/develop' into sync-develop-from-external 2025-10-08 14:03:34 -04:00
Pratik Basyal
5d0f201b4d 7.0.2 review update (#575)
* 7.0.2 review update

* Tensorflow footnote updated

* Wordlist added
2025-10-08 12:35:14 -04:00
Pratik Basyal
e3677d89a6 PLDM bundle info updated for 7.0.2 (#574)
* PLDM bundle info updated

* Driver dependency added to GPU resiliency

* Known issue for Migrpahx added

* Footnote added

* Known issue for OpenCV updated

* Leo's feedback incorporated

* Radeon 9060 updated

* Known issues updated
2025-10-08 11:00:42 -04:00
amd-hsivasun
f20edab8fc [Ex CI] Update CMake Flags for hipTensor 2025-10-07 15:21:39 -04:00
Pratik Basyal
6f84d50011 ROCm 7.0.2 Post RC3 update (#573)
* Space minimized

* OS support updated

* Minor change
2025-10-06 14:08:01 -04:00
Pratik Basyal
57dd082f28 Post RC2 7.0.2 review feedback updated (#571)
* Known issue updated

* Space optimized

* Changelog updated

* Apply suggestions from code review

Leo's review feedback incorporated

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Highlight changes

* Highlight and OS support updated

* GPU resiliency highlight updated

* Highlights updated

* ROCm-EP deprecation added

* Apply suggestions from code review

leo's feedback incorporated

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* PLDM update

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-10-06 12:04:09 -04:00
peterjunpark
eeea0d2180 Fix heading levels in pages using embedded templates (#5468) 2025-10-03 13:33:14 -04:00
anisha-amd
93c6d17922 Docs: frameworks 25.09 - compatibility - FlashInfer and llama.cpp (#5462) 2025-10-02 13:51:36 -04:00
amd-hsivasun
f91c2b9b4a Update dependencies-rocm.yml 2025-10-01 15:31:35 -04:00
amd-hsivasun
5e6b66ca39 Remove tasks to locate test dir 2025-10-01 15:30:37 -04:00
amd-hsivasun
6b8b359d03 Updated test dir to s/build/tests 2025-10-01 15:30:37 -04:00
amd-hsivasun
38e659e5f0 Update testDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
0894547f5a Update setupenv 2025-10-01 15:30:37 -04:00
amd-hsivasun
aca31170c4 Update setupenv 2025-10-01 15:30:37 -04:00
amd-hsivasun
d21ec9eea5 Updated testDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
189c269350 Added Debug 2025-10-01 15:30:37 -04:00
amd-hsivasun
774cb7a1b3 Changed testDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
024cb4db76 Added testDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
945fb286f7 Find tests Task 2025-10-01 15:30:37 -04:00
amd-hsivasun
ee93101541 Change list files 2025-10-01 15:30:37 -04:00
amd-hsivasun
e31841312b Update testDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
41b5298659 Added a list for all rp-systems files 2025-10-01 15:30:37 -04:00
amd-hsivasun
58790154b2 Add a script to look for setup-env.sh 2025-10-01 15:30:37 -04:00
amd-hsivasun
6f7f73ac0b Update workingDirectories 2025-10-01 15:30:37 -04:00
amd-hsivasun
b2e3bc8565 [Ex CI] Updated rp-systems CMakeBuildDir 2025-10-01 15:30:37 -04:00
amd-hsivasun
52979e2fdb [Ex CI] Updated testDir for rp-systems tests 2025-10-01 15:30:37 -04:00
peterjunpark
0ea5216ace docs: update article_info in conf.py (#5454) 2025-10-01 13:17:50 -04:00
peterjunpark
2e1b4dd5ee Add multi-node setup instructions for training perf Dockers (#5449)
---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-30 14:53:38 -04:00
Pratik Basyal
5c7b993c0c 7.0.2 release changes (#568)
* Initial changes for 7.0.2

* Heading level updated

* Release notes changes

* rocsolver added

* Known issues updated

* Highlights updated

* RN changes

* Release highlights for AI applications updated

* AI developer contents added

* leo's review feedback added

* Compatibility matrix updated

* GPU driver support
2025-09-30 14:02:04 -04:00
amd-hsivasun
2d79b3c4bd [Ex CI] Added rocm-cmake dependency 2025-09-30 14:00:16 -04:00
Peter Park
fd59b5fbac fix links in docs (#5446) 2025-09-29 15:27:32 -04:00
amd-hsivasun
0a643f4686 [Ex CI] Enable aqlprofile 2025-09-26 14:42:15 -04:00
amd-hsivasun
d9e5744f7a Update testExecutable 2025-09-26 14:01:02 -04:00
amd-hsivasun
ccb849ec02 Added python3-pip to aptModules 2025-09-26 14:01:02 -04:00
amd-hsivasun
42d4867964 Removed more aptPackages 2025-09-26 14:01:02 -04:00
amd-hsivasun
375359a5dd Added ninja to aptPackages 2025-09-26 14:01:02 -04:00
amd-hsivasun
e92745f1ff Removed apt and pip modules 2025-09-26 14:01:02 -04:00
amd-hsivasun
0fa72358d3 Remove registerROCm packages flag 2025-09-26 14:01:02 -04:00
amd-hsivasun
6fec268a4e Removed package manager 2025-09-26 14:01:02 -04:00
amd-hsivasun
ff14cd1ff5 Added pyyaml 2025-09-26 14:01:02 -04:00
amd-hsivasun
8f65688653 Added registerROCmPackages 2025-09-26 14:01:02 -04:00
amd-hsivasun
33d1493adb Removed dependencies 2025-09-26 14:01:02 -04:00
amd-hsivasun
4b6c7776a2 Updated parameters 2025-09-26 14:01:02 -04:00
amd-hsivasun
af811daa1b Added GPUTarget 2025-09-26 14:01:02 -04:00
amd-hsivasun
d6c045e482 Update test parameters 2025-09-26 14:01:02 -04:00
amd-hsivasun
78b24cad39 Update test pool 2025-09-26 14:01:02 -04:00
amd-hsivasun
753a94c0bb Add test step to buildjob 2025-09-26 14:01:02 -04:00
amd-hsivasun
6ecad57c62 Revert pool changes 2025-09-26 14:01:02 -04:00
amd-hsivasun
977554809a Changed cmake prefix path 2025-09-26 14:01:02 -04:00
amd-hsivasun
7b00f4493b Removed module and prefix path 2025-09-26 14:01:02 -04:00
amd-hsivasun
95c439a272 Removed Compiler Path 2025-09-26 14:01:02 -04:00
amd-hsivasun
94e04fbdc0 Updated testpool 2025-09-26 14:01:02 -04:00
amd-hsivasun
7ab59de8af Update testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
175c817563 Change testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
25516d312e Updated testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
30c345629a Changed testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
210dc94bbb Removed testExecutable 2025-09-26 14:01:02 -04:00
amd-hsivasun
a54023ccb8 Changed testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
17e3362dc7 Add Checkout to testjob 2025-09-26 14:01:02 -04:00
amd-hsivasun
0f9c0d884d Updated testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
c890de4b16 Added Path to Gtest 2025-09-26 14:01:02 -04:00
amd-hsivasun
4ea77ab515 Added Tests 2025-09-26 14:01:02 -04:00
amd-hsivasun
c0512612f4 Updated testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
1c81ac3747 Updated testdir path 2025-09-26 14:01:02 -04:00
amd-hsivasun
4bafa42e52 Updated test parameters 2025-09-26 14:01:02 -04:00
amd-hsivasun
493801e670 Updated testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
1a5152b7b3 Removed testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
874c881012 Fixed testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
bdcaeea74c Updated testdir 2025-09-26 14:01:02 -04:00
amd-hsivasun
b02669acf7 Fixed Dependencies 2025-09-26 14:01:02 -04:00
amd-hsivasun
844f10b2b1 Updated denendecies-other variables 2025-09-26 14:01:02 -04:00
amd-hsivasun
d6c14920b4 External CI: Build pipeline for aqlprofile 2025-09-26 14:01:02 -04:00
amd-hsivasun
4affe10a7c [Ex CI] Update pipeline Id for rdc to monorepo 2025-09-26 12:38:57 -04:00
amd-hsivasun
81341ef435 Add New Line 2025-09-26 11:41:21 -04:00
amd-hsivasun
abacd328f9 [Ex CI] Added rocRand to rocmDependencies 2025-09-26 11:41:21 -04:00
amd-hsivasun
80b2fb6e26 [Ex CI] Add hipRAND to rocmDependencies 2025-09-26 11:41:21 -04:00
amd-hsivasun
b53e8decfc [Ex CI] Enable rdc monorepo 2025-09-26 11:41:21 -04:00
amd-hsivasun
5fcc2eafde [Ex CI] Update pipeline Id for rocprofiler-sdk to monorepo 2025-09-25 16:49:07 -04:00
amd-hsivasun
2eb0d77bc6 Updated testDir 2025-09-25 13:20:37 -04:00
amd-hsivasun
d84b41908f Changed Testdir 2025-09-25 13:20:37 -04:00
amd-hsivasun
986f8284d1 [Ex CI] Update testDir for rocprofiler-sdk 2025-09-25 13:20:37 -04:00
Pratik Basyal
d92d9268dc Use of Radeon and Ryzen reference updated [Develop] (#5432)
* Use of Radeon and Ryzen reference updated

* Pytorch link update
2025-09-24 19:07:41 -05:00
Ibrahim Wani
1629d3f0ea Add origami yaml based tests to azure pipelines (#5431)
* Add origami yaml tests

* Dependency fix in origami.yml

* Fix almalinux dependency; get publish test results step working

* Fix almalinux dependency issue
2025-09-24 14:49:51 -06:00
Pratik Basyal
6cf6b34b2e TOC for ROCm on Radeon and Ryzen updated (#5429) 2025-09-24 13:58:26 -05:00
Pratik Basyal
c35a0a121a ROR link and text updated (#5426) 2025-09-24 13:28:13 -05:00
amd-hsivasun
412e383654 [Ex CI] Update pipeline Id for rocprofiler-sdk 2025-09-23 15:56:49 -04:00
Pratik Basyal
39f6fc187d rocm-core version updated (#5418) 2025-09-23 15:49:33 -04:00
amd-hsivasun
05b480fb28 Update rocm-examples.yml 2025-09-23 12:10:11 -04:00
amd-hsivasun
4fa44d90db Updated dependencies-cmake-custom.yml default ver 2025-09-23 12:10:11 -04:00
amd-hsivasun
c9ef13d823 Added Custom Cmake to testjobs 2025-09-23 12:10:11 -04:00
amd-hsivasun
f02172050b Added rocWMMA dependency 2025-09-23 12:10:11 -04:00
amd-hsivasun
154dbe297a Updated File to take custom cmake version 2025-09-23 12:10:11 -04:00
amd-hsivasun
993a0a4fd4 [Ex CI] Update cmake 2025-09-23 12:10:11 -04:00
amd-hsivasun
c03662f410 [Ex CI] Update pipeline Id for origami to monorepo 2025-09-23 11:17:39 -04:00
Peter Park
442d7e4750 Add env var note to vllm.rst for MoE models and fix links in docs (#5415)
* docs(vllm.rst): add performance note for MoE models

* docs: fix links

update vllm readme link 20250521

fix links
2025-09-22 15:58:43 -04:00
Pratik Basyal
a09a8f517e PLDM version for 7.0.0 updated (#5412) 2025-09-22 11:14:07 -04:00
Pratik Basyal
0bbaab645d rocSHMEM and ROCprofiler-SDK highlight update (#5408) (#5409)
* rocSHMEM and ROCprofiler-SDK highlight update (#5408)

* Update RELEASE.md
2025-09-22 10:26:12 -04:00
Ibrahim Wani
4b80405e2e Add set -e to exit when test fails (#5398) 2025-09-19 10:43:35 -06:00
Peter Park
d92e5b6c12 Update Primus Megatron doc v25.8 (#5396)
* megatron: update previous versions list

update

wording

* megatron: update rst and yaml

update primus repo link

update mig guide

* update headings and anchors

* megatron: update doc

* update docker hub urls
2025-09-19 08:09:21 -04:00
Pratik Basyal
91fce2e134 rocpd highlight updated (#5393) 2025-09-18 19:00:36 -04:00
Peter Park
27d53cf082 Remove duplicate ML FW docker image support table (#5389) 2025-09-18 17:06:53 -04:00
Pratik Basyal
bc084246be Reference to AMD GPU Driver 30.10 release notes updated (#5380) 2025-09-18 13:34:46 -05:00
Peter Park
9827ba7ff2 docs: MaxText v25.7 patch update (#5372)
* remove jax 0.6.0 nanoo fp8 caveat note

* reorder maxtext docker images in data sheet
2025-09-17 16:25:46 -04:00
Pratik Basyal
bafda50153 Link updated (#5369) 2025-09-17 15:03:29 -05:00
Pratik Basyal
cae65c6c43 Link reset (#5368) 2025-09-17 13:49:04 -05:00
pbhandar-amd
6a66167486 Merge pull request #5367 from ROCm/amd/pbhandar/rocm_701_internal_to_external_sync
Sync internal to external develop branch for ROCm 7.0.1
2025-09-17 14:26:03 -04:00
Parag Bhandari
0f3543d6e8 Merge branch 'develop-internal' into develop 2025-09-17 14:15:05 -04:00
pbhandar-amd
678691c3d7 Merge pull request #563 from ROCm/amd/pbhandar/rocm_701_external_to_internal_sync
Sync external develop into internal develop for ROCm 7.0.1
2025-09-17 14:14:40 -04:00
pbhandar-amd
5cb3debed9 Merge branch 'develop' into amd/pbhandar/rocm_701_external_to_internal_sync 2025-09-17 14:09:59 -04:00
pbhandar-amd
dd5d710727 Update versions.md 2025-09-17 14:09:49 -04:00
pbhandar-amd
eca1ecde92 Merge branch 'develop' into amd/pbhandar/rocm_701_external_to_internal_sync 2025-09-17 13:48:36 -04:00
pbhandar-amd
ed1e414710 Update versions.md 2025-09-17 13:42:20 -04:00
Pratik Basyal
20c90fc406 Footnote updated (#564) 2025-09-17 12:24:03 -05:00
JeniferC99
6e39614b22 7.0.1 GA update (#5365)
* Update default.xml - Change 7.0.0 to 7.0.1

* add rocm-7.0.1.xml
2025-09-17 13:18:01 -04:00
Pratik Basyal
f7873ac74e Long cell in compatibility matrix updated 701 (#562)
* Long cell updated

* Long cell updated

* Historical comaptibility updated
2025-09-17 11:57:35 -05:00
Parag Bhandari
a86fba556b Merge branch 'develop' into develop-internal 2025-09-17 12:35:50 -04:00
Pratik Basyal
7603fed080 Release 7.0.1 demo release notes (#536)
* Mono repo highlight added

* Leo's feedback incorporated

* Minor wording change

* Randy's feedback incorp

* Update for upcoming change

* Minor feedback added

* Ram's feedback incorporated

* Reworded for clarity

* ROCM 7.0.1 draft

* Minor change

* Release 7.0.0 notes appended

* Heading order updated for 7.0.1

* 700 GA changes synced

* Issue updated

* Review feedback added

* Conf file updated

* Tensorflow change added

* review feedback added

* GPU depencency matrix updated

* Compatibility updated

* Minor change

* New update note

* AMD GPU Driver notes updated

* Footnotes updated
2025-09-17 10:57:15 -05:00
Braden Stefanuk
9932cd4ac2 [hipsparselt] Update compile command for new build system (#5244) 2025-09-16 15:36:20 -06:00
Peter Park
e8d104124f Fix PyTorch training benchmark doc template (#5357)
* fix template

* update wordlist
2025-09-16 17:21:57 -04:00
Peter Park
26f708da87 Add Stable Diffusion XL to PyT training benchmark doc and fix paths in SGLang Disagg Inference doc (#5282)
* add sdxl to pytorch-training

* fix sphinx warnings

fix links

* fix paths in cmds and links in sglang disagg

* fix col width

* update release highlights

* fix

quickfix
2025-09-16 16:49:33 -04:00
Pratik Basyal
5a5e4dbb6e Compatibility updated (#5355) 2025-09-16 15:49:13 -05:00
randyh62
1c3dae75e1 Revert "Update RELEASE.md (#560)" (#561)
This reverts commit f216b371a0.
2025-09-16 13:02:13 -07:00
Peter Park
bab853a0d3 Add NCF to pytorch training benchmark doc (#5352)
* add previous version (25.6)

* fix template

* Formatting and wording fixes

* add caveats

* update yaml

* add note to pytorch-training

* fix template

* make model name shorter
2025-09-16 13:29:28 -04:00
Pratik Basyal
5c7ccb3c26 Github Issue Links updated (#5350)
* 7.0.0 compatibility updated

* GIM link updated
2025-09-16 12:55:58 -04:00
randyh62
f216b371a0 Update RELEASE.md (#560)
Update llvm-project URL
2025-09-16 09:39:26 -07:00
randyh62
37faf170b1 Update RELEASE.md (#5349)
* Update RELEASE.md

update llvm-project URL

* Update .wordlist.txt

add spelling errors
2025-09-16 09:38:23 -07:00
Peter Park
8c40d14d7e fix pldm note (#5346) 2025-09-16 11:09:19 -05:00
Peter Park
d5101532f7 docs: Add SGLang disaggregated P/D inference w/ Mooncake guide (#5335)
* add main content

* Update content and format

add clarification

update

update data

* fix

fix

fix

* fix: deepseek v3

* add ki

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-09-16 10:33:58 -05:00
Peter Park
ef4e7ca1fe docs(PyTorch training v25.8): Add Primus and update PyTorch training benchmark docs (#5331)
* pyt: update previous versions list

update conf.py

* pyt: update yaml and rst

update

update toc

* update headings and anchors

* pyt: update doc

* update docker hub urls
2025-09-16 10:33:53 -05:00
Pratik Basyal
be68246824 Compatibility updated for 7.0.0 (#5332)
* Compatibility udpated

* Minor fix
2025-09-16 10:01:49 -05:00
Pratik Basyal
1626ee4d8b Post GA fixes develop (#5329)
* Develop link updated

* Release notes and compatibilty update

* Compatibilitbity updated

* RPP link updated
2025-09-16 09:30:12 -05:00
Matt Williams
f80044c7db Merge pull request #5326 from ROCm/mattwill-amd-patch-1
Adding AQLprofile link
2025-09-16 08:12:20 -04:00
Pratik Basyal
412f6f2b0e 700 reset link [Develop] (#5325)
* TOC link update and manifest removed

* Link reset

* Changelog synced
2025-09-16 08:07:40 -04:00
Matt Williams
bee7c1223f Update license.md 2025-09-16 08:03:55 -04:00
Pratik Basyal
8af34e2026 700 update pre GA batch1 (#5322)
* Fix PLDM note for ROCm 7.0 (#5320)

* fix pdlm for mi300x

* update debian 12 support note

* 7.0.0 Release notes update Batch 9 (#559)

* Changelog synced

* Compatibilty updated

* Compatibilty update

* Compiler highlight updated

* wordlist updated

---------

Co-authored-by: Peter Park <peter.park@amd.com>
2025-09-16 07:24:54 -04:00
arjun-raj-kuppala
0475650f00 Create rocm-7.0.0.xml (#5321) 2025-09-16 16:45:29 +05:30
Peter Park
cb73e9145a Fix PLDM note for ROCm 7.0 (#5320)
* fix pdlm for mi300x

* update debian 12 support note
2025-09-16 06:09:01 -05:00
Pratik Basyal
7316031fe6 7.0.0 Release notes update Batch 9 (#559)
* Changelog synced

* Compatibilty updated

* Compatibilty update

* Compiler highlight updated

* wordlist updated
2025-09-16 07:03:32 -04:00
Peter Park
76cb264f34 Update vllm-history.rst with missing 0909 entry (#5308) 2025-09-16 06:54:34 -04:00
pbhandar-amd
9c36e44a91 Sync internal 'develop' into external 'develop' for ROCm 7.0 (#5319) 2025-09-16 06:06:57 -04:00
Parag Bhandari
1037f8845a Merge branch 'develop-internal' into develop 2025-09-16 06:01:48 -04:00
pbhandar-amd
c2e31f2d2b Sync external 'develop' branch into internal 'develop' branch (#558) 2025-09-16 06:00:52 -04:00
Yanyao Wang
882f71302a Update default manifest file for ROCm7.0.0 (#5317)
Co-authored-by: Wang, Yanyao <Yanyao.Wang@amd.com>
2025-09-16 14:55:09 +05:30
pbhandar-amd
3d2f10ce0c Merge branch 'develop' into amd/pbhandar/rocm_7_public_internal_sync 2025-09-16 05:20:43 -04:00
pbhandar-amd
81f5314368 Update versions.md 2025-09-16 05:16:12 -04:00
Parag Bhandari
60e3a8107c Merge branch 'develop' into develop-internal 2025-09-16 05:12:42 -04:00
pbhandar-amd
b800801427 Update versions.md 2025-09-16 04:10:31 -04:00
Pratik Basyal
5637deb81e Release notes changes to TF (#556)
* RN changes to TF

* Series capitalized

* Minor update
2025-09-15 17:36:19 -04:00
randyh62
df1ae524b2 Hip minor update (#553)
* Update CHANGELOG.md

Removed duplicate num_threads entry, and added a new Resolved issue from Julia.

* Update RELEASE.md

Removed duplicate num_threads entry and added a resolved issue from Julia.
2025-09-15 14:15:25 -07:00
Pratik Basyal
06fd378036 Known issues updated (#555) 2025-09-15 16:09:07 -05:00
Pratik Basyal
cbd4e8f0ba 7.0.0 release notes feedback updated [Batch 6] (#550)
* RN changes updated

* Changelog synced and release notes updated

* Compatibility changes added
2025-09-15 16:29:34 -04:00
Jeffrey Novotny
b07ae4ba6c Fix links to MIT license for AQLprofile (#5312) 2025-09-15 15:53:29 -04:00
Jeffrey Novotny
2fe270beb3 Fix links to MIT licenses (#5311) 2025-09-15 15:16:17 -04:00
amitkumar-amd
1660ac335a Update RELEASE.md
Swap new framework vs updated framework
2025-09-14 01:50:27 -05:00
amitkumar-amd
b357ba993b Update RELEASE.md 2025-09-14 01:30:49 -05:00
amitkumar-amd
29f4d65da5 Update RELEASE.md 2025-09-14 01:17:53 -05:00
Pratik Basyal
2de5a33aec User space and firmware content added 700 (#542)
* User space and firmware content added

* New updates added

* BKC dep added
2025-09-14 00:58:32 -05:00
Adel Johar
e805e98701 Add key features and known issue for ROCm 7.0 (#421)
Co-authored-by: Istvan Kiss <neon60@gmail.com>
2025-09-13 11:56:58 +02:00
amd-hsivasun
a2785d2b5a Fixed componentName calls for test and build jobs 2025-09-12 12:17:03 -04:00
amd-hsivasun
8882410560 Enabled rocprofiler-systems monorepo 2025-09-12 12:17:03 -04:00
Joseph Macaranas
0af430d1cb [External CI] Another fix for downstream jobs (#5307) 2025-09-11 22:50:56 -04:00
Joseph Macaranas
33bc3c5e2b [External CI] Match component name for ROCR to match expected downstream (#5306) 2025-09-11 22:08:43 -04:00
randyh62
e1a1a4e712 Update RELEASE.md (#540)
* Update RELEASE.md

Added per Julia

* Update CHANGELOG.md

change added to Changelog.md as well
2025-09-11 14:46:12 -07:00
amd-hsivasun
355feae2e2 [Ex CI] Update pipeline Id for rocr-runtime to monorepo 2025-09-11 17:02:58 -04:00
amd-hsivasun
b3c566f6b9 [Ex CI] Update pipeline Id for hip-tests to monorepo 2025-09-11 17:02:50 -04:00
amd-hsivasun
9a3fc8c773 [Ex CI] Update pipeline Id for rocm-smi-lib to monorepo 2025-09-11 16:45:40 -04:00
amd-hsivasun
17be0ce7aa [Ex CI] Update pipeline Id for rocprofiler-sdk to monorepo 2025-09-11 16:41:20 -04:00
amd-hsivasun
c9c41a34c2 [Ex CI] enable hip-tests monorepo 2025-09-11 16:40:36 -04:00
amd-hsivasun
e71b8212f9 Fixed Indentation 2025-09-11 16:23:03 -04:00
amd-hsivasun
8c1df97e34 [Ex CI] Enable rocprofiler-sdk monorepo 2025-09-11 16:23:03 -04:00
amd-hsivasun
957005f596 Updated rocrtst testDir 2025-09-11 16:21:41 -04:00
amd-hsivasun
2383edc1fe Fixed WorkingDir in TestJobs 2025-09-11 16:21:41 -04:00
amd-hsivasun
c4b4abe354 User Test Commit 2025-09-11 16:21:41 -04:00
amd-hsivasun
9b2b1d3a66 User test 2025-09-11 16:21:41 -04:00
Haresh Sivasuntharampillai
8617b653f8 test commit 2025-09-11 16:21:41 -04:00
Haresh Sivasuntharampillai
26ddf7e6ac test commit 2025-09-11 16:21:41 -04:00
Haresh Sivasuntharampillai
91f21d890f Fixed SparseCheckout 2025-09-11 16:21:41 -04:00
Haresh Sivasuntharampillai
a6fbf60594 [Ex CI] enable rocr-runtime monorepo 2025-09-11 16:21:41 -04:00
amd-hsivasun
61f09e2ab9 Update pipelineId for rocprofiler-compute 2025-09-11 15:07:07 -04:00
amd-hsivasun
0d790615ef [Ex CI] Update pipeline Id for rocprofiler-compute to monorepo 2025-09-11 15:07:07 -04:00
Peter Park
7098bdc03b Update vLLM inference benchmark doc for 0909 release (and Sphinx fixes) (#5289) 2025-09-11 15:01:17 -04:00
Peter Park
8eee155585 Mockup: List some bullets horizontally (#539)
* list horizontally

* make it 2 cols

* use grid

* margin -

* update margins
2025-09-11 14:48:47 -04:00
Joseph Macaranas
10f6086819 [External CI] Updates to rocm-libraries pipelines (#5300)
- Add msgpack python module dependency for hipsparselt pipeline.
- Change CMake dirs for rocblas pipeline to allow relative-path access to shared/tensile directory.
2025-09-11 12:53:11 -04:00
amd-hsivasun
964a7cd0b5 fixed component name 2025-09-10 17:31:03 -04:00
amd-hsivasun
d3fe7439cf [Ex CI] enable rocm-smi-lib monorepo 2025-09-10 17:31:03 -04:00
amd-hsivasun
56f566c1dc [Ex CI] update rocminfo pipeline ID to monorepo 2025-09-10 17:24:17 -04:00
Peter Park
e3227d14e6 7.0.0 release notes: Add highlight for training/inference benchmark docker docs (#538)
* add highlight for training/inference benchmark docker docs

* update

update blurb

double word

Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

Update RELEASE.md

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

update wording
2025-09-10 16:45:35 -04:00
Haresh Sivasuntharampillai
88f1493b68 [Ex CI] enable rocminfo monorepo 2025-09-10 16:30:48 -04:00
anisha-amd
3ca9cb1fcc Docs: adding ray and llama.cpp live blog links (#5290) 2025-09-10 15:02:03 -04:00
Peter Park
aebf1b4480 Update amdsmi changelog (#533)
* update amdsmi cl

* remove duplicated changelog entry

* minor tweaks and add upcoming changes

* update
2025-09-10 13:46:58 -04:00
amd-hsivasun
0840c14b6d [Ex CI] update rocm-core pipeline ID to monorepo 2025-09-10 11:58:15 -04:00
amd-hsivasun
daa0184d2e [Ex CI] enable rocm-core monorepo 2025-09-10 11:47:12 -04:00
Pratik Basyal
3b5019e03f Minor correction (#5285) 2025-09-10 10:53:25 -04:00
Pratik Basyal
68f505e375 Taichi removed (#5283) 2025-09-10 10:07:55 -04:00
Peter Park
05a66f75fe add qwen3 30b a3b to vllm-benchmark-models (#5280) 2025-09-09 17:41:11 -04:00
Ibrahim Wani
3c37ae88f0 Add origami CI pipelines (#5256)
* Add origami yaml pipeline.

* Unindent lines.

* Add cmake dependency step to origami yml.

* Add pybind dep

* Fix pipeline failures.

* Quick fix

* Fix pybind11 dep for almalinux

* Fix pybind11 dep for almalinux again

* Test

* [Ex CI] don't create symlink if more than one sparse checkout dir

* hipBLASLt multi sparse

* Replace pybind with nanobind.

* Quick fix

* Testing nanobind install in pipelines

* Run origami binding tests

* Change build path for tests

* Change build path for tests again

* Add missing dep for CI

* Add archs to buildJobs

* Fix CI error.

* Test

* Test job target

* Adding job target to hipblaslt dependant builds

* Check devices on machine

* Add gpu to pipeline

* Add more gpu targets

* test

* Add test job to origami

* Update test jobs

* Finding test dir

* Fix sparse checkout

* Find build dir

* Try to find build dir

* Clean up

* Test

* Change test dir

* Build origami in test job

* Try removing job.target from params

* Package bindings in build artifacts

* Download build as artifact.

* Comment out block

* Fix checkout in test job

* Test1

* Echo to list dir

* Sparse checkout origami/python

* Download python bindings as artifact

* Try ctest instead of running test files directly

* Only download artifacts for ubuntu

* Add missing cd

* Run individual tests not ctest.

* Fix hipblaslt build failures

* Resolve more ci failures in hipblaslt

* Add old changes back in

* Fix hipblaslt ci errors

* Clean up

* Add nanobind to array

* Add nanobind to array correctly

* Remove nanobind install script

* Quick fix

* Add pip module installs to test job

---------

Co-authored-by: Daniel Su <danielsu@amd.com>
2025-09-09 15:13:54 -06:00
amd-hsivasun
985786e98d Add sqlalchemy to dependencies in rocprofiler-compute 2025-09-09 15:27:56 -04:00
amd-hsivasun
f25e27acf0 Update roctracer pipeline ID and branch 2025-09-09 14:13:56 -04:00
Pratik Basyal
519364179c Mono repo highlight and known issues feedback added (#532)
* Mono repo highlight added

* Leo's feedback incorporated

* Minor wording change

* Randy's feedback incorp

* Update for upcoming change

* Minor feedback added

* Ram's feedback incorporated

* Reworded for clarity

* Minor update

* Minor update
2025-09-09 11:40:26 -04:00
anisha-amd
db43d18c37 Docs: frameworks compatibility- ray and llama.cpp (#5273) 2025-09-09 11:02:30 -04:00
Peter Park
4f53183696 docs: Add JAX MaxText benchmark v25.7 (#5182)
* Update previous versions

* Add data file

* fix filename and anchors

* add templates

* update .wordlist.txt

* Update template and data

add missing step

fix fmt

* update template

* fix data

* add jax 0.6.0

* update history

* update quantized training note
2025-09-08 21:42:56 -04:00
Joseph Macaranas
94476f34ca [External CI] Add amdgpu deps to rocpydecode pipeline (#5267) 2025-09-08 11:32:10 -04:00
Peter Park
4bc1bf00c6 Update PyTorch training benchmark docker doc to 25.7 (#5255)
* Update PyTorch training benchmark docker doc to 25.7

* update .wordlist.txt

* update conf.py

* update data sheet

* fix sphinx warnings
2025-09-05 12:07:51 -04:00
Matt Williams
76fd6b2290 Updating broken link (#5258) 2025-09-05 11:45:06 -04:00
Joseph Macaranas
e5345a9cca External CI: rocdecode downstream builds (#5254)
- Trigger downstream build of rocpydecode within rocdecode pipelines.
- Copying similar variables as other pipelines even though these projects are not in the super-repos.
2025-09-05 10:12:39 -04:00
Pratik Basyal
c2080a90c7 Changelog editorial fix ROCm 700 (#534)
* Changelog editorial fix

* Changelog synced
2025-09-05 09:07:51 -04:00
David Dixon
2f40189575 add catch2 (#5257) 2025-09-04 18:48:34 -06:00
David Dixon
9e1a82d327 Add libdivide (#5252) 2025-09-03 20:11:38 -06:00
Joseph Macaranas
3aab9e1bc5 Modify sparseCheckoutDirectories in checkout.yml (#5251)
Added 'shared' to sparseCheckoutDirectories parameter.
2025-09-03 16:58:17 -04:00
David Dixon
2b0ce5e5c2 Fix typo (#5250) 2025-09-03 13:59:41 -06:00
David Dixon
f1be2d291a Add fmtlib version that works with spdlog (#5249) 2025-09-03 13:26:18 -06:00
amd-hsivasun
07cb61f969 Update testjob dependsOn 2025-09-03 14:02:47 -04:00
amd-hsivasun
c486c39b50 Update rocprofiler-compute.yml
Reverted Component name and updated job names
2025-09-03 14:02:47 -04:00
amd-hsivasun
e68d9e9ce2 Update rocprofiler-compute.yml 2025-09-03 14:02:47 -04:00
amd-hsivasun
bff5c4a955 Fixed sparseCheckoutDir 2025-09-03 14:02:47 -04:00
amd-hsivasun
b0abc43c46 Added sparseCheckout to testjob template 2025-09-03 14:02:47 -04:00
amd-hsivasun
ceabccad83 Fixed componentName 2025-09-03 14:02:47 -04:00
amd-hsivasun
2628812fc4 [Ex CI] Enable rocprofiler-compute monorepo 2025-09-03 14:02:47 -04:00
amd-hsivasun
df3ea80290 Enable Roctracer Monorepo 2025-09-03 14:02:20 -04:00
David Dixon
b6647dfb22 Add spdlog source builds (#5247) 2025-09-03 11:35:53 -06:00
randyh62
08dad2dc41 Update RELEASE.md (#531)
Remove fine-grained system memory pool from HIP Highlights
2025-09-02 13:34:02 -07:00
David Dixon
c34fddb26a Add boost deps (#5235) 2025-09-02 13:28:19 -06:00
Pratik Basyal
b4c5980a96 Update to 7.0.0 RN and Compatibility matrix (#530)
* Fixes applied

* Tutorial HUB update added
2025-08-28 17:52:39 -04:00
Pratik Basyal
52ce201401 ROCm 7.0.0 Known issues [Batch2] (#529)
* Known issues added

* SME feedback added
2025-08-28 16:50:55 -04:00
Swati Rawat
505233473d Merge pull request #506 from SwRaw/swraw/docs
Create mi355-performance-counters.rst
2025-08-28 20:58:40 +05:30
Swati Rawat
4f4f4556a5 Merge branch 'develop' into swraw/docs 2025-08-28 20:48:33 +05:30
srawat
4f8426376b Update gpu-arch.md 2025-08-28 20:43:10 +05:30
Istvan Kiss
d476d09aff Update precision support page with missing libraries and RDNA2 and CDNA4 support 2025-08-28 17:09:34 +02:00
Adel Johar
04beef8773 Docs: Overhaul JAX compatibility page for ROCm 7.0 2025-08-28 17:08:27 +02:00
srawat
95d1752874 Update _toc.yml.in 2025-08-28 20:35:01 +05:30
srawat
eabf72c2db Update _toc.yml.in 2025-08-28 20:28:34 +05:30
Pratik Basyal
53bd9b5da4 Table loading and broken link fixed in 7.0.0 (#528)
* Indentation and formatting updated

* Table and broken link fixed

* Clang-ocl removed
2025-08-28 10:52:03 -04:00
Pratik Basyal
0665e73e2d 700 known Issues update [Batch1] (#527)
* Indentation and formatting updated

* Known issues added

* Known issues udpated

* Minor change

* Known issues updated

* KMD UMD udpate

* Updated known issues

* Additional text removed from known issues

* Oracle linux 10 removed
2025-08-28 09:50:57 -04:00
srawat
264d353071 Merge branch 'swraw/docs' of https://github.com/SwRaw/ROCm-internal into swraw/docs 2025-08-28 19:05:40 +05:30
srawat
d58e2b16db Update mi350-performance-counters.rst 2025-08-28 19:05:00 +05:30
Pratik Basyal
010a191938 700 RN update Batch 4 (#526)
* Indentation and formatting updated

* Resolved issue for kokkos option added

* Known issue for ROCr added

* 2nd known issue added

* Known issues updated

* adding 2 known issues

* Apply suggestions from code review

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update RELEASE.md

* Known issues added

* Approved known issue added

* Component removed based on Leo's feedback

* Issue link added

---------

Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com>
Co-authored-by: Matt Williams <matt.williams@amd.com>
2025-08-27 14:22:04 -04:00
Daniel Su
977e9c2295 [Ex CI] change hip-clr pipeline ID (#5230) 2025-08-27 13:06:08 -04:00
Daniel Su
eac9772fff [Ex CI] add temporary downstream path from rocBLAS to hipBLAS (#5184) 2025-08-27 13:05:51 -04:00
Daniel Su
151a4bd7bc [Ex CI] add retries to potentially flaky steps (#5175) 2025-08-27 13:05:26 -04:00
Daniel Su
9d28684161 [Ex CI] enable clr/hip/hipother monorepo builds (#5217) 2025-08-27 10:43:07 -04:00
randyh62
a7edb17538 Fix hip7 rn (#523)
* Update RELEASE.md

Update per LRT meeting notes

* Update RELEASE.md

move warpSize change as requested

* Update RELEASE.md

update warpSize change wording.

* Update RELEASE.md

* Update RELEASE.md

Why either?

* Update RELEASE.md

Add content from HIP 7 Changelog

* Update RELEASE.md

looks good

* Update RELEASE.md

Co-authored-by: Julia Jiang <56359287+jujiang-del@users.noreply.github.com>

---------

Co-authored-by: Julia Jiang <56359287+jujiang-del@users.noreply.github.com>
2025-08-26 16:02:49 -07:00
Braden Stefanuk
9ea9b33d14 [superbuild] Configure pipeline (#5221) 2025-08-26 15:12:19 -06:00
Pratik Basyal
59afdef1fb ONNX version 1.22.0 updated ROCm 7.0.0 (#524)
* Indentation and formatting updated

* ONNX v 1.22.0 udpated
2025-08-26 16:52:56 -04:00
Pratik Basyal
ea8ff1b17d UCC and UCX version and release notes update for 7.0.0 (#521)
* Indentation and formatting updated

* UCC and UCX version udpated

* ROCm bandwidth test update

* MI350 series info added

* Changelog update

* ROCm systems Profiler highlight updated

* Redundant removed, pulled out from HIP changelog

* Known issues to Compute profiler added

* ONNX compatibility updtaed

* ROCm COmpute Profiler highlight added

* RN update

* ROCm 700 stack image updated

* ROCM Compute and System highlight updated

* Deep learning frameworks added

* removed BF16 support for MIGraphX -- already in 6.4 release notes; removed FP4 MIGraphX support

* ROCm Compute profiler highlight updated

* Formatting update

* AI framework update

* ROCm Systems Profiler udpate

* removed mention of CentOS of CentOS

* ROCm Compute Profiler update

* Feedback changes

* leo's feedback incorporated

* ampersand

* Changelog synced

* Changelog synced

* RHEL 10 removed

* Rocky Linux updated

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
2025-08-26 16:34:27 -04:00
Swati Rawat
808a7709aa Merge branch 'develop' into swraw/docs 2025-08-26 20:32:46 +05:30
srawat
8cc17e307c review comments 2025-08-26 18:22:35 +05:30
srawat
7fd6146b16 Update mi355-performance-counters.rst 2025-08-22 23:16:18 +05:30
srawat
e839054e56 Update mi355-performance-counters.rst 2025-08-22 22:31:49 +05:30
Matt Williams
1d42f7cc62 Deep learning frameworks edits for scale (#5189)
* Deep learning frameworks edits for scale

Based on https://ontrack-internal.amd.com/browse/ROCDOC-1809

* update table

table

* leo comments

* formatting

* format

* update table based on feedback

* header

* Update machine learning page

* headers

* Apply suggestions from code review

Co-authored-by: anisha-amd <anisha.sankar@amd.com>

* Update .wordlist.txt

* formatting

* Update docs/how-to/deep-learning-rocm.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

---------

Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-08-22 11:46:07 -04:00
Pratik Basyal
78c4a4c12a Post RC4 700 RN update [Batch 3] (#520)
* Indentation and formatting updated

* OS support changes

* Historical compatibility updated

* Minor update
2025-08-22 11:30:37 -04:00
srawat
c587d75701 listing in TOC 2025-08-22 19:57:27 +05:30
srawat
a88151f505 Update mi355-performance-counters.rst 2025-08-22 14:59:59 +05:30
Peter Park
98029db4ee docs: Add Primus (Megatron) training Docker documentation (#5218) 2025-08-21 23:50:55 -04:00
Pratik Basyal
ff7d9eb17a Post RC4 7.0.0 release notes update [Batch 2] (#519)
* Indentation and formatting updated

* Compatibility updated

* OS support updated

* Changelog synced

* AMD SMI link updated

* Broken links fixed

* Changelog synced
2025-08-21 21:34:11 -04:00
Pratik Basyal
2ec8757ffa Post RC4 RN 700 update (#513)
* Indentation and formatting updated

* Rc4 compute profiler version update

* Editorial changes in changelog

* Changelog and compatibility matrix updated

* ROCProfiler-SDK highlight update

* az and ol added to wordlist

* updated with newer info fr from migraphx

* fixed a formatting error

* Release date updated

* ROCProfiler-SDK highlight updated

* Changelog update

* Changelog update

* Release notes feedback

* Release notes update

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
2025-08-21 18:51:57 -04:00
Matt Williams
28c3384433 Merge pull request #518 from ROCm/license-update
Updating license for AQLprofile
2025-08-21 18:04:45 -04:00
Matt Williams
91c26c502d Updating license for AQLprofile 2025-08-21 18:02:31 -04:00
Pratik Basyal
0ae99ea21e Indentation and formatting updated (#517) 2025-08-21 16:02:27 -04:00
Jeffrey Novotny
60571680b5 Second round of proofreading for components in 7.0 release notes (#514)
* Second round of proofreading for components

* Remove duplicate item

---------

Co-authored-by: Pratik Basyal <prbasyal@amd.com>
2025-08-21 14:17:17 -04:00
Matt Williams
65ebbaa117 Merge pull request #5113 from ROCm/aqlprofile
AQLProfile component additions
2025-08-21 12:53:16 -04:00
yugang-amd
e24bd407c1 edit release notes (#516)
Co-authored-by: Pratik Basyal <prbasyal@amd.com>
2025-08-21 11:58:26 -04:00
spolifroni-amd
19156cf2c6 adding roccv to rocm (#479)
* adding-roccv

* removed rocCV

---------

Co-authored-by: Pratik Basyal <prbasyal@amd.com>
2025-08-21 11:30:12 -04:00
randyh62
0d5f17a58b Update RELEASE.md (#515)
* Update RELEASE.md

Add logical reduction changes to ROCm 7.0 Release Notes

* Update RELEASE.md

Added description of DebugFission option for llvm-project

* Update RELEASE.md

update definition of __builtin_amdgcn_is_invocable

* Update RELEASE.md

Removed Perl Scripts from HIPCC
2025-08-21 06:18:35 -07:00
Peter Park
6b93d7a75a Update amdsmi changelog for 7.0 (#510)
Co-authored-by: Pratik Basyal <prbasyal@amd.com>
2025-08-20 15:29:20 -04:00
Pratik Basyal
acdb5c90a6 PRE RC4 7.0.0 RN Update (#507)
* Indentation and formatting updated

* Feedback changes and AQLprofiler addition

* AQL Profiler update

* MIgraphx changelog added

* Release highlight added

* Indentation fixed

* Highlights updated

* Highlights changes

* Leo quick review feedback added

* Leo's review feedback added

* Leo's feedback incorporated

* Consolidated changelog synced

* OS virtualization link updated

* ROCm Bandwidth test added

* Changelog.md sycned
2025-08-20 14:50:24 -04:00
randyh62
073ac54e47 Llvm rn update (#511)
* Update RELEASE.md

Added LLVM Release Notes content

* Update RELEASE.md

minor formatting edits

* Update RELEASE.md

updated CUDA version
2025-08-20 14:26:28 -04:00
Joseph Macaranas
3dfc0cdbf1 [External CI] Update CMake on MIOpen build pipeline (#5210) 2025-08-20 15:37:15 +00:00
Swati Rawat
d0377dd947 Merge branch 'develop' into swraw/docs 2025-08-20 18:56:46 +05:30
srawat
35ec186cd9 spellcheck 2025-08-20 17:19:28 +05:30
srawat
da340c3d05 spellcheck 2025-08-20 17:06:02 +05:30
randyh62
1d127d987b Update RELEASE.md (#508)
* Update RELEASE.md

Added ROCR Runtime

* Update RELEASE.md

Removed Resolved Issue from HIP

* Update RELEASE.md

fix a few bad words
2025-08-19 12:51:09 -07:00
Daniel Su
00b0d9430e [Ex CI] change rocprofiler's branch to develop (#5208) 2025-08-19 15:44:07 -04:00
Daniel Su
14acec6000 [Ex CI] switch rocprofiler pipeline ID (#5207) 2025-08-19 15:22:02 -04:00
randyh62
71bc63d2d8 Update RELEASE.md (#505)
* Update RELEASE.md

Updated with Changelog info from Julia

* Update RELEASE.md

* Update RELEASE.md

* Update RELEASE.md
2025-08-19 10:53:36 -07:00
srawat
7b087769a2 Create mi355-performance-counters.rst 2025-08-19 20:40:28 +05:30
Pratik Basyal
08d0840b69 Post RC3 7.0.0 RN update (#501)
* Indentation and formatting updated

* AMD SMI changelog update

* Changelog update

* Compute and Systems profiler changelog added

* Highlight added

* AMD SMI link added

* Changelog updated

* Refernece link updated

* ROCal changelog added

* rocJpeg added

* Minor change

* version update

* rocpydecode added

* Changelog.md updated

* Heading level error fixed

* Feedback from Jeff incorporated

* Title formatting updated

* Changelog updated

* Changelog updated

* Changelog updates

* HIPCC perl script removed

* TOC for internal purpose updated

* ROCgdb api and ROCdbg added

* Changelog udpate

* Sandra's feedback added
2025-08-18 14:03:43 -04:00
Peter Park
c154b7e0a3 Fix documented VRAM for Radeon AI Pro R9700 (#5203) 2025-08-18 10:00:10 -04:00
Istvan Kiss
ae734e7846 Add MI350X and MI355X to atomics operation page (#497)
Add MI350X and MI355X to atomics operation page
2025-08-18 15:37:19 +02:00
David Dixon
9f5cd4500c Don't use local tensilelite (#5201) 2025-08-18 06:19:27 -06:00
Jan Stephan
51e7d9550f Make documentation build platform-independent (#5052)
Make documentation build platform-independent
2025-08-18 10:59:31 +02:00
Peter Park
55d0a88ec5 vLLM inference benchmark doc: add missing data field (#5199) 2025-08-15 13:20:39 -04:00
Pratik Basyal
67f988f58b 7.0.0 release changes to ROCm documentation (#483)
* Update RELEASE.md (#481)

Update HIP 7.0 Release Notes

* Initial 7.0.0 related changes

* Update RELEASE.md

Add Release Notes entry for `__reduce_XXX_sync` functions in HIP.

* Update RELEASE.md

Add HIP 7 API changes to Release Highlights

* Update RELEASE.md

Corect link for HIP 7 changes

* Update RELEASE.md

Update Release Highlights note for HIP 7 changes

* Changelog entry updated post RC2

* 642 GA manifest added

* 6.4.3 GA manifest added

* 7.0.0 RC1 manifest added

* added rocCV (#490)

Co-authored-by: Pratik Basyal <prbasyal@amd.com>

* 7.0.0 RC2 manifest added

* Documentation updated added

* Highlight for 7.0.0 added

* Highlight updated

* Highlights update

* removed rocCV (#499)

Co-authored-by: Pratik Basyal <prbasyal@amd.com>

* Version udpate

* Version table update

* Installer udpate added

* Table updated

---------

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-08-15 13:17:41 -04:00
Peter Park
7ee22790ce docs: Update vLLM benchmark doc for 20250812 Docker release (#5196) 2025-08-14 15:43:36 -04:00
Daniel Su
ec05312de7 [Ex CI] enable rocprofiler monorepo (#5197)
* [Ex CI] enable rocprofiler monorepo

* set ROCM_PATH
2025-08-14 14:31:34 -04:00
amd-hsivasun
39e7ccd3c5 Update variables-global.yml 2025-08-13 17:27:05 -04:00
dependabot[bot]
c4135ab541 Bump sphinx-sitemap from 2.7.2 to 2.8.0 in /docs/sphinx (#5192)
Bumps [sphinx-sitemap](https://github.com/jdillard/sphinx-sitemap) from 2.7.2 to 2.8.0.
- [Release notes](https://github.com/jdillard/sphinx-sitemap/releases)
- [Changelog](https://github.com/jdillard/sphinx-sitemap/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/jdillard/sphinx-sitemap/compare/v2.7.2...v2.8.0)

---
updated-dependencies:
- dependency-name: sphinx-sitemap
  dependency-version: 2.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-13 09:22:31 -06:00
anisha-amd
dd56fd4d3a develop: compatibility matrix frameworks support update (#5185) 2025-08-12 14:25:37 -04:00
Peter Park
80f7dc79b9 Add Hunyuan Video to PyTorch inference benchmark models doc (#5094) 2025-08-12 11:54:59 -04:00
David Dixon
231aa0bfc6 Merge pull request #5120 from ROCm/users/ellosel/hipblaslt-lapack-deps
Add deps and config options for new hipblaslt build system
2025-08-11 13:32:09 -06:00
Joseph Macaranas
8655fb369a [External CI] Full checkout of rocm-libraries for hipsparselt pipeline (#5178) 2025-08-11 10:31:40 -04:00
Dominic Widdows
306b39ea5e Merge pull request #5174 from ROCm/dwiddows-patch-1
Fix hyperlink syntax
2025-08-08 11:23:09 -07:00
Dominic Widdows
9e055d92ce Fix hyperlink syntax 2025-08-08 10:28:09 -07:00
Daniel Su
85b13c0513 [Ex CI] temporarily disable high pool (#5173) 2025-08-08 11:10:04 -04:00
pbhandar-amd
dba913095a Merge pull request #5168 from ROCm/amd/pbhandar/manifest_700
Update XML for 6.4.3
2025-08-08 10:51:03 -04:00
Daniel Su
81b9d50c2c [Ex CI] retry MIOpen CK download if unzip fails (#5163) 2025-08-08 10:37:05 -04:00
David Dixon
e9bb2fca36 Remove build dir artifact creation 2025-08-08 14:26:12 +00:00
David Dixon
16e96caf80 Restore commented code 2025-08-08 14:26:12 +00:00
David Dixon
7e0efaa6b0 build all kernels 2025-08-08 14:25:43 +00:00
Daniel Su
af4f291005 Compress and upload build files 2025-08-08 14:25:43 +00:00
David Dixon
b9218832bc Update hipBLASLt.yml 2025-08-08 14:25:43 +00:00
David Dixon
3f2c1d65eb only run one test 2025-08-08 14:25:43 +00:00
David Dixon
ee4287fdd7 parallellize lapack build 2025-08-08 14:25:43 +00:00
David Dixon
d63db0be41 debug commit 2025-08-08 14:25:43 +00:00
David Dixon
6a37323fe7 Enable rocroller and use fetch content 2025-08-08 14:24:44 +00:00
David Dixon
b6b7b32e6d Disable blis for new build system 2025-08-08 14:22:13 +00:00
David Dixon
7c11126938 Fix pip args 2025-08-08 14:22:13 +00:00
David Dixon
ac0b72497e add python deps for hipblaslt 2025-08-08 14:22:13 +00:00
David Dixon
68bc7f83da Need both target options while transitioning between build systems 2025-08-08 14:22:13 +00:00
David Dixon
5bbe8ecdcc add deps install back 2025-08-08 14:22:13 +00:00
Daniel Su
6bc408d051 Change to GPU_TARGETS 2025-08-08 14:22:13 +00:00
Daniel Su
20762b9a96 Add blas and lapack to dnf map 2025-08-08 14:22:13 +00:00
David Dixon
fa5395a1a6 Drop lapack install script 2025-08-08 14:22:13 +00:00
Joseph Macaranas
254d863b91 External CI: Temporary Pipeline Change for CMake Refactor (#5166)
- Disable gfx1030 builds temporarily for blas, sparse, and solvers.
- TODO: gfx1030 build path should have separate build flags to use rocblas path.
2025-08-08 10:14:28 -04:00
Parag Bhandari
03bf20e614 Update XML for 6.4.3 2025-08-08 09:10:42 -04:00
Pratik Basyal
af48464844 6.4.3 version updated (#492) 2025-08-08 08:45:39 -04:00
pbhandar-amd
5b724a3780 Merge pull request #5167 syncing internal develop into external for ROCm 7.0
Syncing internal develop into external for ROCm 7.0 release.
2025-08-08 07:17:50 -04:00
Parag Bhandari
ffd5575cd9 Merge branch 'develop-internal' into amd/pbhandar/internal_to_external_700 2025-08-08 05:35:20 -04:00
pbhandar-amd
cfb7bd1883 Update versions.md 2025-08-07 16:36:12 -04:00
pbhandar-amd
ae7b791b22 Update versions.md 2025-08-07 16:34:26 -04:00
Daniel Su
3573239728 [Ex CI] update rocprofiler-register branch name (#5162) 2025-08-07 11:46:44 -04:00
Daniel Su
ec566f9623 [Ex CI] update rocprofiler-register pipeline ID to monorepo (#5158) 2025-08-07 10:35:14 -04:00
Pratik Basyal
30a862c4b9 PRE GA links reset to public (#491) 2025-08-07 10:23:34 -04:00
Parag Bhandari
3d3cfae976 Merge branch 'develop' into develop-internal 2025-08-07 09:42:50 -04:00
Daniel Su
14f5316ade [Ex CI] enable rocprofiler-register monorepo builds (#5155) 2025-08-06 13:44:55 -04:00
Pratik Basyal
00d814ccbf ROCm SMI 7.7.0 update included in 6.4.3 (#488)
* ROCm SMI 7.7.0 update included

* Feedback incorporated

* Review feedback added
2025-08-06 13:26:59 -04:00
Parag Bhandari
948a6a469b Merge branch 'develop' into develop-internal 2025-08-06 12:00:31 -04:00
Dominic Widdows
698d7f1d58 Updating old link that has been changed (#5149) 2025-08-05 15:23:55 -04:00
Dominic Widdows
9ab5dcfa59 Remove reference to build instructions that were moved 2025-08-05 15:22:32 -04:00
Pratik Basyal
8ba712bff3 Ram's review feedback incorporated (#487) 2025-08-05 11:12:35 -04:00
Daniel Su
e91e712888 [Ex CI] make MIOpen CK script no longer partially succeed (#5141) 2025-08-02 14:42:12 -04:00
Joseph Macaranas
8f1b075a79 [External CI] Disable downstream solver builds (#5150)
- Disable while migration to monorepo is postponed.
2025-08-02 14:41:27 -04:00
Pratik Basyal
f0cc7c573d No changes related updated added (#486) 2025-08-01 14:38:14 -04:00
Pratik Basyal
b271c9af9d 6.4.3 Release changes to ROCm documentation (#482)
* Initial changes to 6.4.3 added

* Additional changes to RN

* Deep learning framework updated added

* Tutorials for AI dev added

* Install on linux toc pointed to internal

* 6.4.3 compatibility table updated

* Datatype support docs improvement added

* Manifest 642 GA added

* 643 RC1 manifest added

* 6.4.2 version table added

* 6.4.3 version table added

* Leo's feedback incorporated

* Deeplearning framework support synced

* Historical Compatibility matrix updated

* Release highlight added

* DGL compatibility updated
2025-08-01 14:21:05 -04:00
Daniel Su
885ab8438a [Ex CI] reduce pipeline size (#5140)
* new fft miopen pipeline ids
* remove all references to mainline builds
2025-08-01 11:54:59 -04:00
anisha-amd
3837fe8440 Updates to the compatibility matrix with DGL fix (#5143) 2025-08-01 11:17:34 -04:00
anisha-amd
98530811b4 Update megablocks-compatibility.rst (#5136) 2025-07-31 13:30:39 -04:00
anisha-amd
266387d816 Docs: Adding frameworks compatibility for Megablocks and Taichi (#5133) 2025-07-31 13:00:31 -04:00
Daniel Su
2e93925311 [Ex CI] disable rocSPARSE to hipSOLVER downstream path (#5134) 2025-07-31 12:42:04 -04:00
Matt Williams
9786a75390 Update license 2025-07-31 10:33:36 -04:00
Daniel Su
88c2a2877b [Ex CI] enable hipSOLVER monorepo builds (#5119)
For ROCm/rocm-libraries#942

Enables hipSOLVER monorepo builds.

Enables downstream paths:
rocSOLVER -> hipSOLVER
rocSPARSE -> hipSOLVER

Sample runs:
hipSOLVER: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=40959&view=results
rocSOLVER: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=40948&view=results
rocSPARSE: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=40949&view=results
2025-07-31 10:31:55 -04:00
Daniel Su
85e0580b28 [Ex CI] enable FFT downstream jobs (#5126)
Monorepo support for FFTs was already implemented and trigger files already exist, so just need to enable their downstream jobs.

Enables downstream path:
hipRAND -> rocFFT -> hipFFT

Sample runs:
hipRAND: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=41270&view=results
rocFFT: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=41268&view=results
hipFFT: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=41269&view=results
2025-07-31 10:31:35 -04:00
Peter Park
b61d6a021e Update PyT and TF Docker images in compatibility pages for 6.4.2 (#5129) 2025-07-31 09:55:46 -04:00
Istvan Kiss
fb30dafa29 Update precision support page part I. (#5127) 2025-07-31 15:22:19 +02:00
Matt Williams
95543cae2a Final edits 2025-07-30 14:43:52 -04:00
Matt Williams
1cf3eef9da AQLProfile component additions 2025-07-28 14:39:39 -04:00
Jan Stephan
3c71bb25e8 Make initial directory and copy operations platform-independent 2025-07-16 15:13:13 +02:00
252 changed files with 11006 additions and 37714 deletions

View File

@@ -1,42 +0,0 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
resources:
repositories:
- repository: aomp_repo
type: github
endpoint: ROCm
name: ROCm/aomp
ref: amd-mainline
- repository: aomp-extras_repo
type: github
endpoint: ROCm
name: ROCm/aomp-extras
ref: amd-mainline
- repository: flang_repo
type: github
endpoint: ROCm
name: ROCm/flang
ref: amd-mainline
- repository: llvm-project_repo
type: github
endpoint: ROCm
name: ROCm/llvm-project
ref: amd-mainline
pipelines:
- pipeline: rocr-runtime_pipeline
source: \ROCR-Runtime
trigger:
branches:
include:
- amd-mainline
# this job will only be triggered after successful build sequence of llvm-project and ROCR-Runtime
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_COMPONENT_PATH }}/aomp.yml
parameters:
checkoutRepo: aomp_repo

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: hip_clr_combined
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -35,93 +54,24 @@ parameters:
type: object
default:
- llvm-project
# hip and clr are tightly-coupled
# run this same template for both repos
# any changes for clr should just trigger HIP pipeline
# similarly for hipother repo, for Nvidia backend
- ROCR-Runtime
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt }
- { os: almalinux8, packageManager: dnf }
- { os: ubuntu2204, packageManager: apt, platform: amd }
- { os: ubuntu2204, packageManager: apt, platform: nvidia }
- { os: almalinux8, packageManager: dnf, platform: amd }
- { os: almalinux8, packageManager: dnf, platform: nvidia }
# HIP with AMD backend
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hip_clr_combined_${{ job.os }}_amd
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
variables:
- group: common
- template: /.azuredevops/variables-global.yml
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
# checkout triggering repo (either HIP or clr)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
# if this is triggered by HIP repo, matching repo is clr
# if this is triggered by clr repo, matching repo is HIP
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: matching_repo
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: hipother_repo
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependenciesAMD }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
os: ${{ job.os }}
# compile clr
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: clr
cmakeBuildDir: '$(Build.SourcesDirectory)/clr/build'
cmakeSourceDir: '$(Build.SourcesDirectory)/clr'
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
-DHIP_COMMON_DIR=$(Build.SourcesDirectory)/HIP
-DHIP_PLATFORM=amd
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DHIPCC_BIN_DIR=$(Agent.BuildDirectory)/rocm/bin
-DCLR_BUILD_HIP=ON
-DCLR_BUILD_OCL=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
artifactName: amd
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: amd
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
# parameters:
# aptPackages: ${{ parameters.aptPackages }}
# pipModules: ${{ parameters.pipModules }}
# environment: amd
# HIP with Nvidia backend
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hip_clr_combined_${{ job.os }}_nvidia
- job: ${{ parameters.componentName }}_${{ job.os }}_${{ job.platform }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
@@ -140,49 +90,45 @@ jobs:
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
# checkout triggering repo (either HIP or clr)
# full checkout of rocm-systems superrepo, we need clr, hip, and hipother
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
# if this is triggered by HIP repo, matching repo is clr
# if this is triggered by clr repo, matching repo is HIP
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: matching_repo
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: hipother_repo
# sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependenciesNvidia }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
os: ${{ job.os }}
- script: 'ls -1R $(Agent.BuildDirectory)/rocm'
displayName: 'Artifact listing'
# compile clr
${{ if eq(job.platform, 'amd') }}:
dependencyList: ${{ parameters.rocmDependenciesAMD }}
${{ elseif eq(job.platform, 'nvidia') }}:
dependencyList: ${{ parameters.rocmDependenciesNvidia }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: clr
cmakeBuildDir: '$(Build.SourcesDirectory)/clr/build'
cmakeSourceDir: '$(Build.SourcesDirectory)/clr'
cmakeBuildDir: $(Agent.BuildDirectory)/s/projects/clr/build
cmakeSourceDir: $(Agent.BuildDirectory)/s/projects/clr
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
-DHIP_COMMON_DIR=$(Build.SourcesDirectory)/HIP
-DHIP_PLATFORM=nvidia
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DHIPCC_BIN_DIR=$(Agent.BuildDirectory)/rocm/bin
-DHIP_COMMON_DIR=$(Agent.BuildDirectory)/s/projects/hip
-DHIPNV_DIR=$(Agent.BuildDirectory)/s/projects/hipother/hipnv
-DHIP_PLATFORM=${{ job.platform }}
-DCLR_BUILD_HIP=ON
-DCLR_BUILD_OCL=OFF
-DHIPNV_DIR=$(Build.SourcesDirectory)/hipother/hipnv
-DCLR_BUILD_OCL=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
artifactName: ${{ job.platform }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: nvidia
artifactName: ${{ job.platform }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
# parameters:
# aptPackages: ${{ parameters.aptPackages }}
# pipModules: ${{ parameters.pipModules }}
# environment: nvidia

View File

@@ -79,7 +79,7 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- task: Bash@3
displayName: Add lit to PATH
inputs:

View File

@@ -123,7 +123,7 @@ jobs:
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
pool: ${{ variables.HIGH_BUILD_POOL }}
pool: ${{ variables.MEDIUM_BUILD_POOL }}
workspace:
clean: all
steps:
@@ -131,6 +131,7 @@ jobs:
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -149,6 +150,7 @@ jobs:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Build and install other dependencies
retryCountOnTaskFailure: 3
inputs:
targetType: inline
workingDirectory: $(Agent.BuildDirectory)/s
@@ -210,6 +212,7 @@ jobs:
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -228,6 +231,7 @@ jobs:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Build and install other dependencies
retryCountOnTaskFailure: 3
inputs:
targetType: inline
workingDirectory: $(Agent.BuildDirectory)/s

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: ROCR-Runtime
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -45,6 +64,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ROCR_Runtime_build_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
@@ -65,14 +88,18 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
@@ -82,105 +109,112 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
# parameters:
# aptPackages: ${{ parameters.aptPackages }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ROCR_Runtime_test_${{ job.os }}_${{ job.target }}
dependsOn: ROCR_Runtime_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- task: Bash@3
displayName: Build kfdtest
inputs:
targetType: 'inline'
workingDirectory: $(Build.SourcesDirectory)/libhsakmt/tests/kfdtest
script: |
if [ -e /opt/rh/gcc-toolset-14/enable ]; then
source /opt/rh/gcc-toolset-14/enable
fi
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm ..
make
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: kfdtest
testExecutable: BIN_DIR=$(Build.SourcesDirectory)/libhsakmt/tests/kfdtest/build ./run_kfdtest.sh
testParameters: '-p core --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: $(Build.SourcesDirectory)/libhsakmt/tests/kfdtest/scripts
os: ${{ job.os }}
- task: Bash@3
displayName: Build rocrtst
inputs:
targetType: 'inline'
workingDirectory: $(Build.SourcesDirectory)/rocrtst/suites/test_common
script: |
echo $(Build.SourcesDirectory)/rocrtst/thirdparty/lib | sudo tee -a /etc/ld.so.conf.d/rocm-ci.conf
sudo cat /etc/ld.so.conf.d/rocm-ci.conf
sudo ldconfig -v
ldconfig -p
if [ -e /opt/rh/gcc-toolset-14/enable ]; then
source /opt/rh/gcc-toolset-14/enable
fi
BASE_CLANG_DIR=$(Agent.BuildDirectory)/rocm/llvm/lib/clang
export NEWEST_CLANG_VER=$(ls -1 $BASE_CLANG_DIR | sort -V | tail -n 1)
mkdir build && cd build
cmake .. \
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm \
-DTARGET_DEVICES=${{ job.target }} \
-DROCM_DIR=$(Agent.BuildDirectory)/rocm \
-DLLVM_DIR=$(Agent.BuildDirectory)/rocm/llvm/bin \
-DOPENCL_INC_DIR=$BASE_CLANG_DIR/$NEWEST_CLANG_VER/include
make
make rocrtst_kernels
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocrtst
testExecutable: ./rocrtst64
testParameters: '--gtest_filter="-rocrtstNeg.Memory_Negative_Tests:rocrtstFunc.Memory_Max_Mem" --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: $(Build.SourcesDirectory)/rocrtst/suites/test_common/build/${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
# docker image will be missing libhwloc5
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ROCR_Runtime_test_${{ job.os }}_${{ job.target }}
dependsOn: ROCR_Runtime_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- task: Bash@3
displayName: Build kfdtest
inputs:
targetType: 'inline'
workingDirectory: $(Agent.BuildDirectory)/s/libhsakmt/tests/kfdtest
script: |
if [ -e /opt/rh/gcc-toolset-14/enable ]; then
source /opt/rh/gcc-toolset-14/enable
fi
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm ..
make
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: kfdtest
testExecutable: BIN_DIR=$(Agent.BuildDirectory)/s/libhsakmt/tests/kfdtest/build ./run_kfdtest.sh
testParameters: '-p core --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: $(Agent.BuildDirectory)/s/libhsakmt/tests/kfdtest/scripts
os: ${{ job.os }}
- task: Bash@3
displayName: Build rocrtst
inputs:
targetType: 'inline'
workingDirectory: $(Agent.BuildDirectory)/s/rocrtst/suites/test_common
script: |
echo $(Agent.BuildDirectory)/s/rocrtst/thirdparty/lib | sudo tee -a /etc/ld.so.conf.d/rocm-ci.conf
sudo cat /etc/ld.so.conf.d/rocm-ci.conf
sudo ldconfig -v
ldconfig -p
if [ -e /opt/rh/gcc-toolset-14/enable ]; then
source /opt/rh/gcc-toolset-14/enable
fi
BASE_CLANG_DIR=$(Agent.BuildDirectory)/rocm/llvm/lib/clang
export NEWEST_CLANG_VER=$(ls -1 $BASE_CLANG_DIR | sort -V | tail -n 1)
mkdir build && cd build
cmake .. \
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm \
-DTARGET_DEVICES=${{ job.target }} \
-DROCM_DIR=$(Agent.BuildDirectory)/rocm \
-DLLVM_DIR=$(Agent.BuildDirectory)/rocm/llvm/bin \
-DOPENCL_INC_DIR=$BASE_CLANG_DIR/$NEWEST_CLANG_VER/include
make
make rocrtst_kernels
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocrtst
testExecutable: ./rocrtst64
testParameters: '--gtest_filter="-rocrtstNeg.Memory_Negative_Tests:rocrtstFunc.Memory_Max_Mem" --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: $(Agent.BuildDirectory)/s//rocrtst/suites/test_common/build/${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
# docker image will be missing libhwloc5

View File

@@ -171,6 +171,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: DownloadPipelineArtifact@2
displayName: 'Download Pipeline Wheel Files'
retryCountOnTaskFailure: 3
inputs:
itemPattern: '**/*${{ job.os }}*.whl'
targetPath: $(Agent.BuildDirectory)

View File

@@ -0,0 +1,174 @@
parameters:
- name: componentName
type: string
default: aqlprofile
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
type: boolean
default: false
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- python3-pip
- name: rocmDependencies
type: object
default:
- clr
- llvm-project
- ROCR-Runtime
- name: rocmTestDependencies
type: object
default:
- clr
- llvm-project
- ROCR-Runtime
- rocprofiler-register
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.MEDIUM_BUILD_POOL }}
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
- gtest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
consolidateBuildAndInstall: true
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/aqlprofile/cmake_modules
-DAQLPROFILE_BUILD_TESTS=ON
-DGPU_TARGETS=${{ job.target }}
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- ${{ if eq(job.os, 'ubuntu2204') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: $(Agent.BuildDirectory)/rocm/share/hsa-amd-aqlprofile/
testExecutable: ./run_tests.sh
testParameters: ''
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: hip-tests
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -60,6 +79,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hip_tests_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -76,15 +99,18 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
# compile hip-tests
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: hip-tests
componentName: ${{ parameters.componentName }}
cmakeSourceDir: '../catch'
customBuildTarget: build_tests
extraBuildFlags: >-
@@ -96,9 +122,12 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -108,52 +137,56 @@ jobs:
extraEnvVars:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: hip_tests_test_${{ job.target }}
timeoutInMinutes: 240
dependsOn: hip_tests_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- task: Bash@3
displayName: Symlink rocm_agent_enumerator
inputs:
targetType: inline
script: |
# Assuming that /opt is no longer persistent across runs, test environments are fully ephemeral
sudo mkdir -p /opt/rocm/bin
sudo ln -s $(Agent.BuildDirectory)/rocm/bin/rocm_agent_enumerator /opt/rocm/bin/rocm_agent_enumerator
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: hip_tests
testDir: $(Agent.BuildDirectory)/rocm/share/hip
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
optSymLink: true
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: hip_tests_test_${{ job.target }}
timeoutInMinutes: 240
dependsOn: hip_tests_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Symlink rocm_agent_enumerator
inputs:
targetType: inline
script: |
# Assuming that /opt is no longer persistent across runs, test environments are fully ephemeral
sudo mkdir -p /opt/rocm/bin
sudo ln -s $(Agent.BuildDirectory)/rocm/bin/rocm_agent_enumerator /opt/rocm/bin/rocm_agent_enumerator
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: $(Agent.BuildDirectory)/rocm/share/hip
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
optSymLink: true

View File

@@ -35,9 +35,13 @@ parameters:
- ccache
- gfortran
- git
- libboost-filesystem-dev
- libboost-program-options-dev
- libdrm-dev
- liblapack-dev
- libmsgpack-dev
- libnuma-dev
- libopenblas-dev
- ninja-build
- python3-pip
- python3-venv
@@ -46,6 +50,12 @@ parameters:
default:
- joblib
- "packaging>=22.0"
- pyyaml
- msgpack
- simplejson
- ujson
- orjson
- yappi
- --upgrade
- name: rocmDependencies
type: object
@@ -67,6 +77,7 @@ parameters:
- clr
- hipBLAS-common
- llvm-project
- rocm-cmake
- rocminfo
- rocm_smi_lib
- rocprofiler-register
@@ -81,12 +92,12 @@ parameters:
- { pool: rocm-ci_medium_build_pool, os: ubuntu2204, packageManager: apt, target: gfx90a }
- { pool: rocm-ci_medium_build_pool, os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { pool: rocm-ci_medium_build_pool, os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { pool: rocm-ci_medium_build_pool, os: ubuntu2204, packageManager: apt, target: gfx1030 }
#- { pool: rocm-ci_medium_build_pool, os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { pool: rocm-ci_ultra_build_pool, os: almalinux8, packageManager: dnf, target: gfx942 }
- { pool: rocm-ci_medium_build_pool, os: almalinux8, packageManager: dnf, target: gfx90a }
- { pool: rocm-ci_medium_build_pool, os: almalinux8, packageManager: dnf, target: gfx1201 }
- { pool: rocm-ci_medium_build_pool, os: almalinux8, packageManager: dnf, target: gfx1100 }
- { pool: rocm-ci_medium_build_pool, os: almalinux8, packageManager: dnf, target: gfx1030 }
#- { pool: rocm-ci_medium_build_pool, os: almalinux8, packageManager: dnf, target: gfx1030 }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
@@ -134,7 +145,7 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -168,8 +179,8 @@ jobs:
mkdir -p $(Agent.BuildDirectory)/temp-deps
cd $(Agent.BuildDirectory)/temp-deps
# position-independent LAPACK is required for almalinux8 builds
cmake -DBUILD_GTEST=OFF -DBUILD_LAPACK=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON $(Agent.BuildDirectory)/s/deps
make
cmake -DBUILD_GTEST=OFF -DBUILD_LAPACK=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON $(Agent.BuildDirectory)/sparse/projects/hipblaslt/deps
make -j
sudo make install
- script: |
mkdir -p $(CCACHE_DIR)
@@ -187,6 +198,8 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeSourceDir: $(Agent.BuildDirectory)/sparse/projects/hipblaslt
cmakeBuildDir: $(Agent.BuildDirectory)/sparse/projects/hipblaslt/build
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_INCLUDE_PATH=$(Agent.BuildDirectory)/rocm/llvm/include
@@ -195,7 +208,11 @@ jobs:
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache
-DCMAKE_C_COMPILER_LAUNCHER=ccache
-DAMDGPU_TARGETS=${{ job.target }}
-DGPU_TARGETS=${{ job.target }}
-DBUILD_CLIENTS_TESTS=ON
-DHIPBLASLT_ENABLE_ROCROLLER=ON
-DHIPBLASLT_ENABLE_FETCH=ON
-DHIPBLASLT_ENABLE_BLIS=OFF
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:

View File

@@ -80,11 +80,11 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.target }} # todo: add OS
- ${{ build }}_ubuntu2204_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -141,12 +141,12 @@ jobs:
# gpuTarget: ${{ job.target }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.target }}
- job: ${{ parameters.componentName }}_test_ubuntu2204_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:

View File

@@ -72,15 +72,15 @@ parameters:
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
# - name: downstreamComponentMatrix
# type: object
# default:
# - rocFFT:
# name: rocFFT
# sparseCheckoutDir: projects/rocfft
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - hipRAND_build
- name: downstreamComponentMatrix
type: object
default:
- rocFFT:
name: rocFFT
sparseCheckoutDir: projects/rocfft
skipUnifiedBuild: 'false'
buildDependsOn:
- hipRAND_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
@@ -206,14 +206,14 @@ jobs:
environment: test
gpuTarget: ${{ job.target }}
# - ${{ if parameters.triggerDownstreamJobs }}:
# - ${{ each component in parameters.downstreamComponentMatrix }}:
# - ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
# - template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
# parameters:
# checkoutRepo: ${{ parameters.checkoutRepo }}
# sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
# buildDependsOn: ${{ component.buildDependsOn }}
# downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
# triggerDownstreamJobs: true
# unifiedBuild: ${{ parameters.unifiedBuild }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: hipSOLVER
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -66,7 +85,11 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hipSOLVER_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_ubuntu2204_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -81,18 +104,21 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
# build external gtest and lapack
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: external
cmakeBuildDir: '$(Build.SourcesDirectory)/deps/build'
cmakeSourceDir: '$(Build.SourcesDirectory)/deps'
cmakeBuildDir: '$(Agent.BuildDirectory)/s/deps/build'
cmakeSourceDir: '$(Agent.BuildDirectory)/s/deps'
installDir: '$(Pipeline.Workspace)/deps-install'
extraBuildFlags: >-
-DBUILD_BOOST=OFF
@@ -111,8 +137,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
gpuTarget: ${{ job.target }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -122,44 +150,49 @@ jobs:
# extraCopyDirectories:
# - deps-install
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: hipSOLVER_test_${{ job.target }}
dependsOn: hipSOLVER_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: hipSOLVER
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './hipsolver-test'
testParameters: '--gtest_filter="*checkin*" --gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_ubuntu2204_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './hipsolver-test'
testParameters: '--gtest_filter="*checkin*" --gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -69,7 +69,7 @@ parameters:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
#- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }

View File

@@ -40,10 +40,12 @@ parameters:
- gfortran
- libgfortran5
- libopenblas-dev
- liblapack-dev
- name: pipModules
type: object
default:
- joblib
- msgpack
- name: rocmDependencies
type: object
default:
@@ -52,6 +54,7 @@ parameters:
- hipSPARSE
- llvm-project
- rocBLAS
- rocm-cmake
- rocm_smi_lib
- rocminfo
- rocprofiler-register
@@ -65,6 +68,7 @@ parameters:
- llvm-project
- hipBLAS-common
- hipBLASLt
- rocm-cmake
- rocBLAS
- rocminfo
- rocprofiler-register
@@ -108,12 +112,13 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
# ignore sparse checkout for monorepo case, we want access to hipblaslt directory
# sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
@@ -123,14 +128,20 @@ jobs:
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
# NOTE: content between `---` is for transition support between old/new build systems
# and should be removed once transition is complete.
# -----------------------------
# Build and install gtest and lapack
# $(Pipeline.Workspace)/deps is a temporary folder for the build process
# $(Pipeline.Workspace)/s/deps is part of the hipSPARSELt repo
- script: mkdir $(Pipeline.Workspace)/deps
- script: mkdir -p $(Pipeline.Workspace)/deps
displayName: Create temp folder for external dependencies
# hipSPARSELt already has a CMake script for external deps, so we can just run that
# https://github.com/ROCm/hipSPARSELt/blob/develop/deps/CMakeLists.txt
- script: cmake $(Pipeline.Workspace)/s/deps
- ${{ if ne(parameters.sparseCheckoutDir, '') }}:
script: cmake $(Pipeline.Workspace)/s/projects/hipsparselt/deps
${{ else }}:
script: cmake $(Pipeline.Workspace)/s/deps
displayName: Configure hipSPARSELt external dependencies
workingDirectory: $(Pipeline.Workspace)/deps
- script: make
@@ -139,22 +150,39 @@ jobs:
- script: sudo make install
displayName: Install hipSPARSELt external dependencies
workingDirectory: $(Pipeline.Workspace)/deps
# -----------------------------
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
# NOTE: the following options are old build only
# and can be removed after full transition to new build
# -DAMDGPU_TARGETS=${{ job.target }}
# -DCMAKE_Fortran_COMPILER=f95
# -DTensile_LOGIC=
# -DTensile_CPU_THREADS=
# -DTensile_LIBRARY_FORMAT=msgpack
# -DROCM_PATH=$(Agent.BuildDirectory)/rocm
# -DBUILD_CLIENTS_TESTS=ON
# -DBUILD_USE_LOCAL_TENSILE=OFF
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_Fortran_COMPILER=f95
-DCMAKE_PREFIX_PATH="$(Agent.BuildDirectory)/rocm"
-DGPU_TARGETS=${{ job.target }}
-DAMDGPU_TARGETS=${{ job.target }}
-DCMAKE_Fortran_COMPILER=f95
-DTensile_LOGIC=
-DTensile_CPU_THREADS=
-DTensile_LIBRARY_FORMAT=msgpack
-DCMAKE_PREFIX_PATH="$(Agent.BuildDirectory)/rocm"
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_USE_LOCAL_TENSILE=OFF
-DHIPSPARSELT_ENABLE_FETCH=ON
-GNinja
${{ if ne(parameters.sparseCheckoutDir, '') }}:
cmakeSourceDir: $(Build.SourcesDirectory)/projects/hipsparselt
cmakeBuildDir: $(Build.SourcesDirectory)/projects/hipsparselt
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}

View File

@@ -77,6 +77,7 @@ jobs:
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/rocm/llvm
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DHIPTENSOR_BUILD_TESTS=ON

View File

@@ -71,7 +71,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:

View File

@@ -30,7 +30,7 @@ parameters:
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt }
- { os: ubuntu2404, packageManager: apt }
# - { os: ubuntu2404, packageManager: apt }
- { os: almalinux8, packageManager: dnf }
jobs:

View File

@@ -0,0 +1,251 @@
parameters:
- name: componentName
type: string
default: origami
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
type: boolean
default: false
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- wget
- python3
- python3-dev
- python3-pip
- libgtest-dev
- libboost-filesystem-dev
- libboost-program-options-dev
- name: pipModules
type: object
default:
- nanobind>=2.0.0
- name: rocmDependencies
type: object
default:
- clr
- llvm-project
- rocm-cmake
- rocminfo
- ROCR-Runtime
- rocprofiler-register
- name: rocmTestDependencies
type: object
default:
- clr
- llvm-project
- rocm-cmake
- rocminfo
- ROCR-Runtime
- rocprofiler-register
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt }
- { os: almalinux8, packageManager: dnf }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- name: downstreamComponentMatrix
type: object
default:
- hipBLASLt:
name: hipBLASLt
sparseCheckoutDir: projects/hipblaslt
skipUnifiedBuild: 'false'
buildDependsOn:
- origami_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: origami_build_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
pool:
vmImage: ${{ variables.BASE_BUILD_POOL }}
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
- gtest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DORIGAMI_BUILD_SHARED_LIBS=ON
-DORIGAMI_ENABLE_PYTHON=ON
-DORIGAMI_BUILD_TESTING=ON
-GNinja
- ${{ if ne(job.os, 'almalinux8') }}:
- task: PublishPipelineArtifact@1
displayName: 'Publish Build Directory Artifact'
inputs:
targetPath: '$(Agent.BuildDirectory)/s/build'
artifact: '${{ parameters.componentName }}_${{ job.os }}_build_dir'
publishLocation: 'pipeline'
- task: PublishPipelineArtifact@1
displayName: 'Publish Python Source Artifact'
inputs:
targetPath: '$(Agent.BuildDirectory)/s/python'
artifact: '${{ parameters.componentName }}_${{ job.os }}_python_src'
publishLocation: 'pipeline'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}
componentName: ${{ parameters.componentName }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: origami_test_${{ job.os }}_${{ job.target }}
timeoutInMinutes: 120
dependsOn: origami_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
os: ${{ job.os }}
- task: DownloadPipelineArtifact@2
displayName: 'Download Build Directory Artifact'
inputs:
artifact: '${{ parameters.componentName }}_${{ job.os }}_build_dir'
path: '$(Agent.BuildDirectory)/s/build'
- task: DownloadPipelineArtifact@2
displayName: 'Download Python Source Artifact'
inputs:
artifact: '${{ parameters.componentName }}_${{ job.os }}_python_src'
path: '$(Agent.BuildDirectory)/s/python'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
os: ${{ job.os }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './origami-tests'
testParameters: '--yaml origami-tests.yaml --gtest_output=xml:./test_output.xml --gtest_color=yes'
- script: |
set -e
export PYTHONPATH=$(Agent.BuildDirectory)/s/build/python:$PYTHONPATH
echo "--- Running origami_test.py ---"
python3 $(Agent.BuildDirectory)/s/python/origami_test.py
echo "--- Running origami_grid_test.py ---"
python3 $(Agent.BuildDirectory)/s/python/origami_grid_test.py
displayName: 'Run Python Binding Tests'
condition: succeeded()
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -76,14 +76,14 @@ jobs:
- template: /.azuredevops/variables-global.yml
- name: HIP_ROCCLR_HOME
value: $(Build.BinariesDirectory)/rocm
pool: ${{ variables.HIGH_BUILD_POOL }}
pool: ${{ variables.MEDIUM_BUILD_POOL }}
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rdc
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -33,6 +52,7 @@ parameters:
- clr
- hipBLAS-common
- hipBLASLt
- hipRAND
- llvm-project
- rocBLAS
- rocm-cmake
@@ -43,6 +63,7 @@ parameters:
- rocprofiler
- rocprofiler-register
- rocprofiler-sdk
- rocRAND
- ROCR-Runtime
- name: rocmTestDependencies
type: object
@@ -74,7 +95,11 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rdc_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -85,16 +110,22 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
parameters:
cmakeVersion: '3.25.0'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
# Build grpc
- task: Bash@3
displayName: 'git clone grpc'
@@ -104,6 +135,7 @@ jobs:
workingDirectory: $(Build.SourcesDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
cmakeBuildDir: $(Build.SourcesDirectory)/grpc/build
cmakeSourceDir: $(Build.SourcesDirectory)/grpc
installDir: $(Build.SourcesDirectory)/bin
@@ -117,6 +149,7 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DGRPC_ROOT="$(Build.SourcesDirectory)/bin"
@@ -126,9 +159,12 @@ jobs:
-DAMDGPU_TARGETS=${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -136,60 +172,64 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
gpuTarget: ${{ job.target }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rdc_test_${{ job.target }}
dependsOn: rdc_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
- name: ROCM_DIR
value: $(Agent.BuildDirectory)/rocm
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- task: Bash@3
displayName: Setup test environment
inputs:
targetType: inline
script: |
sudo ln -s $(Agent.BuildDirectory)/rocm/bin/rdcd /usr/sbin/rdcd
echo $(Agent.BuildDirectory)/rocm/lib/rdc/grpc/lib | sudo tee /etc/ld.so.conf.d/grpc.conf
sudo ldconfig -v
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- task: Bash@3
displayName: Test rdc
inputs:
targetType: inline
script: >-
$(Agent.BuildDirectory)/rocm/share/rdc/rdctst_tests/rdctst
--batch_mode
--start_rdcd
--unauth_comm
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
extraPaths: /home/user/workspace/rocm/bin
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
- name: ROCM_DIR
value: $(Agent.BuildDirectory)/rocm
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Setup test environment
inputs:
targetType: inline
script: |
sudo ln -s $(Agent.BuildDirectory)/rocm/bin/rdcd /usr/sbin/rdcd
echo $(Agent.BuildDirectory)/rocm/lib/rdc/grpc/lib | sudo tee /etc/ld.so.conf.d/grpc.conf
sudo ldconfig -v
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- task: Bash@3
displayName: Test rdc
inputs:
targetType: inline
script: >-
$(Agent.BuildDirectory)/rocm/share/rdc/rdctst_tests/rdctst
--batch_mode
--start_rdcd
--unauth_comm
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
extraPaths: /home/user/workspace/rocm/bin

View File

@@ -70,6 +70,7 @@ parameters:
- hipBLAS-common
- hipBLASLt
- llvm-project
- rocm-cmake
- rocminfo
- rocprofiler-register
- rocm_smi_lib
@@ -84,12 +85,12 @@ parameters:
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
#- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { os: almalinux8, packageManager: dnf, target: gfx942 }
- { os: almalinux8, packageManager: dnf, target: gfx90a }
- { os: almalinux8, packageManager: dnf, target: gfx1201 }
- { os: almalinux8, packageManager: dnf, target: gfx1100 }
- { os: almalinux8, packageManager: dnf, target: gfx1030 }
#- { os: almalinux8, packageManager: dnf, target: gfx1030 }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
@@ -104,17 +105,24 @@ parameters:
- rocBLAS_build
# rocSOLVER depends on both rocBLAS and rocPRIM
# for a unified build, rocBLAS will be the one to call rocSOLVER
- rocSOLVER:
name: rocSOLVER
sparseCheckoutDir: projects/rocsolver
# - rocSOLVER:
# name: rocSOLVER
# sparseCheckoutDir: projects/rocsolver
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocBLAS_build
# unifiedBuild:
# downstreamAggregateNames: rocBLAS+rocPRIM
# buildDependsOn:
# - rocBLAS_build
# - rocPRIM_build
# temporary rocblas->hipblas downstream path while the SOLVERs are disabled
- hipBLAS:
name: hipBLAS
sparseCheckoutDir: projects/hipblas
skipUnifiedBuild: 'false'
buildDependsOn:
- rocBLAS_build
unifiedBuild:
downstreamAggregateNames: rocBLAS+rocPRIM
buildDependsOn:
- rocBLAS_build
- rocPRIM_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
@@ -147,7 +155,7 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -172,6 +180,8 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeSourceDir: $(Agent.BuildDirectory)/sparse/projects/rocblas
cmakeBuildDir: $(Agent.BuildDirectory)/sparse/projects/rocblas/build
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm/llvm;$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_BUILD_TYPE=Release

View File

@@ -8,6 +8,25 @@ parameters:
- name: checkoutRef
type: string
default: ''
- name: rocPyDecodeRepo
type: string
default: rocpydecode_repo
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -56,10 +75,23 @@ parameters:
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- name: downstreamComponentMatrix
type: object
default:
- rocPyDecode:
name: rocPyDecode
sparseCheckoutDir: ''
skipUnifiedBuild: 'false'
buildDependsOn:
- rocDecode_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -83,12 +115,15 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
@@ -169,3 +204,15 @@ jobs:
registerROCmPackages: true
environment: test
gpuTarget: ${{ job.target }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.rocPyDecodeRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -78,19 +78,19 @@ parameters:
target: gfx942
- gfx90a:
target: gfx90a
# - name: downstreamComponentMatrix
# type: object
# default:
# - hipFFT:
# name: hipFFT
# sparseCheckoutDir: projects/hipfft
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocFFT_build
- name: downstreamComponentMatrix
type: object
default:
- hipFFT:
name: hipFFT
sparseCheckoutDir: projects/hipfft
skipUnifiedBuild: 'false'
buildDependsOn:
- rocFFT_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
@@ -151,12 +151,12 @@ jobs:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.target }}
- job: ${{ parameters.componentName }}_test_ubuntu2204_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
@@ -196,14 +196,14 @@ jobs:
environment: test
gpuTarget: ${{ job.target }}
# - ${{ if parameters.triggerDownstreamJobs }}:
# - ${{ each component in parameters.downstreamComponentMatrix }}:
# - ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
# - template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
# parameters:
# checkoutRepo: ${{ parameters.checkoutRepo }}
# sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
# buildDependsOn: ${{ component.buildDependsOn }}
# downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
# triggerDownstreamJobs: true
# unifiedBuild: ${{ parameters.unifiedBuild }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -91,12 +91,12 @@ parameters:
- rocPRIM_build
# rocSOLVER depends on both rocBLAS and rocPRIM
# for a unified build, rocBLAS will be the one to call rocSOLVER
- rocSOLVER:
name: rocSOLVER
sparseCheckoutDir: projects/rocsolver
skipUnifiedBuild: 'true'
buildDependsOn:
- rocPRIM_build
# - rocSOLVER:
# name: rocSOLVER
# sparseCheckoutDir: projects/rocsolver
# skipUnifiedBuild: 'true'
# buildDependsOn:
# - rocPRIM_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
@@ -210,7 +210,7 @@ jobs:
parameters:
componentName: ${{ parameters.componentName }}
testDir: '$(Agent.BuildDirectory)/rocm/bin/rocprim'
extraTestParameters: '-I ${{ job.shard }},,${{ job.shardCount }} -E device_merge_inplace'
extraTestParameters: '-I ${{ job.shard }},,${{ job.shardCount }}'
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:

View File

@@ -5,6 +5,22 @@ parameters:
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -47,19 +63,19 @@ parameters:
type: object
default:
buildJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
testJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocPyDecode_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -74,16 +90,20 @@ jobs:
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: 'Save Python Package Paths'
inputs:
@@ -190,6 +210,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: DownloadPipelineArtifact@2
displayName: 'Download Pipeline Wheel Files'
retryCountOnTaskFailure: 3
inputs:
itemPattern: '**/*.whl'
targetPath: $(Agent.BuildDirectory)

View File

@@ -74,12 +74,12 @@ parameters:
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
#- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { os: almalinux8, packageManager: dnf, target: gfx942 }
- { os: almalinux8, packageManager: dnf, target: gfx90a }
- { os: almalinux8, packageManager: dnf, target: gfx1201 }
- { os: almalinux8, packageManager: dnf, target: gfx1100 }
- { os: almalinux8, packageManager: dnf, target: gfx1030 }
#- { os: almalinux8, packageManager: dnf, target: gfx1030 }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
@@ -94,17 +94,17 @@ parameters:
- rocSOLVER_build
# hipSOLVER depends on both rocSOLVER and rocSPARSE
# for a unified build, rocSOLVER will be the one to call hipSOLVER
# - hipSOLVER:
# name: hipSOLVER
# sparseCheckoutDir: projects/hipsolver
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocSOLVER_build
# unifiedBuild:
# downstreamAggregateNames: rocSOLVER+rocSPARSE
# buildDependsOn:
# - rocSOLVER_build
# - rocSPARSE_build
# - hipSOLVER:
# name: hipSOLVER
# sparseCheckoutDir: projects/hipsolver
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocSOLVER_build
# unifiedBuild:
# downstreamAggregateNames: rocSOLVER+rocSPARSE
# buildDependsOn:
# - rocSOLVER_build
# - rocSPARSE_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:

View File

@@ -73,7 +73,7 @@ parameters:
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
#- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }

View File

@@ -70,7 +70,7 @@ jobs:
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ variables.HIGH_BUILD_POOL }}
pool: ${{ variables.MEDIUM_BUILD_POOL }}
workspace:
clean: all
steps:

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocm-core
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -27,6 +46,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocm_core_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
${{ if eq(job.os, 'ubuntu2404') }}:
vmImage: 'ubuntu-24.04'
@@ -50,8 +73,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
@@ -65,9 +90,12 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml

View File

@@ -14,9 +14,11 @@ parameters:
type: object
default:
- cmake
- libdw-dev
- libglfw3-dev
- libmsgpack-dev
- libtbb-dev
- libva-amdgpu-dev
- ninja-build
- python3-pip
- name: rocmDependencies
@@ -33,16 +35,21 @@ parameters:
- hipRAND
- hipSOLVER
- hipSPARSE
- hipTensor
- llvm-project
- MIOpen
- rocBLAS
- rocFFT
- rocJPEG
- rocPRIM
- rocprofiler-register
- rocprofiler-sdk
- ROCR-Runtime
- rocRAND
- rocSOLVER
- rocSPARSE
- rocThrust
- rocWMMA
- name: rocmTestDependencies
type: object
default:
@@ -57,18 +64,22 @@ parameters:
- hipRAND
- hipSOLVER
- hipSPARSE
- hipTensor
- llvm-project
- rocBLAS
- rocFFT
- rocminfo
- rocPRIM
- rocJPEG
- rocprofiler-register
- rocprofiler-sdk
- ROCR-Runtime
- rocRAND
- rocSOLVER
- rocSPARSE
- rocThrust
- roctracer
- rocWMMA
- name: jobMatrix
type: object
@@ -97,6 +108,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
parameters:
cmakeVersion: '3.25.0'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -158,6 +173,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
parameters:
cmakeVersion: '3.25.0'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:

View File

@@ -36,8 +36,10 @@ parameters:
- gfortran
- git
- libdrm-dev
- liblapack-dev
- libmsgpack-dev
- libnuma-dev
- libopenblas-dev
- ninja-build
- python3-pip
- python3-venv
@@ -46,6 +48,8 @@ parameters:
default:
- joblib
- "packaging>=22.0"
- pytest
- pytest-cmake
- --upgrade
- name: rocmDependencies
type: object
@@ -98,12 +102,12 @@ jobs:
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
@@ -134,12 +138,26 @@ jobs:
rocm-libraries | ${{ job.os }} | ${{ job.target }} | $(DAY_STRING)
rocm-libraries | ${{ job.os }} | ${{ job.target }}
rocm-libraries | ${{ job.os }}
- task: Bash@3
displayName: Add paths for CMake and Python site-packages binaries
inputs:
targetType: inline
script: |
USER_BASE=$(python3 -m site --user-base)
echo "##vso[task.prependpath]$USER_BASE/bin"
echo "##vso[task.setvariable variable=PytestCmakePath]$USER_BASE/share/Pytest/cmake"
displayName: Set cmake configure paths
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
extraBuildFlags: >-
-DROCM_LIBRARIES_SUPERBUILD=ON
-GNinja
-D CMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor;$(PytestCmakePath)
-D CMAKE_INCLUDE_PATH=$(Agent.BuildDirectory)/rocm/llvm/include
-D CMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-D CMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-D CMAKE_CXX_COMPILER_LAUNCHER=ccache
-D CMAKE_C_COMPILER_LAUNCHER=ccache
-G Ninja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocm_smi_lib
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -32,6 +51,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocm_smi_lib_build_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
${{ if eq(job.os, 'ubuntu2404') }}:
vmImage: 'ubuntu-24.04'
@@ -55,8 +78,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
@@ -65,51 +90,56 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
# parameters:
# aptPackages: ${{ parameters.aptPackages }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocm_smi_lib_test_${{ job.os }}_${{ job.target }}
dependsOn: rocm_smi_lib_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocm_smi_lib
testDir: '$(Agent.BuildDirectory)'
testExecutable: 'sudo ./rocm/share/rocm_smi/rsmitst_tests/rsmitst'
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocm_smi_lib_test_${{ job.os }}_${{ job.target }}
dependsOn: rocm_smi_lib_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: '$(Agent.BuildDirectory)'
testExecutable: 'sudo ./rocm/share/rocm_smi/rsmitst_tests/rsmitst'
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocminfo
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -40,7 +59,11 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocminfo_build_${{ job.os }}
- job: ${{ parameters.componentName }}_build_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
@@ -62,14 +85,18 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
@@ -78,65 +105,71 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocminfo_test_${{ job.target }}
dependsOn: rocminfo_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocminfo
testDir: '$(Agent.BuildDirectory)'
testExecutable: './rocm/bin/rocminfo'
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocm_agent_enumerator
testDir: '$(Agent.BuildDirectory)'
testExecutable: './rocm/bin/rocm_agent_enumerator'
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
registerROCmPackages: true
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocminfo_test_${{ job.target }}
dependsOn: rocminfo_build_${{ job.os }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
parameters:
runRocminfo: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: '$(Agent.BuildDirectory)'
testExecutable: './rocm/bin/rocminfo'
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocm_agent_enumerator
testDir: '$(Agent.BuildDirectory)'
testExecutable: './rocm/bin/rocm_agent_enumerator'
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
registerROCmPackages: true
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocprofiler-compute
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -36,6 +55,7 @@ parameters:
- pymongo
- pyyaml
- setuptools
- sqlalchemy
- tabulate
- textual
- textual_plotext
@@ -65,43 +85,23 @@ parameters:
type: object
default:
buildJobs:
- gfx942-staging:
name: gfx942_staging
- gfx942:
target: gfx942
dependencySource: staging
- gfx942-mainline:
name: gfx942_mainline
target: gfx942
dependencySource: mainline
- gfx90a-staging:
name: gfx90a_staging
- gfx90a:
target: gfx90a
dependencySource: staging
- gfx90a-mainline:
name: gfx90a_mainline
target: gfx90a
dependencySource: mainline
testJobs:
- gfx942-staging:
name: gfx942_staging
- gfx942:
target: gfx942
dependencySource: staging
- gfx942-mainline:
name: gfx942_mainline
target: gfx942
dependencySource: mainline
- gfx90a-staging:
name: gfx90a_staging
- gfx90a:
target: gfx90a
dependencySource: staging
- gfx90a-mainline:
name: gfx90a_mainline
target: gfx90a
dependencySource: mainline
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocprofiler_compute_build_${{ job.name }}
- job: rocprofiler_compute_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -118,17 +118,19 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
artifactName: ${{ job.dependencySource }}
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: ${{ job.dependencySource }}
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -137,80 +139,83 @@ jobs:
# pipModules: ${{ parameters.pipModules }}
# gpuTarget: ${{ job.target }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_compute_test_${{ job.name }}
timeoutInMinutes: 120
dependsOn: rocprofiler_compute_build_${{ job.name }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: PYTHON_VERSION
value: 3.10
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
postTargetFilter: ${{ job.dependencySource }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
dependencySource: ${{ job.dependencySource }}
gpuTarget: ${{ job.target }}
- task: Bash@3
displayName: Add en_US.UTF-8 locale
inputs:
targetType: inline
script: |
sudo locale-gen en_US.UTF-8
sudo update-locale
locale -a
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/llvm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DCMAKE_HIP_ARCHITECTURES=${{ job.target }}
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DENABLE_TESTS=ON
-DINSTALL_TESTS=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofiler-compute
testDir: $(Build.BinariesDirectory)/libexec/rocprofiler-compute
testExecutable: ROCM_PATH=$(Agent.BuildDirectory)/rocm ctest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_compute_test_${{ job.target }}
timeoutInMinutes: 120
dependsOn: rocprofiler_compute_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: PYTHON_VERSION
value: 3.10
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Add en_US.UTF-8 locale
inputs:
targetType: inline
script: |
sudo locale-gen en_US.UTF-8
sudo update-locale
locale -a
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/llvm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DCMAKE_HIP_ARCHITECTURES=${{ job.target }}
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DENABLE_TESTS=ON
-DINSTALL_TESTS=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: $(Build.BinariesDirectory)/libexec/rocprofiler-compute
testExecutable: ROCM_PATH=$(Agent.BuildDirectory)/rocm ctest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocprofiler-register
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -27,6 +46,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocprofiler_register_${{ job.os }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
pool:
${{ if eq(job.os, 'ubuntu2404') }}:
vmImage: 'ubuntu-24.04'
@@ -50,9 +73,10 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: rocprofiler-register
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
useAmdclang: false
extraBuildFlags: >-
@@ -62,12 +86,16 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofiler-register
componentName: ${{ parameters.componentName }}
testDir: $(Agent.BuildDirectory)/s/build
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocprofiler-sdk
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -73,6 +92,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocprofiler_sdk_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -89,6 +112,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
@@ -96,6 +120,8 @@ jobs:
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Add Python site-packages binaries to path
inputs:
@@ -105,6 +131,7 @@ jobs:
echo "##vso[task.prependpath]$USER_BASE/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCPROFILER_BUILD_TESTS=ON
@@ -114,9 +141,12 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -126,62 +156,68 @@ jobs:
# gpuTarget: ${{ job.target }}
# registerROCmPackages: true
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_sdk_test_${{ job.target }}
dependsOn: rocprofiler_sdk_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
- task: Bash@3
displayName: Add Python and ROCm binaries to path
inputs:
targetType: inline
script: |
USER_BASE=$(python3 -m site --user-base)
echo "##vso[task.prependpath]$USER_BASE/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCPROFILER_BUILD_TESTS=ON
-DROCPROFILER_BUILD_SAMPLES=ON
-DROCPROFILER_BUILD_RELEASE=ON
-DGPU_TARGETS=${{ job.target }}
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH}}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofiler-sdk
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
registerROCmPackages: true
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_sdk_test_${{ job.target }}
dependsOn: rocprofiler_sdk_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Add Python and ROCm binaries to path
inputs:
targetType: inline
script: |
USER_BASE=$(python3 -m site --user-base)
echo "##vso[task.prependpath]$USER_BASE/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
componentName: ${{ parameters.componentName }}
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DROCPROFILER_BUILD_TESTS=ON
-DROCPROFILER_BUILD_SAMPLES=ON
-DROCPROFILER_BUILD_RELEASE=ON
-DGPU_TARGETS=${{ job.target }}
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH}}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: $(Agent.BuildDirectory)/s/build
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
registerROCmPackages: true

View File

@@ -6,6 +6,25 @@ parameters:
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: componentName
type: string
default: rocprofiler-systems
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -87,6 +106,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocprofiler_systems_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -105,6 +128,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
@@ -136,12 +160,16 @@ jobs:
-DCMAKE_CXX_FLAGS=-I$(Agent.BuildDirectory)/rocm/include/rocjpeg
-DGPU_TARGETS=${{ job.target }}
-GNinja
componentName: ${{ parameters.componentName }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
gpuTarget: ${{ job.target }}
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
gpuTarget: ${{ job.target }}
componentName: ${{ parameters.componentName }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
@@ -151,85 +179,93 @@ jobs:
registerROCmPackages: true
extraPaths: /home/user/workspace/rocm/bin:/home/user/workspace/rocm/llvm/bin
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_systems_test_${{ job.target }}
dependsOn: rocprofiler_systems_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
timeoutInMinutes: 180
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
pool:
name: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/llvm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
# build flags reference: https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html
extraBuildFlags: >-
-DROCPROFSYS_BUILD_TESTING=ON
-DROCPROFSYS_BUILD_DYNINST=ON
-DROCPROFSYS_BUILD_LIBUNWIND=ON
-DROCPROFSYS_DISABLE_EXAMPLES="openmp-target"
-DDYNINST_BUILD_TBB=ON
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON
-DDYNINST_BUILD_BOOST=ON
-DROCPROFSYS_USE_PAPI=ON
-DROCPROFSYS_USE_MPI=ON
-DCMAKE_CXX_FLAGS=-I$(Agent.BuildDirectory)/rocm/include/rocjpeg
-DGPU_TARGETS=${{ job.target }}
-GNinja
- task: Bash@3
displayName: Set up rocprofiler-systems env
inputs:
targetType: inline
script: source share/rocprofiler-systems/setup-env.sh
workingDirectory: build
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofiler-systems
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
registerROCmPackages: true
gpuTarget: ${{ job.target }}
extraPaths: /home/user/workspace/rocm/bin:/home/user/workspace/rocm/llvm/bin
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocprofiler_systems_test_${{ job.target }}
dependsOn: rocprofiler_systems_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
timeoutInMinutes: 180
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
pool:
name: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/llvm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
cmakeSourceDir: $(Agent.BuildDirectory)/s/projects/rocprofiler-systems
# build flags reference: https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html
extraBuildFlags: >-
-DCMAKE_INSTALL_PREFIX=$(Agent.BuildDirectory)/rocprofiler-systems
-DROCPROFSYS_USE_PYTHON=ON
-DROCPROFSYS_BUILD_TESTING=ON
-DROCPROFSYS_BUILD_DYNINST=ON
-DROCPROFSYS_BUILD_LIBUNWIND=ON
-DROCPROFSYS_DISABLE_EXAMPLES="openmp-target"
-DDYNINST_BUILD_TBB=ON
-DDYNINST_BUILD_ELFUTILS=ON
-DDYNINST_BUILD_LIBIBERTY=ON
-DDYNINST_BUILD_BOOST=ON
-DROCPROFSYS_USE_PAPI=ON
-DROCPROFSYS_USE_MPI=ON
-DCMAKE_CXX_FLAGS=-I$(Agent.BuildDirectory)/rocm/include/rocjpeg
-DGPU_TARGETS=${{ job.target }}
-GNinja
- task: Bash@3
displayName: Set up rocprofiler-systems env
inputs:
targetType: inline
script: source $(Agent.BuildDirectory)/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh
workingDirectory: $(Agent.BuildDirectory)/rocprofiler-systems/share/rocprofiler-systems
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testDir: $(Agent.BuildDirectory)/s/build/tests/
testParameters: '--output-on-failure'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
registerROCmPackages: true
gpuTarget: ${{ job.target }}
extraPaths: /home/user/workspace/rocm/bin:/home/user/workspace/rocm/llvm/bin

View File

@@ -8,6 +8,22 @@ parameters:
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -70,6 +86,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -94,6 +114,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
@@ -108,6 +129,8 @@ jobs:
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
@@ -115,6 +138,7 @@ jobs:
extraBuildFlags: >-
-DCMAKE_MODULE_PATH=$(Build.SourcesDirectory)/cmake_modules;$(Agent.BuildDirectory)/rocm/lib/cmake;$(Agent.BuildDirectory)/rocm/lib/cmake/hip;$(Agent.BuildDirectory)/rocm/lib64/cmake;$(Agent.BuildDirectory)/rocm/lib64/cmake/hip
-DCMAKE_PREFIX_PATH="$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor"
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_POSITION_INDEPENDENT_CODE=ON
-DENABLE_LDCONFIG=OFF
-DUSE_PROF_API=1
@@ -122,10 +146,13 @@ jobs:
multithreadFlag: -- -j32
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
@@ -139,63 +166,68 @@ jobs:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- ROCM_PATH:::/home/user/workspace/rocm
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
- name: LD_LIBRARY_PATH
value: $(Agent.BuildDirectory)/rocm/lib/rocprofiler:$(Agent.BuildDirectory)/rocm/share/rocprofiler/tests-v1/test:$(Agent.BuildDirectory)/rocm/share/rocprofiler/tests
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofilerV1
testDir: $(Agent.BuildDirectory)/rocm/share/rocprofiler/tests-v1
testExecutable: ./run.sh
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofilerV2
testDir: $(Agent.BuildDirectory)/rocm
testExecutable: share/rocprofiler/tests/runUnitTests
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
- name: ROCM_PATH
value: $(Agent.BuildDirectory)/rocm
- name: LD_LIBRARY_PATH
value: $(Agent.BuildDirectory)/rocm/lib/rocprofiler:$(Agent.BuildDirectory)/rocm/share/rocprofiler/tests-v1/test:$(Agent.BuildDirectory)/rocm/share/rocprofiler/tests
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
parameters:
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofilerV1
testDir: $(Agent.BuildDirectory)/rocm/share/rocprofiler/tests-v1
testExecutable: ./run.sh
testParameters: ''
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocprofilerV2
testDir: $(Agent.BuildDirectory)/rocm
testExecutable: share/rocprofiler/tests/runUnitTests
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}

View File

@@ -8,6 +8,22 @@ parameters:
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -65,6 +81,10 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -87,6 +107,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
@@ -94,6 +115,8 @@ jobs:
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
# the linker flags will not affect ubuntu2204 builds as the paths do not exist
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
@@ -109,10 +132,13 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
os: ${{ job.os }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
@@ -123,53 +149,57 @@ jobs:
# gpuTarget: ${{ job.target }}
# registerROCmPackages: true
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: roctracer
testExecutable: $(Agent.BuildDirectory)/rocm/share/roctracer/run_tests.sh
testParameters: ''
testDir: $(Agent.BuildDirectory)
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
registerROCmPackages: true
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
registerROCmPackages: true
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testExecutable: $(Agent.BuildDirectory)/rocm/share/roctracer/run_tests.sh
testParameters: ''
testDir: $(Agent.BuildDirectory)
testPublishResults: false
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
registerROCmPackages: true

View File

@@ -40,7 +40,6 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:

View File

@@ -0,0 +1,63 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: catch2Version
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: catch2_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: Bash@3
displayName: Clone catch2 ${{ parameters.catch2Version }}
inputs:
targetType: inline
script: git clone https://github.com/catchorg/Catch2.git -b ${{ parameters.catch2Version }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/Catch2/build
cmakeSourceDir: $(Agent.BuildDirectory)/Catch2
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -0,0 +1,67 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: fmtlibVersion
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- libfmt-dev
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: fmtlib_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: Bash@3
displayName: Clone fmtlib ${{ parameters.fmtlibVersion }}
inputs:
targetType: inline
script: git clone https://github.com/fmtlib/fmt.git -b ${{ parameters.fmtlibVersion }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/fmt/build
cmakeSourceDir: $(Agent.BuildDirectory)/fmt
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DFMT_SYSTEM_HEADERS=ON
-DFMT_INSTALL=ON
-DFMT_TEST=OFF
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -0,0 +1,64 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: libdivideVersion
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: libdivide_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: Bash@3
displayName: Clone libdivide ${{ parameters.libdivideVersion }}
inputs:
targetType: inline
script: git clone https://github.com/ridiculousfish/libdivide.git -b ${{ parameters.libdivideVersion }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/libdivide/build
cmakeSourceDir: $(Agent.BuildDirectory)/libdivide
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DLIBDIVIDE_BUILD_TESTS=OFF
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -0,0 +1,71 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: spdlogVersion
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: spdlog_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
- fmtlib
- task: Bash@3
displayName: Clone spdlog ${{ parameters.spdlogVersion }}
inputs:
targetType: inline
script: git clone https://github.com/gabime/spdlog.git -b ${{ parameters.spdlogVersion }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/spdlog/build
cmakeSourceDir: $(Agent.BuildDirectory)/spdlog
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/vendor
-DCMAKE_BUILD_TYPE=Release
-DSPDLOG_USE_STD_FORMAT=OFF
-DSPDLOG_FMT_EXTERNAL_HO=ON
-DSPDLOG_INSTALL=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -219,7 +219,6 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
gpuTarget: $(JOB_GPU_TARGET)
setupHIPLibrarySymlinks: true
- task: Bash@3
@@ -398,6 +397,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: DownloadPipelineArtifact@2
displayName: 'Download Pipeline Wheel Files'
retryCountOnTaskFailure: 3
inputs:
itemPattern: '**/*.whl'
targetPath: $(Agent.BuildDirectory)
@@ -406,7 +406,6 @@ jobs:
parameters:
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: $(JOB_GPU_TARGET)
dependencySource: staging
# get sources to run test scripts
- task: Bash@3
displayName: git clone upstream pytorch

View File

@@ -3,21 +3,21 @@ parameters:
- name: jobList
type: object
default:
- { os: ubuntu2204, packageManager: apt, target: gfx942, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx90a, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1201, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1100, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1030, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx942, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx90a, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1201, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1100, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1030, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx942, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx90a, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1201, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1100, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1030, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { os: ubuntu2404, packageManager: apt, target: gfx942 }
- { os: ubuntu2404, packageManager: apt, target: gfx90a }
- { os: ubuntu2404, packageManager: apt, target: gfx1201 }
- { os: ubuntu2404, packageManager: apt, target: gfx1100 }
- { os: ubuntu2404, packageManager: apt, target: gfx1030 }
- { os: almalinux8, packageManager: dnf, target: gfx942 }
- { os: almalinux8, packageManager: dnf, target: gfx90a }
- { os: almalinux8, packageManager: dnf, target: gfx1201 }
- { os: almalinux8, packageManager: dnf, target: gfx1100 }
- { os: almalinux8, packageManager: dnf, target: gfx1030 }
- name: rocmDependencies
type: object
default:
@@ -92,8 +92,8 @@ schedules:
jobs:
- ${{ each job in parameters.jobList }}:
- job: nightly_${{ job.os }}_${{ job.target }}_${{ job.source }}
timeoutInMinutes: 90
- job: nightly_${{ job.os }}_${{ job.target }}
timeoutInMinutes: 120
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -116,7 +116,6 @@ jobs:
displayName: System disk space before ROCm
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencySource: ${{ job.source }}
dependencyList: ${{ parameters.rocmDependencies }}
os: ${{ job.os }}
gpuTarget: ${{ job.target }}
@@ -172,11 +171,11 @@ jobs:
&& dpkg-deb -R $PACKAGE_NAME hsa-amd-aqlprofile \
&& cp -R hsa-amd-aqlprofile/opt/rocm-*/* rocm
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}${{ job.source }}&api-version=7.1" \
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}&api-version=7.1" \
&& DOWNLOAD_URL=$(curl -s $ARTIFACT_URL | jq ".resource.downloadUrl" | tr -d '"') \
&& wget -nv --retry-connrefused $DOWNLOAD_URL -O nightly.zip \
&& unzip nightly.zip \
&& tar -xf nightly${{ job.os }}${{ job.target }}${{ job.source }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
&& tar -xf nightly${{ job.os }}${{ job.target }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
RUN echo /root/rocm/lib | tee /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib | tee -a /etc/ld.so.conf.d/rocm-ci.conf
@@ -210,11 +209,11 @@ jobs:
&& rpm2cpio $PACKAGE_NAME | (cd hsa-amd-aqlprofile && cpio -idmv) \
&& cp -R hsa-amd-aqlprofile/opt/rocm-*/* rocm
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}${{ job.source }}&api-version=7.1" \
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}&api-version=7.1" \
&& DOWNLOAD_URL=$(curl -s $ARTIFACT_URL | jq ".resource.downloadUrl" | tr -d '"') \
&& wget -nv --retry-connrefused $DOWNLOAD_URL -O nightly.zip \
&& UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE unzip nightly.zip \
&& tar -xf nightly${{ job.os }}${{ job.target }}${{ job.source }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
&& tar -xf nightly${{ job.os }}${{ job.target }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
RUN echo /root/rocm/lib | tee /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib | tee -a /etc/ld.so.conf.d/rocm-ci.conf
@@ -227,13 +226,14 @@ jobs:
cat Dockerfile
- task: Docker@2
displayName: Build and upload Docker image
retryCountOnTaskFailure: 3
inputs:
containerRegistry: ContainerService3
repository: 'nightly-${{ job.os }}-${{ job.target }}-${{ job.source }}'
repository: 'nightly-${{ job.os }}-${{ job.target }}'
Dockerfile: '$(Agent.BuildDirectory)/Dockerfile'
buildContext: '$(Agent.BuildDirectory)'
- task: Bash@3
displayName: '!! Docker Run Command !!'
inputs:
targetType: inline
script: echo "docker run -it --network=host --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined rocmexternalcicd.azurecr.io/nightly-${{ job.os }}-${{ job.target }}-${{ job.source }}:$(Build.BuildId)" | tr '[:upper:]' '[:lower:]'
script: echo "docker run -it --network=host --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined rocmexternalcicd.azurecr.io/nightly-${{ job.os }}-${{ job.target }}:$(Build.BuildId)" | tr '[:upper:]' '[:lower:]'

View File

@@ -0,0 +1,23 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: catch2Version
type: string
default: "v3.7.0"
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/catch2.yml
parameters:
catch2Version: ${{ parameters.catch2Version }}

View File

@@ -0,0 +1,23 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: fmtlibVersion
type: string
default: "11.1.3"
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/fmtlib.yml
parameters:
fmtlibVersion: ${{ parameters.fmtlibVersion }}

View File

@@ -0,0 +1,23 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: libdivideVersion
type: string
default: master
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/libdivide.yml
parameters:
libdivideVersion: ${{ parameters.libdivideVersion }}

View File

@@ -0,0 +1,23 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: spdlogVersion
type: string
default: "v1.15.1"
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/spdlog.yml
parameters:
spdlogVersion: ${{ parameters.spdlogVersion }}

View File

@@ -24,8 +24,12 @@ parameters:
steps:
- task: DownloadPipelineArtifact@2
displayName: Download ${{ parameters.componentName }}
retryCountOnTaskFailure: 3
inputs:
itemPattern: '**/*${{ parameters.componentName }}*${{ parameters.fileFilter }}*'
${{ if eq(parameters.componentName, 'clr') }}:
itemPattern: '**/*${{ parameters.componentName }}*${{ parameters.fileFilter }}*amd*' # filter out nvidia clr artifacts
${{ else }}:
itemPattern: '**/*${{ parameters.componentName }}*${{ parameters.fileFilter }}*'
targetPath: '$(Pipeline.Workspace)/d'
allowPartiallySucceededBuilds: true
${{ if parameters.aggregatePipeline }}:

View File

@@ -20,7 +20,7 @@ steps:
retryCountOnTaskFailure: 3
fetchFilter: blob:none
${{ if ne(parameters.sparseCheckoutDir, '') }}:
sparseCheckoutDirectories: ${{ parameters.sparseCheckoutDir }}
sparseCheckoutDirectories: ${{ parameters.sparseCheckoutDir }} shared
path: sparse
- ${{ if ne(parameters.sparseCheckoutDir, '') }}:
- task: Bash@3

View File

@@ -10,6 +10,7 @@ steps:
- ${{ if eq(parameters.registerROCmPackages, true) }}:
- task: Bash@3
displayName: 'Register AMDGPU & ROCm repos (apt)'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
@@ -20,7 +21,8 @@ steps:
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
- task: Bash@3
displayName: 'sudo apt-get update'
displayName: 'APT update and install packages'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
@@ -28,15 +30,6 @@ steps:
echo "deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/default.list
echo "deb http://archive.ubuntu.com/ubuntu/ jammy-backports main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/default.list
echo "deb http://archive.ubuntu.com/ubuntu/ jammy-security main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/default.list
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes update
- task: Bash@3
displayName: 'sudo apt-get fix'
inputs:
targetType: inline
script: sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --fix-broken install
- ${{ if gt(length(parameters.aptPackages), 0) }}:
- task: Bash@3
displayName: 'sudo apt-get install ...'
inputs:
targetType: inline
script: sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --fix-missing install ${{ join(' ', parameters.aptPackages) }}
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes update && \
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --fix-broken install && \
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --fix-missing install ${{ join(' ', parameters.aptPackages) }}

View File

@@ -5,51 +5,28 @@ parameters:
steps:
- task: Bash@3
displayName: Get aqlprofile package name
inputs:
targetType: inline
${{ if eq(parameters.os, 'ubuntu2204') }}:
script: |
export packageName=$(curl -s https://repo.radeon.com/rocm/apt/$(REPO_RADEON_VERSION)/pool/main/h/hsa-amd-aqlprofile/ | grep -oP "href=\"\K[^\"]*$(lsb_release -rs)[^\"]*\.deb")
echo "##vso[task.setvariable variable=packageName;isreadonly=true]$packageName"
${{ if eq(parameters.os, 'almalinux8') }}:
script: |
export packageName=$(curl -s https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/ | grep -oP "hsa-amd-aqlprofile-[^\"]+\.rpm" | head -n1)
echo "##vso[task.setvariable variable=packageName;isreadonly=true]$packageName"
- task: Bash@3
displayName: 'Download aqlprofile'
inputs:
targetType: inline
workingDirectory: '$(Pipeline.Workspace)'
${{ if eq(parameters.os, 'ubuntu2204') }}:
script: wget -nv https://repo.radeon.com/rocm/apt/$(REPO_RADEON_VERSION)/pool/main/h/hsa-amd-aqlprofile/$(packageName)
${{ if eq(parameters.os, 'almalinux8') }}:
script: wget -nv https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/$(packageName)
- task: Bash@3
displayName: 'Extract aqlprofile'
inputs:
targetType: inline
workingDirectory: '$(Pipeline.Workspace)'
${{ if eq(parameters.os, 'ubuntu2204') }}:
script: |
mkdir hsa-amd-aqlprofile
dpkg-deb -R $(packageName) hsa-amd-aqlprofile
${{ if eq(parameters.os, 'almalinux8') }}:
script: |
mkdir hsa-amd-aqlprofile
sudo dnf -y install rpm-build cpio
rpm2cpio $(packageName) | (cd hsa-amd-aqlprofile && cpio -idmv)
- task: Bash@3
displayName: 'Copy aqlprofile files'
displayName: Download and install aqlprofile
retryCountOnTaskFailure: 3
inputs:
targetType: inline
workingDirectory: $(Agent.BuildDirectory)
script: |
mkdir -p $(Agent.BuildDirectory)/rocm
cp -R hsa-amd-aqlprofile/opt/rocm-*/* $(Agent.BuildDirectory)/rocm
workingDirectory: '$(Pipeline.Workspace)'
- task: Bash@3
displayName: 'Clean up aqlprofile'
inputs:
targetType: inline
script: rm -rf hsa-amd-aqlprofile $(packageName)
workingDirectory: '$(Pipeline.Workspace)'
set -e
if [ "${{ parameters.os }}" = "ubuntu2204" ]; then
packageName=$(curl -s https://repo.radeon.com/rocm/apt/$(REPO_RADEON_VERSION)/pool/main/h/hsa-amd-aqlprofile/ | grep -oP "href=\"\K[^\"]*$(lsb_release -rs)[^\"]*\.deb") && \
wget -nv https://repo.radeon.com/rocm/apt/$(REPO_RADEON_VERSION)/pool/main/h/hsa-amd-aqlprofile/$packageName && \
mkdir -p hsa-amd-aqlprofile && \
dpkg-deb -R $packageName hsa-amd-aqlprofile
elif [ "${{ parameters.os }}" = "almalinux8" ]; then
sudo dnf -y install rpm-build cpio && \
packageName=$(curl -s https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/ | grep -oP "hsa-amd-aqlprofile-[^\"]+\.rpm" | head -n1) && \
wget -nv https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/$packageName && \
mkdir -p hsa-amd-aqlprofile && \
rpm2cpio $packageName | (cd hsa-amd-aqlprofile && cpio -idmv)
else
echo "Unsupported OS: ${{ parameters.os }}"
exit 1
fi && \
mkdir -p $(Agent.BuildDirectory)/rocm && \
cp -R hsa-amd-aqlprofile/opt/rocm-*/* $(Agent.BuildDirectory)/rocm && \
rm -rf hsa-amd-aqlprofile $packageName

View File

@@ -1,10 +1,15 @@
parameters:
- name: cmakeVersion
type: string
default: '3.31.0'
steps:
- task: Bash@3
displayName: Install CMake 3.31
displayName: Install CMake ${{ parameters.cmakeVersion }}
inputs:
targetType: inline
script: |
CMAKE_VERSION=3.31.0
CMAKE_VERSION=${{ parameters.cmakeVersion }}
CMAKE_ROOT="$(Pipeline.Workspace)/cmake"
echo "Downloading CMake $CMAKE_VERSION..."

View File

@@ -54,11 +54,13 @@ parameters:
libfftw3-dev: fftw-devel
libfmt-dev: fmt-devel
libgmp-dev: gmp-devel
liblapack-dev: lapack-devel
liblzma-dev: xz-devel
libmpfr-dev: mpfr-devel
libmsgpack-dev: msgpack-devel
libncurses5-dev: ncurses-devel
libnuma-dev: numactl-devel
libopenblas-dev: openblas-devel
libopenmpi-dev: openmpi-devel
libpci-dev: libpciaccess-devel
libssl-dev: openssl-devel
@@ -87,6 +89,7 @@ steps:
- ${{ if eq(parameters.registerROCmPackages, true) }}:
- task: Bash@3
displayName: 'Register AMDGPU & ROCm repos (dnf)'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
@@ -107,12 +110,13 @@ steps:
sudo dnf makecache
- task: Bash@3
displayName: 'Install base dnf packages'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
sudo dnf config-manager --set-enabled powertools
# rpm fusion free repo for some dependencies
sudo dnf -y install https://download1.rpmfusion.org/free/el/rpmfusion-free-release-8.noarch.rpm
sudo dnf config-manager --set-enabled powertools && \
sudo dnf -y install https://download1.rpmfusion.org/free/el/rpmfusion-free-release-8.noarch.rpm && \
sudo dnf -y install ${{ join(' ', parameters.basePackages) }}
- task: Bash@3
displayName: 'Check gcc environment'
@@ -126,6 +130,7 @@ steps:
g++ -print-file-name=libstdc++.so
- task: Bash@3
displayName: 'Set python 3.11 as default'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
@@ -140,18 +145,20 @@ steps:
- ${{ if eq(pkg, 'ninja-build') }}:
- task: Bash@3
displayName: 'Install ninja 1.11.1'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |
curl -LO https://github.com/ninja-build/ninja/releases/download/v1.11.1/ninja-linux.zip
sudo dnf -y install unzip
unzip ninja-linux.zip
sudo mv ninja /usr/local/bin/ninja
sudo chmod +x /usr/local/bin/ninja
sudo dnf -y install unzip && \
curl -LO https://github.com/ninja-build/ninja/releases/download/v1.11.1/ninja-linux.zip && \
unzip ninja-linux.zip && \
sudo mv ninja /usr/local/bin/ninja && \
sudo chmod +x /usr/local/bin/ninja && \
echo "##vso[task.prependpath]/usr/local/bin"
- ${{ if ne(parameters.aptToDnfMap[pkg], '') }}:
- task: Bash@3
displayName: 'dnf install ${{ parameters.aptToDnfMap[pkg] }}'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: |

View File

@@ -27,6 +27,7 @@ steps:
- ${{ if gt(length(parameters.pipModules), 0) }}:
- task: Bash@3
displayName: 'pip install ...'
retryCountOnTaskFailure: 3
inputs:
targetType: inline
script: python3 -m pip install -v --force-reinstall ${{ join(' ', parameters.pipModules) }}

View File

@@ -3,13 +3,6 @@ parameters:
- name: checkoutRef
type: string
default: ''
- name: dependencySource # optional, overrides checkoutRef
type: string
default: null
values:
- null # empty strings aren't allowed as values, use null instead
- staging
- mainline
- name: dependencyList
type: object
default: []
@@ -38,309 +31,248 @@ parameters:
type: object
default:
AMDMIGraphX:
pipelineId: $(AMDMIGRAPHX_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: master
pipelineId: 113
developBranch: develop
hasGpuTarget: true
amdsmi:
pipelineId: $(AMDSMI_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 99
developBranch: amd-staging
hasGpuTarget: false
aomp-extras:
pipelineId: $(AOMP_EXTRAS_PIPELINE_ID)
stagingBranch: aomp-dev
mainlineBranch: aomp-dev
pipelineId: 111
developBranch: aomp-dev
hasGpuTarget: false
aomp:
pipelineId: $(AOMP_PIPELINE_ID)
stagingBranch: aomp-dev
mainlineBranch: amd-mainline
pipelineId: 115
developBranch: aomp-dev
hasGpuTarget: false
aqlprofile:
pipelineId: 365
developBranch: develop
hasGpuTarget: false
clr:
pipelineId: $(CLR_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 335
developBranch: develop
hasGpuTarget: false
composable_kernel:
pipelineId: $(COMPOSABLE_KERNEL_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 86
developBranch: develop
hasGpuTarget: true
half:
pipelineId: $(HALF_PIPELINE_ID)
stagingBranch: rocm
mainlineBranch: rocm
pipelineId: 101
developBranch: rocm
hasGpuTarget: false
HIP:
pipelineId: $(HIP_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 335
developBranch: develop
hasGpuTarget: false
hip-tests:
pipelineId: $(HIP_TESTS_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 362
developBranch: develop
hasGpuTarget: false
hipBLAS:
pipelineId: $(HIPBLAS_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 317
developBranch: develop
hasGpuTarget: true
hipBLASLt:
pipelineId: $(HIPBLASLT_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 301
developBranch: develop
hasGpuTarget: true
hipBLAS-common:
pipelineId: $(HIPBLAS_COMMON_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 300
developBranch: develop
hasGpuTarget: false
hipCUB:
pipelineId: $(HIPCUB_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: develop
pipelineId: 277
developBranch: develop
hasGpuTarget: true
hipFFT:
pipelineId: $(HIPFFT_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 283
developBranch: develop
hasGpuTarget: true
hipfort:
pipelineId: $(HIPFORT_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 102
developBranch: develop
hasGpuTarget: false
HIPIFY:
pipelineId: $(HIPIFY_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 92
developBranch: amd-staging
hasGpuTarget: false
hipRAND:
pipelineId: $(HIPRAND_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: develop
pipelineId: 275
developBranch: develop
hasGpuTarget: true
hipSOLVER:
pipelineId: $(HIPSOLVER_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 84
developBranch: develop
hasGpuTarget: true
hipSPARSE:
pipelineId: $(HIPSPARSE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 315
developBranch: develop
hasGpuTarget: true
hipSPARSELt:
pipelineId: $(HIPSPARSELT_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 309
developBranch: develop
hasGpuTarget: true
hipTensor:
pipelineId: $(HIPTENSOR_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 105
developBranch: develop
hasGpuTarget: true
llvm-project:
pipelineId: $(LLVM_PROJECT_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 2
developBranch: amd-staging
hasGpuTarget: false
MIOpen:
pipelineId: $(MIOpen_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: amd-master
pipelineId: 320
developBranch: develop
hasGpuTarget: true
MIVisionX:
pipelineId: $(MIVISIONX_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: master
pipelineId: 80
developBranch: develop
hasGpuTarget: true
omnitrace: # deprecated
pipelineId: $(OMNITRACE_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
origami:
pipelineId: 364
developBranch: develop
hasGpuTarget: true
rccl:
pipelineId: $(RCCL_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 107
developBranch: develop
hasGpuTarget: true
rdc:
pipelineId: $(RDC_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 360
developBranch: develop
hasGpuTarget: false
rocAL:
pipelineId: $(ROCAL_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 151
developBranch: develop
hasGpuTarget: true
rocALUTION:
pipelineId: $(ROCALUTION_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 89
developBranch: develop
hasGpuTarget: true
rocBLAS:
pipelineId: $(ROCBLAS_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 302
developBranch: develop
hasGpuTarget: true
ROCdbgapi:
pipelineId: $(ROCDBGAPI_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 135
developBranch: amd-staging
hasGpuTarget: false
rocDecode:
pipelineId: $(ROCDECODE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 79
developBranch: develop
hasGpuTarget: false
rocFFT:
pipelineId: $(ROCFFT_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 282
developBranch: develop
hasGpuTarget: true
ROCgdb:
pipelineId: $(ROCGDB_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline-rocgdb-15
pipelineId: 134
developBranch: amd-staging
hasGpuTarget: false
rocJPEG:
pipelineId: $(ROCJPEG_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 262
developBranch: develop
hasGpuTarget: false
rocm-cmake:
pipelineId: $(ROCM_CMAKE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 6
developBranch: develop
hasGpuTarget: false
rocm-core:
pipelineId: $(ROCM_CORE_PIPELINE_ID)
stagingBranch: master
mainlineBranch: amd-master
pipelineId: 349
developBranch: develop
hasGpuTarget: false
rocm-examples:
pipelineId: $(ROCM_EXAMPLES_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 216
developBranch: amd-staging
hasGpuTarget: true
rocminfo:
pipelineId: $(ROCMINFO_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 356
developBranch: develop
hasGpuTarget: false
rocMLIR:
pipelineId: $(ROCMLIR_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 229
developBranch: develop
hasGpuTarget: false
ROCmValidationSuite:
pipelineId: $(ROCMVALIDATIONSUITE_PIPELINE_ID)
stagingBranch: master
mainlineBranch: master
pipelineId: 106
developBranch: master
hasGpuTarget: true
rocm_bandwidth_test:
pipelineId: $(ROCM_BANDWIDTH_TEST_PIPELINE_ID)
stagingBranch: master
mainlineBranch: master
pipelineId: 88
developBranch: master
hasGpuTarget: false
rocm_smi_lib:
pipelineId: $(ROCM_SMI_LIB_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 358
developBranch: develop
hasGpuTarget: false
rocPRIM:
pipelineId: $(ROCPRIM_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: develop
pipelineId: 273
developBranch: develop
hasGpuTarget: true
rocprofiler:
pipelineId: $(ROCPROFILER_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-master
pipelineId: 329
developBranch: develop
hasGpuTarget: true
rocprofiler-compute:
pipelineId: $(ROCPROFILER_COMPUTE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: amd-mainline
pipelineId: 344
developBranch: develop
hasGpuTarget: true
rocprofiler-register:
pipelineId: $(ROCPROFILER_REGISTER_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 327
developBranch: develop
hasGpuTarget: false
rocprofiler-sdk:
pipelineId: $(ROCPROFILER_SDK_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 347
developBranch: develop
hasGpuTarget: true
rocprofiler-systems:
pipelineId: $(ROCPROFILER_SYSTEMS_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 345
developBranch: develop
hasGpuTarget: true
rocPyDecode:
pipelineId: $(ROCPYDECODE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 239
developBranch: develop
hasGpuTarget: true
ROCR-Runtime:
pipelineId: $(ROCR_RUNTIME_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 354
developBranch: develop
hasGpuTarget: false
rocRAND:
pipelineId: $(ROCRAND_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: develop
pipelineId: 274
developBranch: develop
hasGpuTarget: true
rocr_debug_agent:
pipelineId: $(ROCR_DEBUG_AGENT_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 136
developBranch: amd-staging
hasGpuTarget: false
rocSOLVER:
pipelineId: $(ROCSOLVER_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 81
developBranch: develop
hasGpuTarget: true
rocSPARSE:
pipelineId: $(ROCSPARSE_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 314
developBranch: develop
hasGpuTarget: true
ROCT-Thunk-Interface: # deprecated
pipelineId: $(ROCT_THUNK_INTERFACE_PIPELINE_ID)
stagingBranch: master
mainlineBranch: master
hasGpuTarget: false
rocThrust:
pipelineId: $(ROCTHRUST_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: develop
pipelineId: 276
developBranch: develop
hasGpuTarget: true
roctracer:
pipelineId: $(ROCTRACER_PIPELINE_ID)
stagingBranch: amd-staging
mainlineBranch: amd-mainline
pipelineId: 331
developBranch: develop
hasGpuTarget: true
rocWMMA:
pipelineId: $(ROCWMMA_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 109
developBranch: develop
hasGpuTarget: true
rpp:
pipelineId: $(RPP_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 78
developBranch: develop
hasGpuTarget: true
TransferBench:
pipelineId: $(TRANSFERBENCH_PIPELINE_ID)
stagingBranch: develop
mainlineBranch: mainline
pipelineId: 265
developBranch: develop
hasGpuTarget: true
steps:
@@ -356,72 +288,30 @@ steps:
parameters:
componentName: ${{ split(dependency, ':')[0] }}
pipelineId: ${{ parameters.componentVarList[split(dependency, ':')[0]].pipelineId }}
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].developBranch }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
extractAndDeleteFiles: false
${{ if parameters.componentVarList[split(dependency, ':')[0]].hasGpuTarget }}:
fileFilter: "${{ split(dependency, ':')[1] }}*_${{ parameters.os }}_${{ parameters.gpuTarget }}"
# dependencySource = staging
${{ if eq(parameters.dependencySource, 'staging')}}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].stagingBranch }}
# dependencySource = mainline
${{ elseif eq(parameters.dependencySource, 'mainline')}}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].mainlineBranch }}
# checkoutRef = staging
${{ elseif eq(parameters.checkoutRef, parameters.componentVarList[variables['Build.DefinitionName']].stagingBranch) }}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].stagingBranch }}
# checkoutRef = mainline
${{ elseif eq(parameters.checkoutRef, parameters.componentVarList[variables['Build.DefinitionName']].mainlineBranch) }}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].mainlineBranch }}
# SourceBranchName = staging
${{ elseif eq(variables['Build.SourceBranchName'], parameters.componentVarlist[variables['Build.DefinitionName']].stagingBranch) }}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].stagingBranch }}
# SourceBranchName = mainline
${{ elseif eq(variables['Build.SourceBranchName'], parameters.componentVarlist[variables['Build.DefinitionName']].mainlineBranch) }}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].mainlineBranch }}
# default = staging
${{ else }}:
branchName: ${{ parameters.componentVarList[split(dependency, ':')[0]].stagingBranch }}
# no colon (:) found in this item in the list
- ${{ elseif containsValue(split(parameters.downstreamAggregateNames, '+'), dependency) }}:
- template: local-artifact-download.yml
parameters:
${{ if parameters.componentVarList[dependency].hasGpuTarget }}:
gpuTarget: ${{ parameters.gpuTarget }}
buildType: current
preTargetFilter: ${{ dependency }}
os: ${{ parameters.os }}
buildType: current
${{ if parameters.componentVarList[dependency].hasGpuTarget }}:
gpuTarget: ${{ parameters.gpuTarget }}
- ${{ else }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.componentVarList[dependency].pipelineId }}
branchName: ${{ parameters.componentVarList[dependency].developBranch }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
extractAndDeleteFiles: false
${{ if parameters.componentVarList[dependency].hasGpuTarget }}:
fileFilter: ${{ parameters.os }}_${{ parameters.gpuTarget }}
${{ else }}:
fileFilter: ${{ parameters.os }}
# dependencySource = staging
${{ if eq(parameters.dependencySource, 'staging')}}:
branchName: ${{ parameters.componentVarList[dependency].stagingBranch }}
# dependencySource = mainline
${{ elseif eq(parameters.dependencySource, 'mainline')}}:
branchName: ${{ parameters.componentVarList[dependency].mainlineBranch }}
# checkoutRef = staging
${{ elseif eq(parameters.checkoutRef, parameters.componentVarList[variables['Build.DefinitionName']].stagingBranch) }}:
branchName: ${{ parameters.componentVarList[dependency].stagingBranch }}
# checkoutRef = mainline
${{ elseif eq(parameters.checkoutRef, parameters.componentVarList[variables['Build.DefinitionName']].mainlineBranch) }}:
branchName: ${{ parameters.componentVarList[dependency].mainlineBranch }}
# SourceBranchName = staging
${{ elseif eq(variables['Build.SourceBranchName'], parameters.componentVarlist[variables['Build.DefinitionName']].stagingBranch) }}:
branchName: ${{ parameters.componentVarList[dependency].stagingBranch }}
# SourceBranchName = mainline
${{ elseif eq(variables['Build.SourceBranchName'], parameters.componentVarlist[variables['Build.DefinitionName']].mainlineBranch) }}:
branchName: ${{ parameters.componentVarList[dependency].mainlineBranch }}
# default = staging
${{ else }}:
branchName: ${{ parameters.componentVarList[dependency].stagingBranch }}
- task: ExtractFiles@1
displayName: Extract ROCm artifacts
inputs:

View File

@@ -8,15 +8,20 @@ parameters:
type: object
default:
boost: 250
catch2: 343
fmtlib: 341
grpc: 72
gtest: 73
half560: 68
lapack: 69
libdivide: 342
spdlog: 340
steps:
- ${{ each dependency in parameters.dependencyList }}:
- task: DownloadPipelineArtifact@2
displayName: Download ${{ dependency }}
retryCountOnTaskFailure: 3
inputs:
project: ROCm-CI
buildType: specific
@@ -28,7 +33,7 @@ steps:
inputs:
archiveFilePatterns: '$(Pipeline.Workspace)/d/**/*.tar.gz'
destinationFolder: $(Agent.BuildDirectory)/vendor
cleanDestinationFolder: true
cleanDestinationFolder: false
overwriteExistingFiles: true
- task: DeleteFiles@1
displayName: Clean up ${{ dependency }}

View File

@@ -33,6 +33,7 @@ parameters:
steps:
- task: DownloadPipelineArtifact@2
displayName: Download ${{ parameters.preTargetFilter}}*${{ parameters.os }}_${{ parameters.gpuTarget}}*${{ parameters.postTargetFilter}}
retryCountOnTaskFailure: 3
inputs:
${{ if eq(parameters.buildType, 'specific') }}:
buildType: specific

View File

@@ -7,7 +7,7 @@ steps:
- task: Bash@3
name: downloadCKBuild
displayName: Download specific CK build
continueOnError: true
retryCountOnTaskFailure: 3
env:
CXX: $(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
CC: $(Agent.BuildDirectory)/rocm/llvm/bin/amdclang
@@ -67,11 +67,32 @@ steps:
fi
echo "Downloading CK artifact from $ARTIFACT_URL"
wget --tries=5 --waitretry=10 --retry-connrefused -nv $ARTIFACT_URL -O $(System.ArtifactsDirectory)/ck.zip
unzip $(System.ArtifactsDirectory)/ck.zip -d $(System.ArtifactsDirectory)
mkdir -p $(Agent.BuildDirectory)/rocm
tar -zxvf $(System.ArtifactsDirectory)/composable_kernel*/*.tar.gz -C $(Agent.BuildDirectory)/rocm
rm -r $(System.ArtifactsDirectory)/ck.zip $(System.ArtifactsDirectory)/composable_kernel*
RETRIES=0
MAX_RETRIES=5
SUCCESS=false
while [ $RETRIES -lt $MAX_RETRIES ]; do
wget -nv $ARTIFACT_URL -O $(System.ArtifactsDirectory)/ck.zip && \
unzip $(System.ArtifactsDirectory)/ck.zip -d $(System.ArtifactsDirectory) && \
mkdir -p $(Agent.BuildDirectory)/rocm && \
tar -zxvf $(System.ArtifactsDirectory)/composable_kernel*/*.tar.gz -C $(Agent.BuildDirectory)/rocm && \
rm -r $(System.ArtifactsDirectory)/ck.zip $(System.ArtifactsDirectory)/composable_kernel*
if [ $? -eq 0 ]; then
SUCCESS=true
echo "Successfully downloaded CK."
break
else
RETRIES=$((RETRIES + 1))
echo "Failed to download CK on attempt $RETRIES/$MAX_RETRIES, retrying..."
sleep 1
fi
done
if [ "$SUCCESS" = false ]; then
echo "ERROR: failed to download CK after $MAX_RETRIES attempts."
exit 1
fi
if [[ $EXIT_CODE -ne 0 ]]; then
BUILD_COMMIT=$(curl -s $AZ_API/build/builds/$CK_BUILD_ID | jq '.sourceVersion' | tr -d '"')
@@ -82,4 +103,3 @@ steps:
fi
echo "Instead used latest CK build $CK_BUILD_ID for commit $BUILD_COMMIT"
fi
exit $EXIT_CODE

View File

@@ -23,145 +23,25 @@ variables:
value: rocm-ci_high_build_pool
- name: ULTRA_BUILD_POOL
value: rocm-ci_ultra_build_pool
- name: ON_PREM_BUILD_POOL
value: rocm-ci_build_pool
- name: LARGE_DISK_BUILD_POOL
value: rocm-ci_larger_base_disk_pool
- name: GFX942_TEST_POOL
value: gfx942_test_pool
- name: GFX90A_TEST_POOL
value: gfx90a_test_pool
- name: LATEST_RELEASE_VERSION
value: 6.4.2
value: 6.4.3
- name: REPO_RADEON_VERSION
value: 6.4.2
value: 6.4.3
- name: NEXT_RELEASE_VERSION
value: 7.0.0
- name: LATEST_RELEASE_TAG
value: rocm-6.4.2
value: rocm-6.4.3
- name: DOCKER_SKIP_GFX
value: gfx90a
- name: AMDMIGRAPHX_PIPELINE_ID
value: 113
- name: AMDSMI_PIPELINE_ID
value: 99
- name: AOMP_EXTRAS_PIPELINE_ID
value: 111
- name: AOMP_PIPELINE_ID
value: 115
- name: CLR_PIPELINE_ID
value: 145
- name: COMPOSABLE_KERNEL_PIPELINE_ID
value: 86
- name: FLANG_LEGACY_PIPELINE_ID
value: 77
- name: HALF_PIPELINE_ID
value: 101
- name: HALF560_PIPELINE_ID
value: 68
- name: HALF560_BUILD_ID
value: 621
- name: HIP_PIPELINE_ID
value: 93
- name: HIP_TESTS_PIPELINE_ID
value: 233
- name: HIPBLAS_COMMON_PIPELINE_ID
value: 300
- name: HIPBLAS_PIPELINE_ID
value: 317
- name: HIPBLASLT_PIPELINE_ID
value: 301
- name: HIPCUB_PIPELINE_ID
value: 277
- name: HIPFFT_PIPELINE_ID
value: 121
- name: HIPFORT_PIPELINE_ID
value: 102
- name: HIPIFY_PIPELINE_ID
value: 92
- name: HIPRAND_PIPELINE_ID
value: 275
- name: HIPSOLVER_PIPELINE_ID
value: 84
- name: HIPSPARSE_PIPELINE_ID
value: 315
- name: HIPSPARSELT_PIPELINE_ID
value: 309
- name: HIPTENSOR_PIPELINE_ID
value: 105
- name: LLVM_PROJECT_PIPELINE_ID
value: 2
- name: MIOPEN_PIPELINE_ID
value: 108
- name: MIVISIONX_PIPELINE_ID
value: 80
- name: RCCL_PIPELINE_ID
value: 107
- name: RDC_PIPELINE_ID
value: 100
- name: ROCAL_PIPELINE_ID
value: 151
- name: ROCALUTION_PIPELINE_ID
value: 89
- name: ROCBLAS_PIPELINE_ID
value: 302
- name: ROCDBGAPI_PIPELINE_ID
value: 135
- name: ROCDECODE_PIPELINE_ID
value: 79
- name: ROCFFT_PIPELINE_ID
value: 120
- name: ROCGDB_PIPELINE_ID
value: 134
- name: ROCJPEG_PIPELINE_ID
value: 262
- name: ROCM_BANDWIDTH_TEST_PIPELINE_ID
value: 88
- name: ROCM_CMAKE_PIPELINE_ID
value: 6
- name: ROCM_CORE_PIPELINE_ID
value: 103
- name: ROCM_EXAMPLES_PIPELINE_ID
value: 216
- name: ROCM_SMI_LIB_PIPELINE_ID
value: 96
- name: ROCMINFO_PIPELINE_ID
value: 91
- name: ROCMLIR_PIPELINE_ID
value: 229
- name: ROCMVALIDATIONSUITE_PIPELINE_ID
value: 106
- name: ROCPRIM_PIPELINE_ID
value: 273
- name: ROCPROFILER_COMPUTE_PIPELINE_ID
value: 257
- name: ROCPROFILER_REGISTER_PIPELINE_ID
value: 1
- name: ROCPROFILER_SDK_PIPELINE_ID
value: 246
- name: ROCPROFILER_SYSTEMS_PIPELINE_ID
value: 255
- name: ROCPROFILER_PIPELINE_ID
value: 143
- name: ROCPYDECODE_PIPELINE_ID
value: 239
- name: ROCR_DEBUG_AGENT_PIPELINE_ID
value: 136
- name: ROCR_RUNTIME_PIPELINE_ID
value: 10
- name: ROCRAND_PIPELINE_ID
value: 274
- name: ROCSOLVER_PIPELINE_ID
value: 81
- name: ROCSPARSE_PIPELINE_ID
value: 314
- name: ROCTHRUST_PIPELINE_ID
value: 276
- name: ROCTRACER_PIPELINE_ID
value: 141
- name: ROCWMMA_PIPELINE_ID
value: 109
- name: RPP_PIPELINE_ID
value: 78
- name: TRANSFERBENCH_PIPELINE_ID
value: 265

View File

@@ -5,6 +5,7 @@ ACEs
ACS
AccVGPR
AccVGPRs
AITER
ALU
AllReduce
AMD
@@ -35,6 +36,7 @@ Arb
Autocast
BARs
BatchNorm
BKC
BLAS
BMC
BabelStream
@@ -42,9 +44,11 @@ Blit
Blockwise
Bluefield
Bootloader
Broadcom
CAS
CCD
CDNA
CGUI
CHTML
CIFAR
CLI
@@ -60,6 +64,7 @@ CPU
CPUs
Cron
CSC
CSDATA
CSE
CSV
CSn
@@ -69,16 +74,19 @@ CU
CUDA
CUs
CXX
CX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
ComfyUI
Commitizen
CommonMark
Concretized
Conda
ConnectX
CountOnes
CuPy
da
Dashboarding
@@ -95,6 +103,7 @@ DIMM
DKMS
DL
DMA
DOMContentLoaded
DNN
DNNL
DPM
@@ -113,14 +122,21 @@ Dependabot
Deprecations
DevCap
DirectX
Disaggregated
disaggregated
Dockerfile
Dockerized
Doxygen
dropless
ELMo
ENDPGM
EPYC
ESXi
EoS
etcd
fas
FBGEMM
FIFOs
FFT
FFTs
FFmpeg
@@ -133,6 +149,8 @@ Filesystem
FindDb
Flang
FlashAttention
FlashInfers
FlashInfer
FluxBenchmark
Fortran
Fuyu
@@ -151,6 +169,7 @@ GEMMs
GFLOPS
GFortran
GFXIP
GGUF
Gemma
GiB
GIM
@@ -160,6 +179,7 @@ GLXT
Gloo
GMI
GPG
GPGPU
GPR
GPT
GPU
@@ -168,6 +188,7 @@ GPUs
Graphbolt
GraphSage
GRBM
GRE
GenAI
GenZ
GitHub
@@ -192,8 +213,11 @@ HWE
HWS
Haswell
Higgs
href
Hyperparameters
Huggingface
HunyuanVideo
IB
ICD
ICT
ICV
@@ -202,8 +226,11 @@ IDEs
IFWI
IMDb
IncDec
instrSize
interpolators
IOMMU
IOP
IOPS
IOPM
IOV
IRQ
@@ -240,12 +267,15 @@ LLM
LLMs
LLVM
LM
LRU
LSAN
LSan
LTS
LSTMs
LteAll
LanguageCrossEntropy
LoRA
MECO
MEM
MERCHANTABILITY
MFMA
@@ -264,13 +294,16 @@ MNIST
MPI
MPT
MSVC
mul
MVAPICH
MVFFR
Makefile
Makefiles
ManyLinux
Matplotlib
Matrox
MaxText
Megablocks
Megatrends
Megatron
Mellanox
@@ -280,11 +313,15 @@ Miniconda
MirroredStrategy
Mixtral
MosaicML
MoEs
Mooncake
Mpops
Multicore
Multithreaded
MXFP
MyEnvironment
MyST
NANOO
NBIO
NBIOs
NCCL
@@ -339,9 +376,11 @@ PCC
PCI
PCIe
PEFT
perf
PEQT
PIL
PILImage
PLDM
POR
PRNG
PRs
@@ -356,6 +395,7 @@ PowerEdge
PowerShell
Pretrained
Pretraining
Primus
Profiler's
PyPi
Pytest
@@ -405,6 +445,7 @@ SBIOS
SCA
ScaledGEMM
SDK
SDKs
SDMA
SDPA
SDRAM
@@ -421,7 +462,9 @@ SKU
SKUs
SLES
SLURM
Slurm
SMEM
SMFMA
SMI
SMT
SPI
@@ -433,18 +476,25 @@ SWE
SerDes
ShareGPT
Shlens
simd
Skylake
Softmax
Spack
SplitK
Supermicro
Szegedy
TagRAM
TCA
TCC
TCCs
TCI
TCIU
TCP
TCR
TVM
TheRock
THREADGROUPS
threadgroups
TensorRT
TensorFloat
TF
@@ -454,6 +504,8 @@ TPS
TPU
TPUs
TSME
Taichi
Taichi's
Tagram
TensileLite
TensorBoard
@@ -486,9 +538,11 @@ UltraChat
Uncached
Unittests
Unhandled
unwindowed
VALU
VBIOS
VCN
verl's
VGPR
VGPRs
VM
@@ -501,11 +555,13 @@ Vanhoucke
Vulkan
WGP
WGPs
WR
WX
WikiText
Wojna
Workgroups
Writebacks
xcc
XCD
XCDs
XGBoost
@@ -518,6 +574,7 @@ Xilinx
Xnack
Xteam
YAML
YAMLs
YML
YModel
ZeRO
@@ -525,6 +582,7 @@ ZenDNN
accuracies
activations
addr
addEventListener
ade
ai
alloc
@@ -540,6 +598,7 @@ autogenerated
autotune
avx
awk
az
backend
backends
bb
@@ -557,6 +616,7 @@ boson
bosons
br
BrainFloat
btn
buildable
bursty
bzip
@@ -568,17 +628,21 @@ centric
changelog
checkpointing
chiplet
classList
cmake
cmd
coalescable
codename
collater
comgr
compat
completers
composable
concretization
config
configs
conformant
const
constructible
convolutional
convolves
@@ -619,6 +683,7 @@ detections
dev
devicelibs
devsel
dgl
dimensionality
disambiguates
distro
@@ -642,6 +707,7 @@ exascale
executables
ffmpeg
filesystem
forEach
fortran
fp
framebuffer
@@ -650,13 +716,16 @@ galb
gcc
gdb
gemm
getAttribute
gfortran
gfx
githooks
github
globals
gnupg
gpu
grayscale
gx
gzip
heterogenous
hipBLAS
@@ -709,6 +778,7 @@ invariants
invocating
ipo
jax
json
kdb
kfd
kv
@@ -729,6 +799,8 @@ logits
lossy
macOS
matchers
maxtext
megablocks
megatron
microarchitecture
migraphx
@@ -757,6 +829,7 @@ opencv
openmp
openssl
optimizers
ol
os
oversubscription
pageable
@@ -766,6 +839,7 @@ parallelizing
param
parameterization
passthrough
pe
perfcounter
performant
perl
@@ -788,11 +862,14 @@ preprocessing
preprocessor
prequantized
prerequisites
pretrain
pretraining
primus
profiler
profilers
protobuf
pseudorandom
px
py
pytorch
recommender
@@ -800,12 +877,15 @@ recommenders
quantile
quantizer
quasirandom
querySelector
querySelectorAll
queueing
qwen
radeon
rccl
rdc
rdma
redhat
reStructuredText
redirections
refactorization
@@ -818,6 +898,8 @@ req
resampling
rescaling
reusability
rhel
rl
RLHF
roadmap
roc
@@ -856,25 +938,30 @@ roctracer
rst
runtime
runtimes
ryzen
ResNet
sL
scalability
scalable
scipy
seealso
selectedTag
sendmsg
seqs
serializers
setAttribute
sglang
shader
sharding
sigmoid
sles
sm
smi
softmax
spack
spmm
src
stanford
stochastically
strided
subcommand
@@ -891,8 +978,10 @@ symlink
symlinks
sys
tabindex
targetContainer
td
tensorfloat
tf
th
tokenization
tokenize
@@ -903,12 +992,15 @@ toolchain
toolchains
toolset
toolsets
torchtitan
torchvision
tp
tqdm
tracebacks
txt
TopK
uarch
ubuntu
uncached
uncacheable
uncorrectable
@@ -955,6 +1047,8 @@ writebacks
wrreq
wzo
xargs
xDiT
xdit
xGMI
xPacked
xz

File diff suppressed because it is too large Load Diff

View File

@@ -23,9 +23,6 @@ source software compilers, debuggers, and libraries. ROCm is fully integrated in
> A new open source build platform for ROCm is under development at
> https://github.com/ROCm/TheRock, featuring a unified CMake build with bundled
> dependencies, Windows support, and more.
>
> The instructions below describe the prior process for building from source
> which will be replaced once TheRock is mature enough.
## Getting and Building ROCm from Source

1164
RELEASE.md

File diff suppressed because it is too large Load Diff

View File

@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-6.4.2"
<default revision="refs/tags/rocm-7.0.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />
@@ -9,6 +9,7 @@
<project name="ROCK-Kernel-Driver" />
<project name="ROCR-Runtime" />
<project name="amdsmi" />
<project name="aqlprofile" />
<project name="rdc" />
<project name="rocm_bandwidth_test" />
<project name="rocm_smi_lib" />
@@ -22,7 +23,7 @@
<project name="rocprofiler-systems" />
<project name="roctracer" />
<!--HIP Projects-->
<project name="HIP" />
<project name="hip" />
<project name="hip-tests" />
<project name="HIPIFY" />
<project name="clr" />
@@ -37,36 +38,24 @@
<project name="rocr_debug_agent" />
<!-- ROCm Libraries -->
<project groups="mathlibs" name="AMDMIGraphX" />
<project groups="mathlibs" name="MIOpen" />
<project groups="mathlibs" name="MIVisionX" />
<project groups="mathlibs" name="ROCmValidationSuite" />
<project groups="mathlibs" name="Tensile" />
<project groups="mathlibs" name="composable_kernel" />
<project groups="mathlibs" name="hipBLAS-common" />
<project groups="mathlibs" name="hipBLAS" />
<project groups="mathlibs" name="hipBLASLt" />
<project groups="mathlibs" name="hipCUB" />
<project groups="mathlibs" name="hipFFT" />
<project groups="mathlibs" name="hipRAND" />
<project groups="mathlibs" name="hipSOLVER" />
<project groups="mathlibs" name="hipSPARSE" />
<project groups="mathlibs" name="hipSPARSELt" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipfort" />
<project groups="mathlibs" name="rccl" />
<project groups="mathlibs" name="rocAL" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="rocDecode" />
<project groups="mathlibs" name="rocJPEG" />
<!-- The following components have been migrated to rocm-libraries:
hipBLAS-common hipBLAS hipBLASLt hipCUB
hipFFT hipRAND hipSPARSE hipSPARSELt
MIOpen rocBLAS rocFFT rocPRIM rocRAND
rocSPARSE rocThrust Tensile -->
<project groups="mathlibs" name="rocm-libraries" />
<project groups="mathlibs" name="rocPyDecode" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocRAND" />
<project groups="mathlibs" name="rocSHMEM" />
<project groups="mathlibs" name="rocSOLVER" />
<project groups="mathlibs" name="rocSPARSE" />
<project groups="mathlibs" name="rocThrust" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="rocm-cmake" />
<project groups="mathlibs" name="rpp" />

View File

@@ -29,6 +29,7 @@ additional licenses. Please review individual repositories for more information.
| [AMD SMI](https://github.com/ROCm/amdsmi) | [MIT](https://github.com/ROCm/amdsmi/blob/amd-staging/LICENSE) |
| [aomp](https://github.com/ROCm/aomp/) | [Apache 2.0](https://github.com/ROCm/aomp/blob/aomp-dev/LICENSE) |
| [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
| [AQLprofile](https://github.com/rocm/aqlprofile/) | [MIT](https://github.com/ROCm/aqlprofile/blob/amd-staging/LICENSE.md) |
| [Code Object Manager (Comgr)](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/comgr) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/LICENSE.txt) |
| [Composable Kernel](https://github.com/ROCm/composable_kernel) | [MIT](https://github.com/ROCm/composable_kernel/blob/develop/LICENSE) |
| [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/rocm/LICENSE.txt) |
@@ -46,11 +47,10 @@ additional licenses. Please review individual repositories for more information.
| [hipSPARSE](https://github.com/ROCm/hipSPARSE/) | [MIT](https://github.com/ROCm/hipSPARSE/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCm/hipSPARSELt/) | [MIT](https://github.com/ROCm/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCm/hipTensor) | [MIT](https://github.com/ROCm/hipTensor/blob/develop/LICENSE) |
| hsa-amd-aqlprofile | [AMD Software EULA](https://www.amd.com/en/legal/eula/amd-software-eula.html) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/amd-staging/LICENSE.TXT) |
| [llvm-project/flang](https://github.com/ROCm/llvm-project/tree/amd-staging/flang) | [Apache 2.0](https://github.com/ROCm/llvm-project/blob/amd-staging/flang/LICENSE.TXT) |
| [MIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/develop/LICENSE.txt) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/rocm-libraries/blob/develop/projects/miopen/LICENSE.md) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/develop/LICENSE.txt) |
| [rocAL](https://github.com/ROCm/rocAL) | [MIT](https://github.com/ROCm/rocAL/blob/develop/LICENSE.txt) |
| [rocALUTION](https://github.com/ROCm/rocALUTION/) | [MIT](https://github.com/ROCm/rocALUTION/blob/develop/LICENSE.md) |
@@ -67,15 +67,15 @@ additional licenses. Please review individual repositories for more information.
| [ROCm Communication Collectives Library (RCCL)](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [ROCm-Core](https://github.com/ROCm/rocm-core) | [MIT](https://github.com/ROCm/rocm-core/blob/master/copyright) |
| [ROCm Compute Profiler](https://github.com/ROCm/rocprofiler-compute) | [MIT](https://github.com/ROCm/rocprofiler-compute/blob/amd-staging/LICENSE) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/amd-staging/LICENSE) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/amd-staging/LICENSE.md) |
| [ROCm-Device-Libs](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/clr/tree/amd-staging/opencl) | [MIT](https://github.com/ROCm/clr/blob/amd-staging/opencl/LICENSE.txt) |
| [ROCm Performance Primitives (RPP)](https://github.com/ROCm/rpp) | [MIT](https://github.com/ROCm/rpp/blob/develop/LICENSE) |
| [ROCm SMI Lib](https://github.com/ROCm/rocm_smi_lib/) | [MIT](https://github.com/ROCm/rocm_smi_lib/blob/amd-staging/License.txt) |
| [ROCm Systems Profiler](https://github.com/ROCm/rocprofiler-systems) | [MIT](https://github.com/ROCm/rocprofiler-systems/blob/amd-staging/LICENSE) |
| [ROCm SMI Lib](https://github.com/ROCm/rocm_smi_lib/) | [MIT](https://github.com/ROCm/rocm_smi_lib/blob/amd-staging/LICENSE.md) |
| [ROCm Systems Profiler](https://github.com/ROCm/rocprofiler-systems) | [MIT](https://github.com/ROCm/rocprofiler-systems/blob/amd-staging/LICENSE.md) |
| [ROCm Validation Suite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| [rocPRIM](https://github.com/ROCm/rocPRIM/) | [MIT](https://github.com/ROCm/rocPRIM/blob/develop/LICENSE.txt) |
| [ROCProfiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-staging/LICENSE) |
| [ROCProfiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-staging/LICENSE.md) |
| [ROCprofiler-SDK](https://github.com/ROCm/rocprofiler-sdk) | [MIT](https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/LICENSE) |
| [rocPyDecode](https://github.com/ROCm/rocPyDecode) | [MIT](https://github.com/ROCm/rocPyDecode/blob/develop/LICENSE.txt) |
| [rocRAND](https://github.com/ROCm/rocRAND/) | [MIT](https://github.com/ROCm/rocRAND/blob/develop/LICENSE.txt) |
@@ -132,12 +132,10 @@ companies.
### Package licensing
:::{attention}
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
subject to the license agreement enclosed in the directory for the binary available
in `/opt/rocm/share/doc/hsa-amd-aqlprofile/EULA`. By using, installing,
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
ROCprof Trace Decoder and AOCC CPU optimizations are provided in binary form, subject to the license agreement enclosed on [GitHub](https://github.com/ROCm/rocprof-trace-decoder/blob/amd-mainline/LICENSE) for ROCprof Trace Decoder, and [Developer Central](https://www.amd.com/en/developer/aocc.html) for AOCC. By using, installing,
copying or distributing ROCprof Trace Decoder or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
terms of this agreement, do not install, copy or use ROCprof Trace Decoder or the
AOCC CPU Optimizations.
:::

View File

@@ -1,129 +0,0 @@
ROCm Version,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP7, SP6",SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
,"Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_",Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,,,
,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,,,,,,,,,,,
,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,,,,,,,,,,,,
,.. _architecture-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA
,RDNA4,RDNA4,,,,,,,,,,,,,,,
,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS-past-60]_,gfx1201 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1200 [#RDNA-OS-past-60]_,gfx1200 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_,gfx1101 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,N/A,N/A,N/A,85f95ae,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.3.0.post0,N/A,N/A,N/A,N/A,N/A,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.2,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1
,,,,,,,,,,,,,,,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
Thrust,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
CUB,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,,,
KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.12.0,2.12.0,2.12.0,2.11.0,2.11.0,2.11.0,2.11.0,2.10.0,2.10.0,2.10.0,2.10.0,2.9.0,2.9.0,2.9.0,2.9.0,2.8.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0
:doc:`rocAL <rocal:index>`,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0,2.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`rocDecode <rocdecode:index>`,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,N/A,N/A
:doc:`rocJPEG <rocjpeg:index>`,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`rocPyDecode <rocpydecode:index>`,0.3.1,0.3.1,0.3.1,0.2.0,0.2.0,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`RPP <rpp:index>`,1.9.10,1.9.10,1.9.10,1.9.1,1.9.1,1.9.1,1.9.1,1.8.0,1.8.0,1.8.0,1.8.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0
,,,,,,,,,,,,,,,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
:doc:`rocSHMEM <rocshmem:index>`,2.0.1,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
,,,,,,,,,,,,,,,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.1,0.12.0,0.10.0,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.7.0,0.7.0,0.7.0,0.7.0,0.6.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.18,1.0.18,1.0.18,1.0.17,1.0.17,1.0.17,1.0.17,1.0.16,1.0.15,1.0.15,1.0.14,1.0.14,1.0.14,1.0.14,1.0.14,1.0.13,1.0.13
:doc:`hipfort <hipfort:index>`,0.6.0,0.6.0,0.6.0,0.5.1,0.5.1,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.12.0,2.12.0,2.12.0,2.11.1,2.11.1,2.11.1,2.11.0,2.11.1,2.11.0,2.11.0,2.11.0,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.1,2.1.1,2.1.1,2.1.0,2.0.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.1.1,3.1.1,3.1.1,3.1.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.3,0.2.3,0.2.3,0.2.2,0.2.2,0.2.2,0.2.2,0.2.1,0.2.1,0.2.1,0.2.1,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.3,3.2.2,3.2.1,3.2.1,3.2.1,3.2.1,3.2.1,3.2.0,3.2.0,3.2.0,3.1.1,3.1.1,3.1.1,3.1.1,3.0.3,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.4.1,4.4.0,4.4.0,4.3.0,4.3.0,4.3.0,4.3.0,4.2.4,4.2.1,4.2.1,4.2.0,4.1.2,4.1.2,4.1.0,4.1.0,4.0.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.32,1.0.32,1.0.32,1.0.31,1.0.31,1.0.31,1.0.31,1.0.30,1.0.29,1.0.29,1.0.28,1.0.27,1.0.27,1.0.27,1.0.26,1.0.25,1.0.23
:doc:`rocRAND <rocrand:index>`,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.1,3.1.0,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.28.2,3.28.0,3.28.0,3.27.0,3.27.0,3.27.0,3.27.0,3.26.2,3.26.0,3.26.0,3.26.0,3.25.0,3.25.0,3.25.0,3.25.0,3.24.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.0.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.7.0,1.7.0,1.7.0,1.6.0,1.6.0,1.6.0,1.6.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0
:doc:`Tensile <tensile:src/index>`,4.43.0,4.43.0,4.43.0,4.42.0,4.42.0,4.42.0,4.42.0,4.41.0,4.41.0,4.41.0,4.41.0,4.40.0,4.40.0,4.40.0,4.40.0,4.39.0,4.39.0
,,,,,,,,,,,,,,,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`hipCUB <hipcub:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.4.1,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.2,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.1.1,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
,,,,,,,,,,,,,,,,,
SUPPORT LIBS,,,,,,,,,,,,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,,,,,,,,,,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`AMD SMI <amdsmi:index>`,25.5.1,25.4.2,25.3.0,24.7.1,24.7.1,24.7.1,24.7.1,24.6.3,24.6.3,24.6.3,24.6.2,24.5.1,24.5.1,24.5.1,24.4.1,23.4.2,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.5.0,7.5.0,7.4.0,7.4.0,7.4.0,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.1,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
,,,,,,,,,,,,,,,,,
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,,
:doc:`HIPIFY <hipify:index>`,19.0.0,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.13.0,0.13.0,0.13.0,0.13.0,0.12.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.2,0.77.2,0.77.2,0.77.0,0.77.0,0.77.0,0.77.0,0.76.0,0.76.0,0.76.0,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,14.2.0,14.2.0,14.2.0,14.2.0,14.1.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.3.0,0.3.0,0.3.0,0.3.0,N/A,N/A
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.4,2.0.4,2.0.4,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
,,,,,,,,,,,,,,,,,
COMPILERS,.. _compilers-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`llvm-project <llvm-project:index>`,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
,,,,,,,,,,,,,,,,,
RUNTIMES,.. _runtime-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
:doc:`HIP <hip:index>`,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
:doc:`ROCr Runtime <rocr-runtime:index>`,1.15.0,1.15.0,1.15.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.13.0,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0
1 ROCm Version 6.4.2 6.4.1 6.4.0 6.3.3 6.3.2 6.3.1 6.3.0 6.2.4 6.2.2 6.2.1 6.2.0 6.1.5 6.1.2 6.1.1 6.1.0 6.0.2 6.0.0
2 :ref:`Operating systems & kernels <OS-kernel-versions>` Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.1, 24.04 Ubuntu 24.04.1, 24.04 Ubuntu 24.04.1, 24.04 Ubuntu 24.04
3 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3, 22.04.2 Ubuntu 22.04.4, 22.04.3, 22.04.2
4 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5
5 RHEL 9.6, 9.4 RHEL 9.6, 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.3, 9.2 RHEL 9.3, 9.2
6 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8
7 SLES 15 SP7, SP6 SLES 15 SP6 SLES 15 SP6 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4
8 CentOS 7.9 CentOS 7.9 CentOS 7.9 CentOS 7.9 CentOS 7.9
9 Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_
10 Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_
11 Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_
12 .. _architecture-support-compatibility-matrix-past-60:
13 :doc:`Architecture <rocm-install-on-linux:reference/system-requirements>` CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3
14 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2
15 CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA
16 RDNA4 RDNA4
17 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3
18 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2
19 .. _gpu-support-compatibility-matrix-past-60:
20 :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>` gfx1201 [#RDNA-OS-past-60]_ gfx1201 [#RDNA-OS-past-60]_
21 gfx1200 [#RDNA-OS-past-60]_ gfx1200 [#RDNA-OS-past-60]_
22 gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_ gfx1101 [#RDNA-OS-past-60]_
23 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100
24 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030
25 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 [#mi300_624-past-60]_ gfx942 [#mi300_622-past-60]_ gfx942 [#mi300_621-past-60]_ gfx942 [#mi300_620-past-60]_ gfx942 [#mi300_612-past-60]_ gfx942 [#mi300_612-past-60]_ gfx942 [#mi300_611-past-60]_ gfx942 [#mi300_610-past-60]_ gfx942 [#mi300_602-past-60]_ gfx942 [#mi300_600-past-60]_
26 gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a
27 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908
28
29 FRAMEWORK SUPPORT .. _framework-support-compatibility-matrix-past-60:
30 :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>` 2.6, 2.5, 2.4, 2.3 2.6, 2.5, 2.4, 2.3 2.6, 2.5, 2.4, 2.3 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13
31 :doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>` 2.18.1, 2.17.1, 2.16.2 2.18.1, 2.17.1, 2.16.2 2.18.1, 2.17.1, 2.16.2 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.14.0, 2.13.1, 2.12.1 2.14.0, 2.13.1, 2.12.1
32 :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>` 0.4.35 0.4.35 0.4.35 0.4.31 0.4.31 0.4.31 0.4.31 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26
33 :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` N/A N/A N/A N/A N/A 85f95ae N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
34 :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` 2.4.0 2.4.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
35 :doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat]_ N/A N/A N/A N/A N/A N/A N/A N/A N/A 0.3.0.post0 N/A N/A N/A N/A N/A N/A
36 `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_ 1.2 1.2 1.2 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.14.1 1.14.1
37
38
39 THIRD PARTY COMMS .. _thirdpartycomms-support-compatibility-matrix-past-60:
40 `UCC <https://github.com/ROCm/ucc>`_ >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.2.0 >=1.2.0
41 `UCX <https://github.com/ROCm/ucx>`_ >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1
42
43 THIRD PARTY ALGORITHM .. _thirdpartyalgorithm-support-compatibility-matrix-past-60:
44 Thrust 2.5.0 2.5.0 2.5.0 2.3.2 2.3.2 2.3.2 2.3.2 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.1 2.0.1
45 CUB 2.5.0 2.5.0 2.5.0 2.3.2 2.3.2 2.3.2 2.3.2 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.1 2.0.1
46
47 KMD & USER SPACE [#kfd_support-past-60]_ .. _kfd-userspace-support-compatibility-matrix-past-60:
48 :doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>` 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x
49
50 ML & COMPUTER VISION .. _mllibs-support-compatibility-matrix-past-60:
51 :doc:`Composable Kernel <composable_kernel:index>` 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0
52 :doc:`MIGraphX <amdmigraphx:index>` 2.12.0 2.12.0 2.12.0 2.11.0 2.11.0 2.11.0 2.11.0 2.10.0 2.10.0 2.10.0 2.10.0 2.9.0 2.9.0 2.9.0 2.9.0 2.8.0 2.8.0
53 :doc:`MIOpen <miopen:index>` 3.4.0 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.0 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
54 :doc:`MIVisionX <mivisionx:index>` 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0 3.0.0 3.0.0 2.5.0 2.5.0 2.5.0 2.5.0 2.5.0 2.5.0
55 :doc:`rocAL <rocal:index>` 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.0 2.0.0 2.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
56 :doc:`rocDecode <rocdecode:index>` 0.10.0 0.10.0 0.10.0 0.8.0 0.8.0 0.8.0 0.8.0 0.6.0 0.6.0 0.6.0 0.6.0 0.6.0 0.6.0 0.5.0 0.5.0 N/A N/A
57 :doc:`rocJPEG <rocjpeg:index>` 0.8.0 0.8.0 0.8.0 0.6.0 0.6.0 0.6.0 0.6.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
58 :doc:`rocPyDecode <rocpydecode:index>` 0.3.1 0.3.1 0.3.1 0.2.0 0.2.0 0.2.0 0.2.0 0.1.0 0.1.0 0.1.0 0.1.0 N/A N/A N/A N/A N/A N/A
59 :doc:`RPP <rpp:index>` 1.9.10 1.9.10 1.9.10 1.9.1 1.9.1 1.9.1 1.9.1 1.8.0 1.8.0 1.8.0 1.8.0 1.5.0 1.5.0 1.5.0 1.5.0 1.4.0 1.4.0
60
61 COMMUNICATION .. _commlibs-support-compatibility-matrix-past-60:
62 :doc:`RCCL <rccl:index>` 2.22.3 2.22.3 2.22.3 2.21.5 2.21.5 2.21.5 2.21.5 2.20.5 2.20.5 2.20.5 2.20.5 2.18.6 2.18.6 2.18.6 2.18.6 2.18.3 2.18.3
63 :doc:`rocSHMEM <rocshmem:index>` 2.0.1 2.0.0 2.0.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
64
65 MATH LIBS .. _mathlibs-support-compatibility-matrix-past-60:
66 `half <https://github.com/ROCm/half>`_ 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0 1.12.0
67 :doc:`hipBLAS <hipblas:index>` 2.4.0 2.4.0 2.4.0 2.3.0 2.3.0 2.3.0 2.3.0 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.0 2.0.0
68 :doc:`hipBLASLt <hipblaslt:index>` 0.12.1 0.12.1 0.12.0 0.10.0 0.10.0 0.10.0 0.10.0 0.8.0 0.8.0 0.8.0 0.8.0 0.7.0 0.7.0 0.7.0 0.7.0 0.6.0 0.6.0
69 :doc:`hipFFT <hipfft:index>` 1.0.18 1.0.18 1.0.18 1.0.17 1.0.17 1.0.17 1.0.17 1.0.16 1.0.15 1.0.15 1.0.14 1.0.14 1.0.14 1.0.14 1.0.14 1.0.13 1.0.13
70 :doc:`hipfort <hipfort:index>` 0.6.0 0.6.0 0.6.0 0.5.1 0.5.1 0.5.0 0.5.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0
71 :doc:`hipRAND <hiprand:index>` 2.12.0 2.12.0 2.12.0 2.11.1 2.11.1 2.11.1 2.11.0 2.11.1 2.11.0 2.11.0 2.11.0 2.10.16 2.10.16 2.10.16 2.10.16 2.10.16 2.10.16
72 :doc:`hipSOLVER <hipsolver:index>` 2.4.0 2.4.0 2.4.0 2.3.0 2.3.0 2.3.0 2.3.0 2.2.0 2.2.0 2.2.0 2.2.0 2.1.1 2.1.1 2.1.1 2.1.0 2.0.0 2.0.0
73 :doc:`hipSPARSE <hipsparse:index>` 3.2.0 3.2.0 3.2.0 3.1.2 3.1.2 3.1.2 3.1.2 3.1.1 3.1.1 3.1.1 3.1.1 3.0.1 3.0.1 3.0.1 3.0.1 3.0.0 3.0.0
74 :doc:`hipSPARSELt <hipsparselt:index>` 0.2.3 0.2.3 0.2.3 0.2.2 0.2.2 0.2.2 0.2.2 0.2.1 0.2.1 0.2.1 0.2.1 0.2.0 0.2.0 0.1.0 0.1.0 0.1.0 0.1.0
75 :doc:`rocALUTION <rocalution:index>` 3.2.3 3.2.3 3.2.2 3.2.1 3.2.1 3.2.1 3.2.1 3.2.1 3.2.0 3.2.0 3.2.0 3.1.1 3.1.1 3.1.1 3.1.1 3.0.3 3.0.3
76 :doc:`rocBLAS <rocblas:index>` 4.4.1 4.4.0 4.4.0 4.3.0 4.3.0 4.3.0 4.3.0 4.2.4 4.2.1 4.2.1 4.2.0 4.1.2 4.1.2 4.1.0 4.1.0 4.0.0 4.0.0
77 :doc:`rocFFT <rocfft:index>` 1.0.32 1.0.32 1.0.32 1.0.31 1.0.31 1.0.31 1.0.31 1.0.30 1.0.29 1.0.29 1.0.28 1.0.27 1.0.27 1.0.27 1.0.26 1.0.25 1.0.23
78 :doc:`rocRAND <rocrand:index>` 3.3.0 3.3.0 3.3.0 3.2.0 3.2.0 3.2.0 3.2.0 3.1.1 3.1.0 3.1.0 3.1.0 3.0.1 3.0.1 3.0.1 3.0.1 3.0.0 2.10.17
79 :doc:`rocSOLVER <rocsolver:index>` 3.28.2 3.28.0 3.28.0 3.27.0 3.27.0 3.27.0 3.27.0 3.26.2 3.26.0 3.26.0 3.26.0 3.25.0 3.25.0 3.25.0 3.25.0 3.24.0 3.24.0
80 :doc:`rocSPARSE <rocsparse:index>` 3.4.0 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.1 3.2.0 3.2.0 3.2.0 3.1.2 3.1.2 3.1.2 3.1.2 3.0.2 3.0.2
81 :doc:`rocWMMA <rocwmma:index>` 1.7.0 1.7.0 1.7.0 1.6.0 1.6.0 1.6.0 1.6.0 1.5.0 1.5.0 1.5.0 1.5.0 1.4.0 1.4.0 1.4.0 1.4.0 1.3.0 1.3.0
82 :doc:`Tensile <tensile:src/index>` 4.43.0 4.43.0 4.43.0 4.42.0 4.42.0 4.42.0 4.42.0 4.41.0 4.41.0 4.41.0 4.41.0 4.40.0 4.40.0 4.40.0 4.40.0 4.39.0 4.39.0
83
84 PRIMITIVES .. _primitivelibs-support-compatibility-matrix-past-60:
85 :doc:`hipCUB <hipcub:index>` 3.4.0 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.1 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
86 :doc:`hipTensor <hiptensor:index>` 1.5.0 1.5.0 1.5.0 1.4.0 1.4.0 1.4.0 1.4.0 1.3.0 1.3.0 1.3.0 1.3.0 1.2.0 1.2.0 1.2.0 1.2.0 1.1.0 1.1.0
87 :doc:`rocPRIM <rocprim:index>` 3.4.1 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.2 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
88 :doc:`rocThrust <rocthrust:index>` 3.3.0 3.3.0 3.3.0 3.3.0 3.3.0 3.3.0 3.3.0 3.1.1 3.1.0 3.1.0 3.0.1 3.0.1 3.0.1 3.0.1 3.0.1 3.0.0 3.0.0
89
90 SUPPORT LIBS
91 `hipother <https://github.com/ROCm/hipother>`_ 6.4.43483 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
92 `rocm-core <https://github.com/ROCm/rocm-core>`_ 6.4.2 6.4.1 6.4.0 6.3.3 6.3.2 6.3.1 6.3.0 6.2.4 6.2.2 6.2.1 6.2.0 6.1.5 6.1.2 6.1.1 6.1.0 6.0.2 6.0.0
93 `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ 20240607.5.7 20240607.5.7 20240607.4.05 20240607.1.4246 20240125.5.08 20240125.5.08 20240125.5.08 20240125.3.30 20231016.2.245 20231016.2.245
94
95 SYSTEM MGMT TOOLS .. _tools-support-compatibility-matrix-past-60:
96 :doc:`AMD SMI <amdsmi:index>` 25.5.1 25.4.2 25.3.0 24.7.1 24.7.1 24.7.1 24.7.1 24.6.3 24.6.3 24.6.3 24.6.2 24.5.1 24.5.1 24.5.1 24.4.1 23.4.2 23.4.2
97 :doc:`ROCm Data Center Tool <rdc:index>` 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0
98 :doc:`rocminfo <rocminfo:index>` 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
99 :doc:`ROCm SMI <rocm_smi_lib:index>` 7.5.0 7.5.0 7.5.0 7.4.0 7.4.0 7.4.0 7.4.0 7.3.0 7.3.0 7.3.0 7.3.0 7.2.0 7.2.0 7.0.0 7.0.0 6.0.2 6.0.0
100 :doc:`ROCm Validation Suite <rocmvalidationsuite:index>` 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.1.0 1.0.60204 1.0.60202 1.0.60201 1.0.60200 1.0.60105 1.0.60102 1.0.60101 1.0.60100 1.0.60002 1.0.60000
101
102 PERFORMANCE TOOLS
103 :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>` 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0 1.4.0
104 :doc:`ROCm Compute Profiler <rocprofiler-compute:index>` 3.1.1 3.1.0 3.1.0 3.0.0 3.0.0 3.0.0 3.0.0 2.0.1 2.0.1 2.0.1 2.0.1 N/A N/A N/A N/A N/A N/A
105 :doc:`ROCm Systems Profiler <rocprofiler-systems:index>` 1.0.2 1.0.1 1.0.0 0.1.2 0.1.1 0.1.0 0.1.0 1.11.2 1.11.2 1.11.2 1.11.2 N/A N/A N/A N/A N/A N/A
106 :doc:`ROCProfiler <rocprofiler:index>` 2.0.60402 2.0.60401 2.0.60400 2.0.60303 2.0.60302 2.0.60301 2.0.60300 2.0.60204 2.0.60202 2.0.60201 2.0.60200 2.0.60105 2.0.60102 2.0.60101 2.0.60100 2.0.60002 2.0.60000
107 :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>` 0.6.0 0.6.0 0.6.0 0.5.0 0.5.0 0.5.0 0.5.0 0.4.0 0.4.0 0.4.0 0.4.0 N/A N/A N/A N/A N/A N/A
108 :doc:`ROCTracer <roctracer:index>` 4.1.60402 4.1.60401 4.1.60400 4.1.60303 4.1.60302 4.1.60301 4.1.60300 4.1.60204 4.1.60202 4.1.60201 4.1.60200 4.1.60105 4.1.60102 4.1.60101 4.1.60100 4.1.60002 4.1.60000
109
110 DEVELOPMENT TOOLS
111 :doc:`HIPIFY <hipify:index>` 19.0.0 19.0.0 19.0.0 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24455 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
112 :doc:`ROCm CMake <rocmcmakebuildtools:index>` 0.14.0 0.14.0 0.14.0 0.14.0 0.14.0 0.14.0 0.14.0 0.13.0 0.13.0 0.13.0 0.13.0 0.12.0 0.12.0 0.12.0 0.12.0 0.11.0 0.11.0
113 :doc:`ROCdbgapi <rocdbgapi:index>` 0.77.2 0.77.2 0.77.2 0.77.0 0.77.0 0.77.0 0.77.0 0.76.0 0.76.0 0.76.0 0.76.0 0.71.0 0.71.0 0.71.0 0.71.0 0.71.0 0.71.0
114 :doc:`ROCm Debugger (ROCgdb) <rocgdb:index>` 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 14.2.0 14.2.0 14.2.0 14.2.0 14.1.0 14.1.0 14.1.0 14.1.0 13.2.0 13.2.0
115 `rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_ 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.3.0 0.3.0 0.3.0 0.3.0 N/A N/A
116 :doc:`ROCr Debug Agent <rocr_debug_agent:index>` 2.0.4 2.0.4 2.0.4 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3 2.0.3
117
118 COMPILERS .. _compilers-support-compatibility-matrix-past-60:
119 `clang-ocl <https://github.com/ROCm/clang-ocl>`_ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 0.5.0 0.5.0 0.5.0 0.5.0 0.5.0 0.5.0
120 :doc:`hipCC <hipcc:index>` 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
121 `Flang <https://github.com/ROCm/flang>`_ 19.0.0.25224 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24455 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
122 :doc:`llvm-project <llvm-project:index>` 19.0.0.25224 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24491 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
123 `OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_ 19.0.0.25224 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24491 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
124
125 RUNTIMES .. _runtime-support-compatibility-matrix-past-60:
126 :doc:`AMD CLR <hip:understand/amd_clr>` 6.4.43484 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
127 :doc:`HIP <hip:index>` 6.4.43484 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
128 `OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_ 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0
129 :doc:`ROCr Runtime <rocr-runtime:index>` 1.15.0 1.15.0 1.15.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.13.0 1.13.0 1.13.0 1.13.0 1.13.0 1.12.0 1.12.0

File diff suppressed because it is too large Load Diff

View File

@@ -1,246 +0,0 @@
.. meta::
:description: ROCm compatibility matrix
:keywords: GPU, architecture, hardware, compatibility, system, requirements, components, libraries
**************************************************************************************
Compatibility matrix
**************************************************************************************
Use this matrix to view the ROCm compatibility and system requirements across successive major and minor releases.
You can also refer to the :ref:`past versions of ROCm compatibility matrix<past-rocm-compatibility-matrix>`.
Accelerators and GPUs listed in the following table support compute workloads (no display
information or graphics). If youre using ROCm with AMD Radeon or Radeon Pro GPUs for graphics
workloads, see the `Use ROCm on Radeon GPU documentation
<https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html>`_ to verify
compatibility and system requirements.
.. |br| raw:: html
<br/>
.. container:: format-big-table
.. csv-table::
:header: "ROCm Version", "6.4.2", "6.4.1", "6.3.0"
:stub-columns: 1
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
,"RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4"
,RHEL 8.10,RHEL 8.10,RHEL 8.10
,"SLES 15 SP7, SP6",SLES 15 SP6,"SLES 15 SP6, SP5"
,"Oracle Linux 9, 8 [#mi300x]_","Oracle Linux 9, 8 [#mi300x]_",Oracle Linux 8.10 [#mi300x]_
,Debian 12 [#single-node]_,Debian 12 [#single-node]_,
,Azure Linux 3.0 [#mi300x]_,Azure Linux 3.0 [#mi300x]_,
,.. _architecture-support-compatibility-matrix:,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA
,RDNA4,RDNA4,
,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix:,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS]_,gfx1201 [#RDNA-OS]_,
,gfx1200 [#RDNA-OS]_,gfx1200 [#RDNA-OS]_,
,gfx1101 [#RDNA-OS]_ [#7700XT-OS]_,gfx1101 [#RDNA-OS]_,
,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942
,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,85f95ae
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.17.3
,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.3.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.15.0,>=1.15.0
,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix:,,
Thrust,2.5.0,2.5.0,2.3.2
CUB,2.5.0,2.5.0,2.3.2
,,,
KMD & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.12.0,2.12.0,2.11.0
:doc:`MIOpen <miopen:index>`,3.4.0,3.4.0,3.3.0
:doc:`MIVisionX <mivisionx:index>`,3.2.0,3.2.0,3.1.0
:doc:`rocAL <rocal:index>`,2.2.0,2.2.0,2.1.0
:doc:`rocDecode <rocdecode:index>`,0.10.0,0.10.0,0.8.0
:doc:`rocJPEG <rocjpeg:index>`,0.8.0,0.8.0,0.6.0
:doc:`rocPyDecode <rocpydecode:index>`,0.3.1,0.3.1,0.2.0
:doc:`RPP <rpp:index>`,1.9.10,1.9.10,1.9.1
,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
:doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.21.5
:doc:`rocSHMEM <rocshmem:index>`,2.0.1,2.0.0,N/A
,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.4.0,2.4.0,2.3.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.1,0.10.0
:doc:`hipFFT <hipfft:index>`,1.0.18,1.0.18,1.0.17
:doc:`hipfort <hipfort:index>`,0.6.0,0.6.0,0.5.0
:doc:`hipRAND <hiprand:index>`,2.12.0,2.12.0,2.11.0
:doc:`hipSOLVER <hipsolver:index>`,2.4.0,2.4.0,2.3.0
:doc:`hipSPARSE <hipsparse:index>`,3.2.0,3.2.0,3.1.2
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.3,0.2.3,0.2.2
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.3,3.2.1
:doc:`rocBLAS <rocblas:index>`,4.4.1,4.4.0,4.3.0
:doc:`rocFFT <rocfft:index>`,1.0.32,1.0.32,1.0.31
:doc:`rocRAND <rocrand:index>`,3.3.0,3.3.0,3.2.0
:doc:`rocSOLVER <rocsolver:index>`,3.28.2,3.28.0,3.27.0
:doc:`rocSPARSE <rocsparse:index>`,3.4.0,3.4.0,3.3.0
:doc:`rocWMMA <rocwmma:index>`,1.7.0,1.7.0,1.6.0
:doc:`Tensile <tensile:src/index>`,4.43.0,4.43.0,4.42.0
,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix:,,
:doc:`hipCUB <hipcub:index>`,3.4.0,3.4.0,3.3.0
:doc:`hipTensor <hiptensor:index>`,1.5.0,1.5.0,1.4.0
:doc:`rocPRIM <rocprim:index>`,3.4.1,3.4.0,3.3.0
:doc:`rocThrust <rocthrust:index>`,3.3.0,3.3.0,3.3.0
,,,
SUPPORT LIBS,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43483,6.3.42131
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.2,6.4.1,6.3.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_
,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix:,,
:doc:`AMD SMI <amdsmi:index>`,25.5.1,25.4.2,24.7.1
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.5.0,7.4.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0
,,,
PERFORMANCE TOOLS,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.1,3.1.0,3.0.0
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.2,1.0.1,0.1.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60402,2.0.60401,2.0.60300
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.6.0,0.5.0
:doc:`ROCTracer <roctracer:index>`,4.1.60402,4.1.60401,4.1.60300
,,,
DEVELOPMENT TOOLS,,,
:doc:`HIPIFY <hipify:index>`,19.0.0,19.0.0,18.0.0.24455
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.2,0.77.2,0.77.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,15.2.0,15.2.0,15.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.4.0,0.4.0
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.4,2.0.4,2.0.3
,,,
COMPILERS,.. _compilers-support-compatibility-matrix:,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25224,19.0.0.25184,18.0.0.24455
:doc:`llvm-project <llvm-project:index>`,19.0.0.25224,19.0.0.25184,18.0.0.24491
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25224,19.0.0.25184,18.0.0.24491
,,,
RUNTIMES,.. _runtime-support-compatibility-matrix:,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43484,6.4.43483,6.3.42131
:doc:`HIP <hip:index>`,6.4.43484,6.4.43483,6.3.42131
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0
:doc:`ROCr Runtime <rocr-runtime:index>`,1.15.0,1.15.0,1.14.0
.. rubric:: Footnotes
.. [#mi300x] Oracle Linux and Azure Linux are supported only on AMD Instinct MI300X.
.. [#single-node] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#RDNA-OS] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
.. [#7700XT-OS] Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
.. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. _OS-kernel-versions:
Operating systems, kernel and Glibc versions
*********************************************
Use this lookup table to confirm which operating system and kernel versions are supported with ROCm.
.. csv-table::
:header: "OS", "Version", "Kernel", "Glibc"
:widths: 40, 20, 30, 20
:stub-columns: 1
`Ubuntu <https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle>`_, 24.04.2, "6.8 GA, 6.11 HWE", 2.39
,,
`Ubuntu <https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle>`_, 22.04.5, "5.15 GA, 6.8 HWE", 2.35
,,
`Red Hat Enterprise Linux (RHEL 9) <https://access.redhat.com/articles/3078#RHEL9>`_, 9.6, 5.14+, 2.34
,9.5, 5.14+, 2.34
,9.4, 5.14+, 2.34
,9.3, 5.14+, 2.34
,,
`Red Hat Enterprise Linux (RHEL 8) <https://access.redhat.com/articles/3078#RHEL8>`_, 8.10, 4.18.0+, 2.28
,8.9, 4.18.0, 2.28
,,
`SUSE Linux Enterprise Server (SLES) <https://www.suse.com/support/kb/doc/?id=000019587#SLE15SP4>`_, 15 SP7, 6.11.0+, 2.38
,15 SP6, "6.5.0+, 6.4.0", 2.38
,15 SP5, 5.14.21, 2.31
,,
`Oracle Linux <https://blogs.oracle.com/scoter/post/oracle-linux-and-unbreakable-enterprise-kernel-uek-releases>`_, 9, 5.15.0 (UEK), 2.35
,8, 5.15.0 (UEK), 2.28
,,
`Debian <https://www.debian.org/download>`_,12, 6.1, 2.36
,,
`Azure Linux <https://techcommunity.microsoft.com/blog/linuxandopensourceblog/azure-linux-3-0-now-in-preview-on-azure-kubernetes-service-v1-31/4287229>`_,3.0, 6.6.60, 2.38
,,
.. note::
* See `Red Hat Enterprise Linux Release Dates <https://access.redhat.com/articles/3078>`_ to learn about the specific kernel versions supported on Red Hat Enterprise Linux (RHEL).
* See `List of SUSE Linux Enterprise Server kernel <https://www.suse.com/support/kb/doc/?id=000019587>`_ to learn about the specific kernel version supported on SUSE Linux Enterprise Server (SLES).
..
Footnotes and ref anchors in below historical tables should be appended with "-past-60", to differentiate from the
footnote references in the above, latest, compatibility matrix. It also allows to easily find & replace.
An easy way to work is to download the historical.CSV file, and update open it in excel. Then when content is ready,
delete the columns you don't need, to build the current compatibility matrix to use in above table. Find & replace all
instances of "-past-60" to make it ready for above table.
.. _past-rocm-compatibility-matrix:
Past versions of ROCm compatibility matrix
***************************************************
Expand for full historical view of:
.. dropdown:: ROCm 6.0 - Present
You can `download the entire .csv <../downloads/compatibility-matrix-historical-6.0.csv>`_ for offline reference.
.. csv-table::
:file: compatibility-matrix-historical-6.0.csv
:header-rows: 1
:stub-columns: 1
.. rubric:: Footnotes
.. [#mi300x-past-60] Oracle Linux and Azure Linux are supported only on AMD Instinct MI300X.
.. [#single-node-past-60] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#RDNA-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
.. [#7700XT-OS-past-60] Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
.. [#mi300_624-past-60] **For ROCm 6.2.4** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_622-past-60] **For ROCm 6.2.2** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_621-past-60] **For ROCm 6.2.1** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_620-past-60] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_612-past-60] **For ROCm 6.1.2** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4 and Oracle Linux.
.. [#mi300_611-past-60] **For ROCm 6.1.1** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4 and Oracle Linux.
.. [#mi300_610-past-60] **For ROCm 6.1.0** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_602-past-60] **For ROCm 6.0.2** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#verl_compat] verl is only supported on ROCm 6.2.0.
.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.

View File

@@ -1,255 +0,0 @@
:orphan:
.. meta::
:description: Deep Graph Library (DGL) compatibility
:keywords: GPU, DGL compatibility
.. version-set:: rocm_version latest
********************************************************************************
DGL compatibility
********************************************************************************
Deep Graph Library `(DGL) <https://www.dgl.ai/>`_ is an easy-to-use, high-performance and scalable
Python package for deep learning on graphs. DGL is framework agnostic, meaning
if a deep graph model is a component in an end-to-end application, the rest of
the logic is implemented using PyTorch.
* ROCm support for DGL is hosted in the `https://github.com/ROCm/dgl <https://github.com/ROCm/dgl>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/dmlc/dgl <https://github.com/dmlc/dgl>`_ upstream repository.
* Use the prebuilt :ref:`Docker images <dgl-docker-compat>` with DGL, PyTorch, and ROCm preinstalled.
* See the :doc:`ROCm DGL installation guide <rocm-install-on-linux:install/3rd-party/dgl-install>`
to install and get started.
Supported devices
================================================================================
- **Officially Supported**: TF32 with AMD Instinct MI300X (through hipblaslt)
- **Partially Supported**: TF32 with AMD Instinct MI250X
.. _dgl-recommendations:
Use cases and recommendations
================================================================================
DGL can be used for Graph Learning, and building popular graph models like
GAT, GCN and GraphSage. Using these we can support a variety of use-cases such as:
- Recommender systems
- Network Optimization and Analysis
- 1D (Temporal) and 2D (Image) Classification
- Drug Discovery
Multiple use cases of DGL have been tested and verified.
However, a recommended example follows a drug discovery pipeline using the ``SE3Transformer``.
Refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`_,
where you can search for DGL examples and best practices to optimize your training workflows on AMD GPUs.
Coverage includes:
- Single-GPU training/inference
- Multi-GPU training
.. _dgl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `DGL images <https://hub.docker.com/r/rocm/dgl>`_
with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated
inventories were tested on `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_.
Click the |docker-icon| to view the image on Docker Hub.
.. list-table:: DGL Docker image components
:header-rows: 1
:class: docker-image-compatibility
* - Docker
- DGL
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-8ce2c3bcfaa137ab94a75f9e2ea711894748980f57417739138402a542dd5564"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-cf1683283b8eeda867b690229c8091c5bbf1edb9f52e8fb3da437c49a612ebe4"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-4834f178c3614e2d09e89e32041db8984c456d45dfd20286e377ca8635686554"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-88740a2c8ab4084b42b10c3c6ba984cab33dd3a044f479c6d7618e2b2cb05e69"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
Key ROCm libraries for DGL
================================================================================
DGL on ROCm depends on specific libraries that affect its features and performance.
Using the DGL Docker container or building it with the provided docker file or a ROCm base image is recommended.
If you prefer to build it yourself, ensure the following dependencies are installed:
.. list-table::
:header-rows: 1
* - ROCm library
- Version
- Purpose
* - `Composable Kernel <https://github.com/ROCm/composable_kernel>`_
- :version-ref:`"Composable Kernel" rocm_version`
- Enables faster execution of core operations like matrix multiplication
(GEMM), convolutions and transformations.
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of the hipBLAS library, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
* - `hipRAND <https://github.com/ROCm/hipRAND>`_
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipTensor <https://github.com/ROCm/hipTensor>`_
- :version-ref:`hipTensor rocm_version`
- Optimizes for high-performance tensor operations, such as contractions.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
- :version-ref:`MIOpen rocm_version`
- Optimizes deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
* - `MIGraphX <https://github.com/ROCm/AMDMIGraphX>`_
- :version-ref:`MIGraphX rocm_version`
- Adds graph-level optimizations, ONNX models and mixed precision support
and enable Ahead-of-Time (AOT) Compilation.
* - `MIVisionX <https://github.com/ROCm/MIVisionX>`_
- :version-ref:`MIVisionX rocm_version`
- Optimizes acceleration for computer vision and AI workloads like
preprocessing, augmentation, and inferencing.
* - `rocAL <https://github.com/ROCm/rocAL>`_
- :version-ref:`rocAL rocm_version`
- Accelerates the data pipeline by offloading intensive preprocessing and
augmentation tasks. rocAL is part of MIVisionX.
* - `RCCL <https://github.com/ROCm/rccl>`_
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
* - `rocDecode <https://github.com/ROCm/rocDecode>`_
- :version-ref:`rocDecode rocm_version`
- Provides hardware-accelerated data decoding capabilities, particularly
for image, video, and other dataset formats.
* - `rocJPEG <https://github.com/ROCm/rocJPEG>`_
- :version-ref:`rocJPEG rocm_version`
- Provides hardware-accelerated JPEG image decoding and encoding.
* - `RPP <https://github.com/ROCm/RPP>`_
- :version-ref:`RPP rocm_version`
- Speeds up data augmentation, transformation, and other preprocessing steps.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
* - `rocWMMA <https://github.com/ROCm/rocWMMA>`_
- :version-ref:`rocWMMA rocm_version`
- Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
multiplication (GEMM) and accumulation operations with mixed precision
support.
Supported features
================================================================================
Many functions and methods available in DGL Upstream are also supported in DGL ROCm.
Instead of listing them all, support is grouped into the following categories to provide a general overview.
* DGL Base
* DGL Backend
* DGL Data
* DGL Dataloading
* DGL DGLGraph
* DGL Function
* DGL Ops
* DGL Sampling
* DGL Transforms
* DGL Utils
* DGL Distributed
* DGL Geometry
* DGL Mpops
* DGL NN
* DGL Optim
* DGL Sparse
Unsupported features
================================================================================
* Graphbolt
* Partial TF32 Support (MI250x only)
* Kineto/ ROCTracer integration
Unsupported functions
================================================================================
* ``more_nnz``
* ``format``
* ``multiprocess_sparse_adam_state_dict``
* ``record_stream_ndarray``
* ``half_spmm``
* ``segment_mm``
* ``gather_mm_idx_b``
* ``pgexplainer``
* ``sample_labors_prob``
* ``sample_labors_noprob``

View File

@@ -1,314 +0,0 @@
:orphan:
.. meta::
:description: JAX compatibility
:keywords: GPU, JAX compatibility
.. version-set:: rocm_version latest
*******************************************************************************
JAX compatibility
*******************************************************************************
JAX provides a NumPy-like API, which combines automatic differentiation and the
Accelerated Linear Algebra (XLA) compiler to achieve high-performance machine
learning at scale.
JAX uses composable transformations of Python and NumPy through just-in-time
(JIT) compilation, automatic vectorization, and parallelization. To learn about
JAX, including profiling and optimizations, see the official `JAX documentation
<https://jax.readthedocs.io/en/latest/notebooks/quickstart.html>`_.
ROCm support for JAX is upstreamed, and users can build the official source code
with ROCm support:
- ROCm JAX release:
- Offers AMD-validated and community :ref:`Docker images <jax-docker-compat>`
with ROCm and JAX preinstalled.
- ROCm JAX repository: `ROCm/jax <https://github.com/ROCm/jax>`_
- See the :doc:`ROCm JAX installation guide <rocm-install-on-linux:install/3rd-party/jax-install>`
to get started.
- Official JAX release:
- Official JAX repository: `jax-ml/jax <https://github.com/jax-ml/jax>`_
- See the `AMD GPU (Linux) installation section
<https://jax.readthedocs.io/en/latest/installation.html#amd-gpu-linux>`_ in
the JAX documentation.
.. note::
AMD releases official `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
quarterly alongside new ROCm releases. These images undergo full AMD testing.
`Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
follow upstream JAX releases and use the latest available ROCm version.
Use cases and recommendations
================================================================================
* The `nanoGPT in JAX <https://rocm.blogs.amd.com/artificial-intelligence/nanoGPT-JAX/README.html>`_
blog explores the implementation and training of a Generative Pre-trained
Transformer (GPT) model in JAX, inspired by Andrej Karpathys JAX-based
nanoGPT. Comparing how essential GPT components—such as self-attention
mechanisms and optimizers—are realized in JAX and JAX, also highlights
JAXs unique features.
* The `Optimize GPT Training: Enabling Mixed Precision Training in JAX using
ROCm on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/jax-mixed-precision/README.html>`_
blog post provides a comprehensive guide on enhancing the training efficiency
of GPT models by implementing mixed precision techniques in JAX, specifically
tailored for AMD GPUs utilizing the ROCm platform.
* The `Supercharging JAX with Triton Kernels on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/jax-triton/README.html>`_
blog demonstrates how to develop a custom fused dropout-activation kernel for
matrices using Triton, integrate it with JAX, and benchmark its performance
using ROCm.
* The `Distributed fine-tuning with JAX on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/distributed-sft-jax/README.html>`_
outlines the process of fine-tuning a Bidirectional Encoder Representations
from Transformers (BERT)-based large language model (LLM) using JAX for a text
classification task. The blog post discuss techniques for parallelizing the
fine-tuning across multiple AMD GPUs and assess the model's performance on a
holdout dataset. During the fine-tuning, a BERT-base-cased transformer model
and the General Language Understanding Evaluation (GLUE) benchmark dataset was
used on a multi-GPU setup.
* The `MI300X workload optimization guide <https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html>`_
provides detailed guidance on optimizing workloads for the AMD Instinct MI300X
accelerator using ROCm. The page is aimed at helping users achieve optimal
performance for deep learning and other high-performance computing tasks on
the MI300X GPU.
For more use cases and recommendations, see `ROCm JAX blog posts <https://rocm.blogs.amd.com/blog/tag/jax.html>`_.
.. _jax-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
with ROCm backends on Docker Hub. The following Docker image tags and
associated inventories represent the latest JAX version from the official Docker Hub and are validated for
`ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`_. Click the |docker-icon|
icon to view the image on Docker Hub.
.. list-table:: JAX Docker image components
:header-rows: 1
* - Docker image
- JAX
- Linux
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.1-jax0.4.35-py3.12/images/sha256-7a0745a2a2758bdf86397750bac00e9086cbf67d170cfdbb08af73f7c7d18a6a"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
- Ubuntu 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.1-jax0.4.35-py3.10/images/sha256-5f9e8d6e6e69fdc9a1a3f2ba3b1234c3f46c53b7468538c07fd18b00899da54f"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
- Ubuntu 22.04
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
AMD publishes `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
with ROCm backends on Docker Hub. The following Docker image tags and
associated inventories are tested for `ROCm 6.3.2 <https://repo.radeon.com/rocm/apt/6.3.2/>`_.
.. list-table:: JAX community Docker image components
:header-rows: 1
* - Docker image
- JAX
- Linux
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.12.8/images/sha256-25dfaa0183e274bd0a3554a309af3249c6f16a1793226cb5373f418e39d3146a"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
- `3.12.8 <https://www.python.org/downloads/release/python-3128/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.11.11/images/sha256-ff9baeca9067d13e6c279c911e5a9e5beed0817d24fafd424367cc3d5bd381d7"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
- `3.11.11 <https://www.python.org/downloads/release/python-31111/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.10.16/images/sha256-8bab484be1713655f74da51a191ed824bb9d03db1104fd63530a1ac3c37cf7b1"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
.. _key_rocm_libraries:
Key ROCm libraries for JAX
================================================================================
The following ROCm libraries represent potential targets that could be utilized
by JAX on ROCm for various computational tasks. The actual libraries used will
depend on the specific implementation and operations performed.
.. list-table::
:header-rows: 1
* - ROCm library
- Version
- Purpose
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of hipBLAS, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
* - `hipRAND <https://github.com/ROCm/hipRAND>`_
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
- :version-ref:`MIOpen rocm_version`
- Optimized for deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
* - `RCCL <https://github.com/ROCm/rccl>`_
- :version-ref:`RCCL rocm_version`
- Optimized for multi-GPU communication for operations like all-reduce,
broadcast, and scatter.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
.. note::
This table shows ROCm libraries that could potentially be utilized by JAX. Not
all libraries may be used in every configuration, and the actual library usage
will depend on the specific operations and implementation details.
Supported data types and modules
===============================================================================
The following tables lists the supported public JAX API data types and modules.
Supported data types
--------------------------------------------------------------------------------
ROCm supports all the JAX data types of `jax.dtypes <https://docs.jax.dev/en/latest/jax.dtypes.html>`_
module, `jax.numpy.dtype <https://docs.jax.dev/en/latest/_autosummary/jax.numpy.dtype.html>`_
and `default_dtype <https://docs.jax.dev/en/latest/default_dtypes.html>`_ .
The ROCm supported data types in JAX are collected in the following table.
.. list-table::
:header-rows: 1
* - Data type
- Description
* - ``bfloat16``
- 16-bit bfloat (brain floating point).
* - ``bool``
- Boolean.
* - ``complex128``
- 128-bit complex.
* - ``complex64``
- 64-bit complex.
* - ``float16``
- 16-bit (half precision) floating-point.
* - ``float32``
- 32-bit (single precision) floating-point.
* - ``float64``
- 64-bit (double precision) floating-point.
* - ``half``
- 16-bit (half precision) floating-point.
* - ``int16``
- Signed 16-bit integer.
* - ``int32``
- Signed 32-bit integer.
* - ``int64``
- Signed 64-bit integer.
* - ``int8``
- Signed 8-bit integer.
* - ``uint16``
- Unsigned 16-bit (word) integer.
* - ``uint32``
- Unsigned 32-bit (dword) integer.
* - ``uint64``
- Unsigned 64-bit (qword) integer.
* - ``uint8``
- Unsigned 8-bit (byte) integer.
.. note::
JAX data type support is effected by the :ref:`key_rocm_libraries` and it's
collected on :doc:`ROCm data types and precision support <rocm:reference/precision-support>`
page.
Supported modules
--------------------------------------------------------------------------------
For a complete and up-to-date list of JAX public modules (for example, ``jax.numpy``,
``jax.scipy``, ``jax.lax``), their descriptions, and usage, please refer directly to the
`official JAX API documentation <https://jax.readthedocs.io/en/latest/jax.html>`_.
.. note::
Since version 0.1.56, JAX has full support for ROCm, and the
:ref:`Known issues and important notes <jax_comp_known_issues>` section
contains details about limitations specific to the ROCm backend. The list of
JAX API modules is maintained by the JAX project and is subject to change.
Refer to the official Jax documentation for the most up-to-date information.

View File

@@ -1,544 +0,0 @@
:orphan:
.. meta::
:description: PyTorch compatibility
:keywords: GPU, PyTorch compatibility
.. version-set:: rocm_version latest
********************************************************************************
PyTorch compatibility
********************************************************************************
`PyTorch <https://pytorch.org/>`__ is an open-source tensor library designed for
deep learning. PyTorch on ROCm provides mixed-precision and large-scale training
using `MIOpen <https://github.com/ROCm/MIOpen>`__ and
`RCCL <https://github.com/ROCm/rccl>`__ libraries.
ROCm support for PyTorch is upstreamed into the official PyTorch repository. Due
to independent compatibility considerations, this results in two distinct
release cycles for PyTorch on ROCm:
- ROCm PyTorch release:
- Provides the latest version of ROCm but might not necessarily support the
latest stable PyTorch version.
- Offers :ref:`Docker images <pytorch-docker-compat>` with ROCm and PyTorch
preinstalled.
- ROCm PyTorch repository: `<https://github.com/ROCm/pytorch>`__
- See the :doc:`ROCm PyTorch installation guide <rocm-install-on-linux:install/3rd-party/pytorch-install>`
to get started.
- Official PyTorch release:
- Provides the latest stable version of PyTorch but might not necessarily
support the latest ROCm version.
- Official PyTorch repository: `<https://github.com/pytorch/pytorch>`__
- See the `Nightly and latest stable version installation guide <https://pytorch.org/get-started/locally/>`__
or `Previous versions <https://pytorch.org/get-started/previous-versions/>`__
to get started.
PyTorch includes tooling that generates HIP source code from the CUDA backend.
This approach allows PyTorch to support ROCm without requiring manual code
modifications. For more information, see :doc:`HIPIFY <hipify:index>`.
ROCm development is aligned with the stable release of PyTorch, while upstream
PyTorch testing uses the stable release of ROCm to maintain consistency.
.. _pytorch-recommendations:
Use cases and recommendations
================================================================================
* :doc:`Using ROCm for AI: training a model </how-to/rocm-for-ai/training/benchmark-docker/pytorch-training>`
guides how to leverage the ROCm platform for training AI models. It covers the
steps, tools, and best practices for optimizing training workflows on AMD GPUs
using PyTorch features.
* :doc:`Single-GPU fine-tuning and inference </how-to/rocm-for-ai/fine-tuning/single-gpu-fine-tuning-and-inference>`
describes and demonstrates how to use the ROCm platform for the fine-tuning
and inference of machine learning models, particularly large language models
(LLMs), on systems with a single GPU. This topic provides a detailed guide for
setting up, optimizing, and executing fine-tuning and inference workflows in
such environments.
* :doc:`Multi-GPU fine-tuning and inference optimization </how-to/rocm-for-ai/fine-tuning/multi-gpu-fine-tuning-and-inference>`
describes and demonstrates the fine-tuning and inference of machine learning
models on systems with multiple GPUs.
* The :doc:`Instinct MI300X workload optimization guide </how-to/rocm-for-ai/inference-optimization/workload>`
provides detailed guidance on optimizing workloads for the AMD Instinct MI300X
accelerator using ROCm. This guide helps users achieve optimal performance for
deep learning and other high-performance computing tasks on the MI300X
accelerator.
* The :doc:`Inception with PyTorch documentation </conceptual/ai-pytorch-inception>`
describes how PyTorch integrates with ROCm for AI workloads It outlines the
use of PyTorch on the ROCm platform and focuses on efficiently leveraging AMD
GPU hardware for training and inference tasks in AI applications.
For more use cases and recommendations, see `ROCm PyTorch blog posts <https://rocm.blogs.amd.com/blog/tag/pytorch.html>`__.
.. _pytorch-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `PyTorch images <https://hub.docker.com/r/rocm/pytorch>`__
with ROCm backends on Docker Hub. The following Docker image tags and associated
inventories were tested on `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__.
Click |docker-icon| to view the image on Docker Hub.
.. list-table:: PyTorch Docker image components
:header-rows: 1
:class: docker-image-compatibility
* - Docker
- PyTorch
- Ubuntu
- Python
- Apex
- torchvision
- TensorBoard
- MAGMA
- UCX
- OMPI
- OFED
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-c76af9bfb1c25b0f40d4c29e8652105c57250bf018d23ff595b06bd79666fdd7"><i class="fab fa-docker fa-lg"></i></a>
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`__
- 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`__
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.16.0 <https://github.com/openucx/ucx/tree/v1.16.0>`__
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.6.0/images/sha256-f9d226135d51831c810dcb1251636ec61f85c65fcdda03e188c053a5d4f6585b"><i class="fab fa-docker fa-lg"></i></a>
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`__
- 22.04
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`__
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.5.1/images/sha256-3490e74d4f43dcdb3351dd334108d1ccd47e5a687c0523a2424ac1bcdd3dd6dd"><i class="fab fa-docker fa-lg"></i></a>
- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
- 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`__
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.10.0>`__
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.5.1/images/sha256-26c5dfffb4a54625884abca83166940f17dd27bc75f1b24f6e80fbcb7d4e9afb"><i class="fab fa-docker fa-lg"></i></a>
- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
- 22.04
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`__
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-f378a24561fa6efc178b6dc93fc7d82e5b93653ecd59c89d4476674d29e1284d"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`__
- 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`__
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`__
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-2308dbd0e650b7bf8d548575cbb6e2bdc021f9386384ce570da16d58ee684d22"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`__
- 22.04
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`__
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.3.0/images/sha256-eefd2ab019728f91f94c5e6a9463cb0ea900b3011458d18fe5d88e50c0b57d86"><i class="fab fa-docker fa-lg"></i></a>
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`__
- 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`__
- `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`__
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-473643226ab0e93a04720b256ed772619878abf9c42b9f84828cefed522696fd"><i class="fab fa-docker fa-lg"></i></a>
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`__
- 22.04
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`__
- `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`__
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`__
- `master <https://bitbucket.org/icl/magma/src/master/>`__
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`__
Key ROCm libraries for PyTorch
================================================================================
PyTorch functionality on ROCm is determined by its underlying library
dependencies. These ROCm components affect the capabilities, performance, and
feature set available to developers.
.. list-table::
:header-rows: 1
* - ROCm library
- Version
- Purpose
- Used in
* - `Composable Kernel <https://github.com/ROCm/composable_kernel>`__
- :version-ref:`"Composable Kernel" rocm_version`
- Enables faster execution of core operations like matrix multiplication
(GEMM), convolutions and transformations.
- Speeds up ``torch.permute``, ``torch.view``, ``torch.matmul``,
``torch.mm``, ``torch.bmm``, ``torch.nn.Conv2d``, ``torch.nn.Conv3d``
and ``torch.nn.MultiheadAttention``.
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`__
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
- Supports operations such as matrix multiplication, matrix-vector
products, and tensor contractions. Utilized in both dense and batched
linear algebra operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`__
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of the hipBLAS library, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
- Accelerates operations such as ``torch.matmul``, ``torch.mm``, and the
matrix multiplications used in convolutional and linear layers.
* - `hipCUB <https://github.com/ROCm/hipCUB>`__
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
- Supports operations such as ``torch.sum``, ``torch.cumsum``,
``torch.sort`` irregular shapes often involve scanning, sorting, and
filtering, which hipCUB handles efficiently.
* - `hipFFT <https://github.com/ROCm/hipFFT>`__
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
- Used in functions like the ``torch.fft`` module.
* - `hipRAND <https://github.com/ROCm/hipRAND>`__
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
- The ``torch.rand``, ``torch.randn``, and stochastic layers like
``torch.nn.Dropout`` rely on hipRAND.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`__
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
- Supports functions like ``torch.linalg.solve``,
``torch.linalg.eig``, and ``torch.linalg.svd``.
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`__
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
- Sparse tensor operations ``torch.sparse``.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`__
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
- Sparse tensor operations ``torch.sparse``.
* - `hipTensor <https://github.com/ROCm/hipTensor>`__
- :version-ref:`hipTensor rocm_version`
- Optimizes for high-performance tensor operations, such as contractions.
- Accelerates tensor algebra, especially in deep learning and scientific
computing.
* - `MIOpen <https://github.com/ROCm/MIOpen>`__
- :version-ref:`MIOpen rocm_version`
- Optimizes deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
- Speeds up convolutional neural networks (CNNs), recurrent neural
networks (RNNs), and other layers. Used in operations like
``torch.nn.Conv2d``, ``torch.nn.ReLU``, and ``torch.nn.LSTM``.
* - `MIGraphX <https://github.com/ROCm/AMDMIGraphX>`__
- :version-ref:`MIGraphX rocm_version`
- Adds graph-level optimizations, ONNX models and mixed precision support
and enable Ahead-of-Time (AOT) Compilation.
- Speeds up inference models and executes ONNX models for
compatibility with other frameworks.
``torch.nn.Conv2d``, ``torch.nn.ReLU``, and ``torch.nn.LSTM``.
* - `MIVisionX <https://github.com/ROCm/MIVisionX>`__
- :version-ref:`MIVisionX rocm_version`
- Optimizes acceleration for computer vision and AI workloads like
preprocessing, augmentation, and inferencing.
- Faster data preprocessing and augmentation pipelines for datasets like
ImageNet or COCO and easy to integrate into PyTorch's ``torch.utils.data``
and ``torchvision`` workflows.
* - `rocAL <https://github.com/ROCm/rocAL>`__
- :version-ref:`rocAL rocm_version`
- Accelerates the data pipeline by offloading intensive preprocessing and
augmentation tasks. rocAL is part of MIVisionX.
- Easy to integrate into PyTorch's ``torch.utils.data`` and
``torchvision`` data load workloads.
* - `RCCL <https://github.com/ROCm/rccl>`__
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
- Distributed data parallel training (``torch.nn.parallel.DistributedDataParallel``).
Handles communication in multi-GPU setups.
* - `rocDecode <https://github.com/ROCm/rocDecode>`__
- :version-ref:`rocDecode rocm_version`
- Provides hardware-accelerated data decoding capabilities, particularly
for image, video, and other dataset formats.
- Can be integrated in ``torch.utils.data``, ``torchvision.transforms``
and ``torch.distributed``.
* - `rocJPEG <https://github.com/ROCm/rocJPEG>`__
- :version-ref:`rocJPEG rocm_version`
- Provides hardware-accelerated JPEG image decoding and encoding.
- GPU accelerated ``torchvision.io.decode_jpeg`` and
``torchvision.io.encode_jpeg`` and can be integrated in
``torch.utils.data`` and ``torchvision``.
* - `RPP <https://github.com/ROCm/RPP>`__
- :version-ref:`RPP rocm_version`
- Speeds up data augmentation, transformation, and other preprocessing steps.
- Easy to integrate into PyTorch's ``torch.utils.data`` and
``torchvision`` data load workloads to speed up data processing.
* - `rocThrust <https://github.com/ROCm/rocThrust>`__
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
- Utilized in backend operations for tensor computations requiring
parallel processing.
* - `rocWMMA <https://github.com/ROCm/rocWMMA>`__
- :version-ref:`rocWMMA rocm_version`
- Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
multiplication (GEMM) and accumulation operations with mixed precision
support.
- Linear layers (``torch.nn.Linear``), convolutional layers
(``torch.nn.Conv2d``), attention layers, general tensor operations that
involve matrix products, such as ``torch.matmul``, ``torch.bmm``, and
more.
Supported modules and data types
================================================================================
The following section outlines the supported data types, modules, and domain libraries available in PyTorch on ROCm.
Supported data types
--------------------------------------------------------------------------------
The tensor data type is specified using the ``dtype`` attribute or argument.
PyTorch supports many data types for different use cases.
The following table lists `torch.Tensor <https://pytorch.org/docs/stable/tensors.html>`__
single data types:
.. list-table::
:header-rows: 1
* - Data type
- Description
* - ``torch.float8_e4m3fn``
- 8-bit floating point, e4m3
* - ``torch.float8_e5m2``
- 8-bit floating point, e5m2
* - ``torch.float16`` or ``torch.half``
- 16-bit floating point
* - ``torch.bfloat16``
- 16-bit floating point
* - ``torch.float32`` or ``torch.float``
- 32-bit floating point
* - ``torch.float64`` or ``torch.double``
- 64-bit floating point
* - ``torch.complex32`` or ``torch.chalf``
- 32-bit complex numbers
* - ``torch.complex64`` or ``torch.cfloat``
- 64-bit complex numbers
* - ``torch.complex128`` or ``torch.cdouble``
- 128-bit complex numbers
* - ``torch.uint8``
- 8-bit integer (unsigned)
* - ``torch.uint16``
- 16-bit integer (unsigned);
Not natively supported in ROCm
* - ``torch.uint32``
- 32-bit integer (unsigned);
Not natively supported in ROCm
* - ``torch.uint64``
- 64-bit integer (unsigned);
Not natively supported in ROCm
* - ``torch.int8``
- 8-bit integer (signed)
* - ``torch.int16`` or ``torch.short``
- 16-bit integer (signed)
* - ``torch.int32`` or ``torch.int``
- 32-bit integer (signed)
* - ``torch.int64`` or ``torch.long``
- 64-bit integer (signed)
* - ``torch.bool``
- Boolean
* - ``torch.quint8``
- Quantized 8-bit integer (unsigned)
* - ``torch.qint8``
- Quantized 8-bit integer (signed)
* - ``torch.qint32``
- Quantized 32-bit integer (signed)
* - ``torch.quint4x2``
- Quantized 4-bit integer (unsigned)
.. note::
Unsigned types, except ``uint8``, have limited support in eager mode. They
primarily exist to assist usage with ``torch.compile``.
See :doc:`ROCm precision support <rocm:reference/precision-support>` for the
native hardware support of data types.
Supported modules
--------------------------------------------------------------------------------
For a complete and up-to-date list of PyTorch core modules (for example., ``torch``,
``torch.nn``, ``torch.cuda``, ``torch.backends.cuda`` and
``torch.backends.cudnn``), their descriptions, and usage, please refer directly
to the `official PyTorch documentation <https://pytorch.org/docs/stable/index.html>`_.
Core PyTorch functionality on ROCm includes tensor operations, neural network
layers, automatic differentiation, distributed training, mixed-precision
training, compilation features, and domain-specific libraries for audio, vision,
text processing, and more.
Supported domain libraries
--------------------------------------------------------------------------------
PyTorch offers specialized `domain libraries <https://pytorch.org/domains/>`_ with
GPU acceleration that build on its core features to support specific application
areas. The table below lists the PyTorch domain libraries that are compatible
with ROCm.
.. list-table::
:header-rows: 1
* - Library
- Description
* - `torchaudio <https://docs.pytorch.org/audio/stable/index.html>`_
- Audio and signal processing library for PyTorch. Provides utilities for
audio I/O, signal and data processing functions, datasets, model
implementations, and application components for audio and speech
processing tasks.
**Note:** To ensure GPU-acceleration with ``torchaudio.transforms``,
you need to explicitly move audio data (waveform tensor) to GPU using
``.to('cuda')``.
* - `torchtune <https://docs.pytorch.org/torchtune/stable/index.html>`_
- PyTorch-native library designed for fine-tuning large language models
(LLMs). Provides supports the full fine-tuning workflow and offers
compatibility with popular production inference systems.
**Note:** Only official release exists.
* - `torchvision <https://docs.pytorch.org/vision/stable/index.html>`_
- Computer vision library that is part of the PyTorch project. Provides
popular datasets, model architectures, and common image transformations
for computer vision applications.
* - `torchtext <https://docs.pytorch.org/text/stable/index.html>`_
- Text processing library for PyTorch. Provides data processing utilities
and popular datasets for natural language processing, including
tokenization, vocabulary management, and text embeddings.
**Note:** ``torchtext`` does not implement ROCm-specific kernels.
ROCm acceleration is provided through the underlying PyTorch framework
and ROCm library integration. Only official release exists.
* - `torchdata <https://docs.pytorch.org/data/beta/index.html>`_
- Beta library of common modular data loading primitives for easily
constructing flexible and performant data pipelines, with features still
in prototype stage.
* - `torchrec <https://docs.pytorch.org/torchrec/>`_
- PyTorch domain library for common sparsity and parallelism primitives
needed for large-scale recommender systems, enabling authors to train
models with large embedding tables shared across many GPUs.
**Note:** ``torchrec`` does not implement ROCm-specific kernels. ROCm
acceleration is provided through the underlying PyTorch framework and
ROCm library integration.
* - `torchserve <https://docs.pytorch.org/serve/>`_
- Performant, flexible and easy-to-use tool for serving PyTorch models in
production, providing features for model management, batch processing,
and scalable deployment.
**Note:** `torchserve <https://docs.pytorch.org/serve/>`_ is no longer
actively maintained. Last official release is sent out with PyTorch 2.4.
* - `torchrl <https://docs.pytorch.org/rl/stable/index.html>`_
- Open-source, Python-first Reinforcement Learning library for PyTorch
with a focus on high modularity and good runtime performance, providing
low and high-level RL abstractions and reusable functionals for cost
functions, returns, and data processing.
**Note:** Only official release exists.
* - `tensordict <https://docs.pytorch.org/tensordict/stable/index.html>`_
- Dictionary-like class that simplifies operations on batches of tensors,
enhancing code readability, compactness, and modularity by abstracting
tailored operations and reducing errors through automatic operation
dispatching.
**Note:** Only official release exists.

View File

@@ -1,100 +0,0 @@
:orphan:
.. meta::
:description: Stanford Megatron-LM compatibility
:keywords: Stanford, Megatron-LM, compatibility
.. version-set:: rocm_version latest
********************************************************************************
Stanford Megatron-LM compatibility
********************************************************************************
Stanford Megatron-LM is a large-scale language model training framework developed by NVIDIA `https://github.com/NVIDIA/Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_. It is
designed to train massive transformer-based language models efficiently by model and data parallelism.
* ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM <https://github.com/ROCm/Stanford-Megatron-LM>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`_ upstream repository.
* Use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>` with ROCm, PyTorch, and Megatron-LM preinstalled.
* See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>` to install and get started.
.. note::
Stanford Megatron-LM is supported on ROCm 6.3.0.
Supported Devices
================================================================================
- **Officially Supported**: AMD Instinct MI300X
- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210X
Supported models and features
================================================================================
This section details models & features that are supported by the ROCm version on Stanford Megatron-LM.
Models:
* Bert
* GPT
* T5
* ICT
Features:
* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts
.. _megatron-lm-recommendations:
Use cases and recommendations
================================================================================
See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`_ post
to leverage the ROCm platform for pre-training by using the Stanford Megatron-LM framework of pre-processing datasets on AMD GPUs.
Coverage includes:
* Single-GPU pre-training
* Multi-GPU pre-training
.. _megatron-lm-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/megatron-lm>`_
with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest Megatron-LM version from the official Docker Hub.
The Docker images have been validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
Click |docker-icon| to view the image on Docker Hub.
.. list-table::
:header-rows: 1
:class: docker-image-compatibility
* - Docker image
- Stanford Megatron-LM
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i></a>
- `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_

View File

@@ -1,571 +0,0 @@
:orphan:
.. meta::
:description: TensorFlow compatibility
:keywords: GPU, TensorFlow compatibility
.. version-set:: rocm_version latest
*******************************************************************************
TensorFlow compatibility
*******************************************************************************
`TensorFlow <https://www.tensorflow.org/>`__ is an open-source library for
solving machine learning, deep learning, and AI problems. It can solve many
problems across different sectors and industries but primarily focuses on
neural network training and inference. It is one of the most popular and
in-demand frameworks and is very active in open-source contribution and
development.
The `official TensorFlow repository <http://github.com/tensorflow/tensorflow>`__
includes full ROCm support. AMD maintains a TensorFlow `ROCm repository
<http://github.com/rocm/tensorflow-upstream>`__ in order to quickly add bug
fixes, updates, and support for the latest ROCM versions.
- ROCm TensorFlow release:
- Offers :ref:`Docker images <tensorflow-docker-compat>` with
ROCm and TensorFlow pre-installed.
- ROCm TensorFlow repository: `<https://github.com/ROCm/tensorflow-upstream>`__
- See the :doc:`ROCm TensorFlow installation guide <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
to get started.
- Official TensorFlow release:
- Official TensorFlow repository: `<https://github.com/tensorflow/tensorflow>`__
- See the `TensorFlow API versions <https://www.tensorflow.org/versions>`__ list.
.. note::
The official TensorFlow documentation does not cover ROCm support. Use the
ROCm documentation for installation instructions for Tensorflow on ROCm.
See :doc:`rocm-install-on-linux:install/3rd-party/tensorflow-install`.
.. _tensorflow-docker-compat:
Docker image compatibility
===============================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `TensorFlow images
<https://hub.docker.com/r/rocm/tensorflow>`__ with ROCm backends on
Docker Hub. The following Docker image tags and associated inventories are
validated for `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__. Click
the |docker-icon| icon to view the image on Docker Hub.
.. list-table:: TensorFlow Docker image components
:header-rows: 1
* - Docker image
- TensorFlow
- Ubuntu
- Dev
- Python
- TensorBoard
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.18-dev/images/sha256-fa9cf5fa6c6079a7118727531ccd0056c6e3224a42c3d6e78a49e7781daafff4"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.18-runtime/images/sha256-d14d8c4989e7c9a60f4e72461b9e349de72347c6162dcd6897e6f4f80ffbb440"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.18-dev/images/sha256-081e5bd6615a5dc17247ebd2ccc26895c3feeff086720400fa39b477e60a77c0"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.18-runtime/images/sha256-bf369637378264f4af6ddad5ca8b8611d3e372ffbea9ab7a06f1e122f0a0867b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.17-dev/images/sha256-5a502008c50d0b6508e6027f911bdff070a7493700ae064bed74e1d22b91ed50"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.17-runtime/images/sha256-1ee5dfffceb71ac66617ada33de3a10de0cb74199cc4b82441192e5e92fa2ddf"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-3124/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.17-dev/images/sha256-109218ad92bfae83bbd2710475f7502166e1ed54ca0b9748a9cbc3f5a1d75af1"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.17-runtime/images/sha256-5d78bd5918d394f92263daa2990e88d695d27200dd90ed83ec64d20c7661c9c1"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.16-dev/images/sha256-b09b1ad921c09c687b7c916141051e9fcf15539a5686e5aa67c689195a522719"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.16-runtime/images/sha256-20dbd824e85558abfe33fc9283cc547d88cde3c623fe95322743a5082f883a64"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-dev/images/sha256-36c4fa047c86e2470ac473ec1429aea6d4b8934b90ffeb34d1afab40e7e5b377"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-dev/images/sha256-36c4fa047c86e2470ac473ec1429aea6d4b8934b90ffeb34d1afab40e7e5b377>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-runtime/images/sha256-a94150ffb81365234ebfa34e764db5474bc6ab7d141b56495eac349778dafcf3"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
Critical ROCm libraries for TensorFlow
===============================================================================
TensorFlow depends on multiple components and the supported features of those
components can affect the TensorFlow ROCm supported feature set. The versions
in the following table refer to the first TensorFlow version where the ROCm
library was introduced as a dependency. The versions described
are available in ROCm :version:`rocm_version`.
.. list-table::
:widths: 25, 10, 35, 30
:header-rows: 1
* - ROCm library
- Version
- Purpose
- Used in
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`__
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
- Accelerates operations like ``tf.matmul``, ``tf.linalg.matmul``, and
other matrix multiplications commonly used in neural network layers.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`__
- :version-ref:`hipBLASLt rocm_version`
- Extends hipBLAS with additional optimizations like fused kernels and
integer tensor cores.
- Optimizes matrix multiplications and linear algebra operations used in
layers like dense, convolutional, and RNNs in TensorFlow.
* - `hipCUB <https://github.com/ROCm/hipCUB>`__
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
- Supports operations like ``tf.reduce_sum``, ``tf.cumsum``, ``tf.sort``
and other tensor operations in TensorFlow, especially those involving
scanning, sorting, and filtering.
* - `hipFFT <https://github.com/ROCm/hipFFT>`__
- :version-ref:`hipFFT rocm_version`
- Accelerates Fast Fourier Transforms (FFT) for signal processing tasks.
- Used for operations like signal processing, image filtering, and
certain types of neural networks requiring FFT-based transformations.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`__
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated direct linear solvers for dense and sparse
systems.
- Optimizes linear algebra functions such as solving systems of linear
equations, often used in optimization and training tasks.
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`__
- :version-ref:`hipSPARSE rocm_version`
- Optimizes sparse matrix operations for efficient computations on sparse
data.
- Accelerates sparse matrix operations in models with sparse weight
matrices or activations, commonly used in neural networks.
* - `MIOpen <https://github.com/ROCm/MIOpen>`__
- :version-ref:`MIOpen rocm_version`
- Provides optimized deep learning primitives such as convolutions,
pooling,
normalization, and activation functions.
- Speeds up convolutional neural networks (CNNs) and other layers. Used
in TensorFlow for layers like ``tf.nn.conv2d``, ``tf.nn.relu``, and
``tf.nn.lstm_cell``.
* - `RCCL <https://github.com/ROCm/rccl>`__
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
- Distributed data parallel training (``tf.distribute.MirroredStrategy``).
Handles communication in multi-GPU setups.
* - `rocThrust <https://github.com/ROCm/rocThrust>`__
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
- Reduction operations like ``tf.reduce_sum``, ``tf.cumsum`` for computing
the cumulative sum of elements along a given axis or ``tf.unique`` to
finds unique elements in a tensor can use rocThrust.
Supported and unsupported features
===============================================================================
The following section maps supported data types and GPU-accelerated TensorFlow
features to their minimum supported ROCm and TensorFlow versions.
Data types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The data type of a tensor is specified using the ``dtype`` attribute or
argument, and TensorFlow supports a wide range of data types for different use
cases.
The basic, single data types of `tf.dtypes <https://www.tensorflow.org/api_docs/python/tf/dtypes>`__
are as follows:
.. list-table::
:header-rows: 1
* - Data type
- Description
- Since TensorFlow
- Since ROCm
* - ``bfloat16``
- 16-bit bfloat (brain floating point).
- 1.0.0
- 1.7
* - ``bool``
- Boolean.
- 1.0.0
- 1.7
* - ``complex128``
- 128-bit complex.
- 1.0.0
- 1.7
* - ``complex64``
- 64-bit complex.
- 1.0.0
- 1.7
* - ``double``
- 64-bit (double precision) floating-point.
- 1.0.0
- 1.7
* - ``float16``
- 16-bit (half precision) floating-point.
- 1.0.0
- 1.7
* - ``float32``
- 32-bit (single precision) floating-point.
- 1.0.0
- 1.7
* - ``float64``
- 64-bit (double precision) floating-point.
- 1.0.0
- 1.7
* - ``half``
- 16-bit (half precision) floating-point.
- 2.0.0
- 2.0
* - ``int16``
- Signed 16-bit integer.
- 1.0.0
- 1.7
* - ``int32``
- Signed 32-bit integer.
- 1.0.0
- 1.7
* - ``int64``
- Signed 64-bit integer.
- 1.0.0
- 1.7
* - ``int8``
- Signed 8-bit integer.
- 1.0.0
- 1.7
* - ``qint16``
- Signed quantized 16-bit integer.
- 1.0.0
- 1.7
* - ``qint32``
- Signed quantized 32-bit integer.
- 1.0.0
- 1.7
* - ``qint8``
- Signed quantized 8-bit integer.
- 1.0.0
- 1.7
* - ``quint16``
- Unsigned quantized 16-bit integer.
- 1.0.0
- 1.7
* - ``quint8``
- Unsigned quantized 8-bit integer.
- 1.0.0
- 1.7
* - ``resource``
- Handle to a mutable, dynamically allocated resource.
- 1.0.0
- 1.7
* - ``string``
- Variable-length string, represented as byte array.
- 1.0.0
- 1.7
* - ``uint16``
- Unsigned 16-bit (word) integer.
- 1.0.0
- 1.7
* - ``uint32``
- Unsigned 32-bit (dword) integer.
- 1.5.0
- 1.7
* - ``uint64``
- Unsigned 64-bit (qword) integer.
- 1.5.0
- 1.7
* - ``uint8``
- Unsigned 8-bit (byte) integer.
- 1.0.0
- 1.7
* - ``variant``
- Data of arbitrary type (known at runtime).
- 1.4.0
- 1.7
Features
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This table provides an overview of key features in TensorFlow and their
availability in ROCm.
.. list-table::
:header-rows: 1
* - Module
- Description
- Since TensorFlow
- Since ROCm
* - ``tf.linalg`` (Linear Algebra)
- Operations for matrix and tensor computations, such as
``tf.linalg.matmul`` (matrix multiplication), ``tf.linalg.inv``
(matrix inversion) and ``tf.linalg.cholesky`` (Cholesky decomposition).
These leverage GPUs for high-performance linear algebra operations.
- 1.4
- 1.8.2
* - ``tf.nn`` (Neural Network Operations)
- GPU-accelerated building blocks for deep learning models, such as 2D
convolutions with ``tf.nn.conv2d``, max pooling operations with
``tf.nn.max_pool``, activation functions like ``tf.nn.relu`` or softmax
for output layers with ``tf.nn.softmax``.
- 1.0
- 1.8.2
* - ``tf.image`` (Image Processing)
- GPU-accelerated functions for image preprocessing and augmentations,
such as resize images with ``tf.image.resize``, flip images horizontally
with ``tf.image.flip_left_right`` and adjust image brightness randomly
with ``tf.image.random_brightness``.
- 1.1
- 1.8.2
* - ``tf.keras`` (High-Level API)
- GPU acceleration for Keras layers and models, including dense layers
(``tf.keras.layers.Dense``), convolutional layers
(``tf.keras.layers.Conv2D``) and recurrent layers
(``tf.keras.layers.LSTM``).
- 1.4
- 1.8.2
* - ``tf.math`` (Mathematical Operations)
- GPU-accelerated mathematical operations, such as sum across dimensions
with ``tf.math.reduce_sum``, elementwise exponentiation with
``tf.math.exp`` and sigmoid activation (``tf.math.sigmoid``).
- 1.5
- 1.8.2
* - ``tf.signal`` (Signal Processing)
- Functions for spectral analysis and signal transformations.
- 1.13
- 2.1
* - ``tf.data`` (Data Input Pipeline)
- GPU-accelerated data preprocessing for efficient input pipelines,
Prefetching with ``tf.data.experimental.AUTOTUNE``. GPU-enabled
transformations like map and batch.
- 1.4
- 1.8.2
* - ``tf.distribute`` (Distributed Training)
- Enabling to scale computations across multiple devices on a single
machine or across multiple machines.
- 1.13
- 2.1
* - ``tf.random`` (Random Number Generation)
- GPU-accelerated random number generation
- 1.12
- 1.9.2
* - ``tf.TensorArray`` (Dynamic Array Operations)
- Enables dynamic tensor manipulation on GPUs.
- 1.0
- 1.8.2
* - ``tf.sparse`` (Sparse Tensor Operations)
- GPU-accelerated sparse matrix manipulations.
- 1.9
- 1.9.0
* - ``tf.experimental.numpy``
- GPU-accelerated NumPy-like API for numerical computations.
- 2.4
- 4.1.1
* - ``tf.RaggedTensor``
- Handling of variable-length sequences and ragged tensors with GPU
support.
- 1.13
- 2.1
* - ``tf.function`` with XLA (Accelerated Linear Algebra)
- Enable GPU-accelerated functions in optimization.
- 1.14
- 2.4
* - ``tf.quantization``
- Quantized operations for inference, accelerated on GPUs.
- 1.12
- 1.9.2
Distributed library features
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enables developers to scale computations across multiple devices on a single machine or
across multiple machines.
.. list-table::
:header-rows: 1
* - Feature
- Description
- Since TensorFlow
- Since ROCm
* - ``MultiWorkerMirroredStrategy``
- Synchronous training across multiple workers using mirrored variables.
- 2.0
- 3.0
* - ``MirroredStrategy``
- Synchronous training across multiple GPUs on one machine.
- 1.5
- 2.5
* - ``TPUStrategy``
- Efficiently trains models on Google TPUs.
- 1.9
- ❌
* - ``ParameterServerStrategy``
- Asynchronous training using parameter servers for variable management.
- 2.1
- 4.0
* - ``CentralStorageStrategy``
- Keeps variables on a single device and performs computation on multiple
devices.
- 2.3
- 4.1
* - ``CollectiveAllReduceStrategy``
- Synchronous training across multiple devices and hosts.
- 1.14
- 3.5
* - Distribution Strategies API
- High-level API to simplify distributed training configuration and
execution.
- 1.10
- 3.0
Unsupported TensorFlow features
===============================================================================
The following are GPU-accelerated TensorFlow features not currently supported by
ROCm.
.. list-table::
:header-rows: 1
* - Feature
- Description
- Since TensorFlow
* - Mixed Precision with TF32
- Mixed precision with TF32 is used for matrix multiplications,
convolutions, and other linear algebra operations, particularly in
deep learning workloads like CNNs and transformers.
- 2.4
* - ``tf.distribute.TPUStrategy``
- Efficiently trains models on Google TPUs.
- 1.9
Use cases and recommendations
===============================================================================
* The `Training a Neural Collaborative Filtering (NCF) Recommender on an AMD
GPU <https://rocm.blogs.amd.com/artificial-intelligence/ncf/README.html>`__
blog post discusses training an NCF recommender system using TensorFlow. It
explains how NCF improves traditional collaborative filtering methods by
leveraging neural networks to model non-linear user-item interactions. The
post outlines the implementation using the recommenders library, focusing on
the use of implicit data (for example, user interactions like viewing or
purchasing) and how it addresses challenges like the lack of negative values.
* The `Creating a PyTorch/TensorFlow code environment on AMD GPUs
<https://rocm.blogs.amd.com/software-tools-optimization/pytorch-tensorflow-env/README.html>`__
blog post provides instructions for creating a machine learning environment
for PyTorch and TensorFlow on AMD GPUs using ROCm. It covers steps like
installing the libraries, cloning code repositories, installing dependencies,
and troubleshooting potential issues with CUDA-based code. Additionally, it
explains how to HIPify code (port CUDA code to HIP) and manage Docker images
for a better experience on AMD GPUs. This guide aims to help data scientists
and ML practitioners adapt their code for AMD GPUs.
For more use cases and recommendations, see the `ROCm Tensorflow blog posts <https://rocm.blogs.amd.com/blog/tag/tensorflow.html>`__.

View File

@@ -1,85 +0,0 @@
:orphan:
.. meta::
:description: verl compatibility
:keywords: GPU, verl compatibility
.. version-set:: rocm_version latest
*******************************************************************************
verl compatibility
*******************************************************************************
Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs).
verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support.
* See the `verl documentation <https://verl.readthedocs.io/en/latest/>`_ for more information about verl.
* The official verl GitHub repository is `https://github.com/volcengine/verl <https://github.com/volcengine/verl>`_.
* Use the AMD-validated :ref:`Docker images <verl-docker-compat>` with ROCm and verl preinstalled.
* See the :doc:`ROCm verl installation guide <rocm-install-on-linux:install/3rd-party/verl-install>` to get started.
.. note::
verl is supported on ROCm 6.2.0.
.. _verl-recommendations:
Use cases and recommendations
================================================================================
The benefits of verl in large-scale reinforcement leaning from human feedback (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`_ blog.
.. _verl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `ROCm verl Docker images <https://hub.docker.com/r/rocm/verl>`_
with ROCm backends on Docker Hub. The following Docker image tags and associated inventories represent the latest verl version from the official Docker Hub. The Docker images have been validated for `ROCm 6.2.0 <https://repo.radeon.com/rocm/apt/6.2/>`_.
.. list-table::
:header-rows: 1
* - Docker image
- verl
- Linux
- Pytorch
- Python
- vllm
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/verl/verl-0.3.0.post0_rocm6.2_vllm0.6.3/images/sha256-cbe423803fd7850448b22444176bee06f4dcf22cd3c94c27732752d3a39b04b2"><i class="fab fa-docker fa-lg"></i> rocm/verl</a>
- `0.3.0post0 <https://github.com/volcengine/verl/releases/tag/v0.3.0.post0>`_
- Ubuntu 20.04
- `2.5.0 <https://download.pytorch.org/whl/cu118/torch-2.5.0%2Bcu118-cp39-cp39-linux_x86_64.whl#sha256=1ee24b267418c37b297529ede875b961e382c1c365482f4142af2398b92ed127>`_
- `3.9.19 <https://www.python.org/downloads/release/python-3919/>`_
- `0.6.4 <https://github.com/vllm-project/vllm/releases/tag/v0.6.4>`_
Supported features
===============================================================================
The following table shows verl and ROCm support for GPU-accelerated modules.
.. list-table::
:header-rows: 1
* - Module
- Description
- verl version
- ROCm version
* - ``FSDP``
- Training engine
- 0.3.0.post0
- 6.2
* - ``vllm``
- Inference engine
- 0.3.0.post0
- 6.2

File diff suppressed because it is too large Load Diff

View File

@@ -1,407 +0,0 @@
.. meta::
:description: Using CMake
:keywords: CMake, dependencies, HIP, C++, AMD, ROCm
*********************************
Using CMake
*********************************
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of the CMake HIP language support will require CMake 3.21 or higher.
Finding dependencies
====================
.. note::
For a complete
reference on how to deal with dependencies in CMake, refer to the CMake docs
on `find_package
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake related facilities.
In short, CMake supports finding dependencies in two ways:
* In Module mode, it consults a file ``Find<PackageName>.cmake`` which tries to find the component
in typical install locations and layouts. CMake ships a few dozen such scripts, but users and projects
may ship them as well.
* In Config mode, it locates a file named ``<packagename>-config.cmake`` or
``<PackageName>Config.cmake`` which describes the installed component in all regards needed to
consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of file system
paths), or using ``<PackageName>_ROOT`` variable on a project-specific basis.
There are nearly a dozen ways to set these variables. One may be more convenient
over the other depending on your workflow. Conceptually the simplest is adding
it to your CMake configuration command on the command line via
``-D CMAKE_PREFIX_PATH=....`` . AMD packaged ROCm installs can typically be
added to the config file search paths such as:
* Windows: ``-D CMAKE_PREFIX_PATH=${env:HIP_PATH}``
* Linux: ``-D CMAKE_PREFIX_PATH=/opt/rocm``
ROCm provides the respective *config-file* packages, and this enables
``find_package`` to be used directly. ROCm does not require any Find module as
the *config-file* packages are shipped with the upstream projects, such as
rocPRIM and other ROCm libraries.
For a complete guide on where and how ROCm may be installed on a system, refer
to the installation guides for
`Linux <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html>`_
and
`Windows <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/index.html>`_.
Using HIP in CMake
==================
ROCm components providing a C/C++ interface support consumption via any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
library.
Using the HIP single-source programming model
---------------------------------------------
Source code written in the HIP dialect of C++ typically uses the `.hip`
extension. When the HIP CMake language is enabled, it will automatically
associate such source files with the HIP toolchain being used.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.21) # HIP language support requires 3.21
cmake_policy(VERSION 3.21.3...3.27)
project(MyProj LANGUAGES HIP)
add_executable(MyApp Main.hip)
Should you have existing CUDA code that is from the source compatible subset of
HIP, you can tell CMake that despite their `.cu` extension, they're HIP sources.
Do note that this mostly facilitates compiling kernel code-only source files,
as host-side CUDA API won't compile in this fashion.
.. code-block:: cmake
add_library(MyLib MyLib.cu)
set_source_files_properties(MyLib.cu PROPERTIES LANGUAGE HIP)
CMake itself only hosts part of the HIP language support, such as defining
HIP-specific properties, etc. while the other half ships with the HIP
implementation, such as ROCm. CMake will search for a file
`hip-lang-config.cmake` describing how the the properties defined by CMake
translate to toolchain invocations. If one installs ROCm using non-standard
methods or layouts and CMake can't locate this file or detect parts of the SDK,
there's a catch-all, last resort variable consulted locating this file,
``-D CMAKE_HIP_COMPILER_ROCM_ROOT:PATH=`` which should be set the root of the
ROCm installation.
.. note::
Imported targets defined by `hip-lang-config.cmake` are for internal use
only.
If the user doesn't provide a semi-colon delimited list of device architectures
via ``CMAKE_HIP_ARCHITECTURES``, CMake will select some sensible default. It is
advised though that if a user knows what devices they wish to target, then set
this variable explicitly.
Consuming ROCm C/C++ libraries
------------------------------
Libraries such as rocBLAS, rocFFT, MIOpen, etc. behave as C/C++ libraries.
Illustrated in the example below is a C++ application using MIOpen from CMake.
It calls ``find_package(miopen)``, which provides the ``MIOpen`` imported
target. This can be linked with ``target_link_libraries``
.. code-block:: cmake
cmake_minimum_required(VERSION 3.5) # find_package(miopen) requires 3.5
cmake_policy(VERSION 3.5...3.27)
project(MyProj LANGUAGES CXX)
find_package(miopen)
add_library(MyLib ...)
target_link_libraries(MyLib PUBLIC MIOpen)
.. note::
Most libraries are designed as host-only API, so using a GPU device
compiler is not necessary for downstream projects unless they use GPU device
code.
Consuming the HIP API in C++ code
---------------------------------
Consuming the HIP API without compiling single-source GPU device code can be
done using any C++ compiler. The ``find_package(hip)`` provides the
``hip::host`` imported target to use HIP in this scenario.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.5) # find_package(hip) requires 3.5
cmake_policy(VERSION 3.5...3.27)
project(MyProj LANGUAGES CXX)
find_package(hip REQUIRED)
add_executable(MyApp ...)
target_link_libraries(MyApp PRIVATE hip::host)
When mixing such ``CXX`` sources with ``HIP`` sources holding device-code, link
only to `hip::host`. If HIP sources don't have `.hip` as their extension, use
`set_source_files_properties(<hip_sources>... PROPERTIES LANGUAGE HIP)` on them.
Linking to `hip::host` will set all the necessary flags for the ``CXX`` sources
while ``HIP`` sources inherit all flags from the built-in language support.
Having HIP sources in a target will turn the |LINK_LANG|_ into ``HIP``.
.. |LINK_LANG| replace:: ``LINKER_LANGUAGE``
.. _LINK_LANG: https://cmake.org/cmake/help/latest/prop_tgt/LINKER_LANGUAGE.html
Compiling device code in C++ language mode
------------------------------------------
.. attention::
The workflow detailed here is considered legacy and is shown for
understanding's sake. It pre-dates the existence of HIP language support in
CMake. If source code has HIP device code in it, it is a HIP source file
and should be compiled as such. Only resort to the method below if your
HIP-enabled CMake code path can't mandate CMake version 3.21.
If code uses the HIP API and compiles GPU device code, it requires using a
device compiler. The compiler for CMake can be set using either the
``CMAKE_C_COMPILER`` and ``CMAKE_CXX_COMPILER`` variable or using the ``CC``
and ``CXX`` environment variables. This can be set when configuring CMake or
put into a CMake toolchain file. The device compiler must be set to a
compiler that supports AMD GPU targets, which is usually Clang.
The ``find_package(hip)`` provides the ``hip::device`` imported target to add
all the flags necessary for device compilation.
.. code-block:: cmake
cmake_minimum_required(VERSION 3.8) # cxx_std_11 requires 3.8
cmake_policy(VERSION 3.8...3.27)
project(MyProj LANGUAGES CXX)
find_package(hip REQUIRED)
add_library(MyLib ...)
target_link_libraries(MyLib PRIVATE hip::device)
target_compile_features(MyLib PRIVATE cxx_std_11)
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with the following CMake commands:
* Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
* Linux: ``cmake -D CMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++``
Which use the device compiler provided from the binary packages of
`ROCm HIP SDK <https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html>`_ and
`repo.radeon.com <https://repo.radeon.com>`_ respectively.
When using the ``CXX`` language support to compile HIP device code, selecting the
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, this is set to some subset of the currently supported architectures of
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
ROCm CMake packages
-------------------
+-----------+----------+--------------------------------------------------------+
| Component | Package | Targets |
+===========+==========+========================================================+
| HIP | hip | ``hip::host``, ``hip::device`` |
+-----------+----------+--------------------------------------------------------+
| rocPRIM | rocprim | ``roc::rocprim`` |
+-----------+----------+--------------------------------------------------------+
| rocThrust | rocthrust| ``roc::rocthrust`` |
+-----------+----------+--------------------------------------------------------+
| hipCUB | hipcub | ``hip::hipcub`` |
+-----------+----------+--------------------------------------------------------+
| rocRAND | rocrand | ``roc::rocrand`` |
+-----------+----------+--------------------------------------------------------+
| rocBLAS | rocblas | ``roc::rocblas`` |
+-----------+----------+--------------------------------------------------------+
| rocSOLVER | rocsolver| ``roc::rocsolver`` |
+-----------+----------+--------------------------------------------------------+
| hipBLAS | hipblas | ``roc::hipblas`` |
+-----------+----------+--------------------------------------------------------+
| rocFFT | rocfft | ``roc::rocfft`` |
+-----------+----------+--------------------------------------------------------+
| hipFFT | hipfft | ``hip::hipfft`` |
+-----------+----------+--------------------------------------------------------+
| rocSPARSE | rocsparse| ``roc::rocsparse`` |
+-----------+----------+--------------------------------------------------------+
| hipSPARSE | hipsparse| ``roc::hipsparse`` |
+-----------+----------+--------------------------------------------------------+
| rocALUTION|rocalution| ``roc::rocalution`` |
+-----------+----------+--------------------------------------------------------+
| RCCL | rccl | ``rccl`` |
+-----------+----------+--------------------------------------------------------+
| MIOpen | miopen | ``MIOpen`` |
+-----------+----------+--------------------------------------------------------+
| MIGraphX | migraphx | ``migraphx::migraphx``, ``migraphx::migraphx_c``, |
| | | ``migraphx::migraphx_cpu``, ``migraphx::migraphx_gpu``,|
| | | ``migraphx::migraphx_onnx``, ``migraphx::migraphx_tf`` |
+-----------+----------+--------------------------------------------------------+
Using CMake presets
===================
CMake command lines depending on how specific users like to be when compiling
code can grow to unwieldy lengths. This is the primary reason why projects tend
to bake script snippets into their build definitions controlling compiler
warning levels, changing CMake defaults (``CMAKE_BUILD_TYPE`` or
``BUILD_SHARED_LIBS`` just to name a few) and all sorts anti-patterns, all in
the name of convenience.
Load on the command-line interface (CLI) starts immediately by selecting a
toolchain, the set of utilities used to compile programs. To ease some of the
toolchain related pains, CMake does consult the ``CC`` and ``CXX`` environmental
variables when setting a default ``CMAKE_C[XX]_COMPILER`` respectively, but that
is just the tip of the iceberg. There's a fair number of variables related to
just the toolchain itself (typically supplied using
`toolchain files <https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html>`_
), and then we still haven't talked about user preference or project-specific
options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purpose in
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
all nice, but configurations aren't portable, nor can they be reused in
Continuous Integration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_.
There are two types of preset files: one supplied by the project, called
``CMakePresets.json`` which is meant to be committed to version control,
typically used to drive CI; and one meant for the user to provide, called
``CMakeUserPresets.json``, typically used to house user preference and adapting
the build to the user's environment. These JSON files are allowed to include
other JSON files and the user presets always implicitly includes the non-user
variant.
Using HIP with presets
----------------------
Following is an example ``CMakeUserPresets.json`` file which actually compiles
the `amd/rocm-examples <https://github.com/amd/rocm-examples>`_ suite of sample
applications on a typical ROCm installation:
.. code-block:: json
{
"version": 3,
"cmakeMinimumRequired": {
"major": 3,
"minor": 21,
"patch": 0
},
"configurePresets": [
{
"name": "layout",
"hidden": true,
"binaryDir": "${sourceDir}/build/${presetName}",
"installDir": "${sourceDir}/install/${presetName}"
},
{
"name": "generator-ninja-multi-config",
"hidden": true,
"generator": "Ninja Multi-Config"
},
{
"name": "toolchain-makefiles-c/c++-amdclang",
"hidden": true,
"cacheVariables": {
"CMAKE_C_COMPILER": "/opt/rocm/bin/amdclang",
"CMAKE_CXX_COMPILER": "/opt/rocm/bin/amdclang++",
"CMAKE_HIP_COMPILER": "/opt/rocm/bin/amdclang++"
}
},
{
"name": "clang-strict-iso-high-warn",
"hidden": true,
"cacheVariables": {
"CMAKE_C_FLAGS": "-Wall -Wextra -pedantic",
"CMAKE_CXX_FLAGS": "-Wall -Wextra -pedantic",
"CMAKE_HIP_FLAGS": "-Wall -Wextra -pedantic"
}
},
{
"name": "ninja-mc-rocm",
"displayName": "Ninja Multi-Config ROCm",
"inherits": [
"layout",
"generator-ninja-multi-config",
"toolchain-makefiles-c/c++-amdclang",
"clang-strict-iso-high-warn"
]
}
],
"buildPresets": [
{
"name": "ninja-mc-rocm-debug",
"displayName": "Debug",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm"
},
{
"name": "ninja-mc-rocm-release",
"displayName": "Release",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm"
},
{
"name": "ninja-mc-rocm-debug-verbose",
"displayName": "Debug (verbose)",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm",
"verbose": true
},
{
"name": "ninja-mc-rocm-release-verbose",
"displayName": "Release (verbose)",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm",
"verbose": true
}
],
"testPresets": [
{
"name": "ninja-mc-rocm-debug",
"displayName": "Debug",
"configuration": "Debug",
"configurePreset": "ninja-mc-rocm",
"execution": {
"jobs": 0
}
},
{
"name": "ninja-mc-rocm-release",
"displayName": "Release",
"configuration": "Release",
"configurePreset": "ninja-mc-rocm",
"execution": {
"jobs": 0
}
}
]
}
.. note::
Getting presets to work reliably on Windows requires some CMake improvements
and/or support from compiler vendors. (Refer to
`Add support to the Visual Studio generators <https://gitlab.kitware.com/cmake/cmake/-/issues/24245>`_
and `Sourcing environment scripts <https://gitlab.kitware.com/cmake/cmake/-/issues/21619>`_
.)

View File

@@ -1,14 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="AMD ROCm documentation">
<meta name="keywords" content="documentation, guides, installation, compatibility, support,
reference, ROCm, AMD">
</head>
# Using compiler features
The following topics describe using specific features of the compilation tools:
* [ROCm compiler infrastructure](https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html)
* [Using AddressSanitizer](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/using-gpu-sanitizer.html)
* [OpenMP support](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/openmp.html)

View File

@@ -1,172 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm Linux Filesystem Hierarchy Standard reorganization">
<meta name="keywords" content="FHS, Linux Filesystem Hierarchy Standard, directory structure,
AMD, ROCm">
</head>
# ROCm Linux Filesystem Hierarchy Standard reorganization
## Introduction
The ROCm Software has adopted the Linux Filesystem Hierarchy Standard (FHS) [https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html) in order to to ensure ROCm is consistent with standard open source conventions. The following sections specify how current and future releases of ROCm adhere to FHS, how the previous ROCm file system is supported, and how improved versioning specifications are applied to ROCm.
## Adopting the FHS
In order to standardize ROCm directory structure and directory content layout ROCm has adopted the [FHS](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html), adhering to open source conventions for Linux-based distribution. FHS ensures internal consistency within the ROCm stack, as well as external consistency with other systems and distributions. The ROCm proposed file structure is outlined below:
```none
/opt/rocm-<ver>
| -- bin
| -- all public binaries
| -- lib
| -- lib<soname>.so->lib<soname>.so.major->lib<soname>.so.major.minor.patch
(public libaries to link with applications)
| -- <component>
| -- architecture dependent libraries and binaries used internally by components
| -- cmake
| -- <component>
| --<component>-config.cmake
| -- libexec
| -- <component>
| -- non ISA/architecture independent executables used internally by components
| -- include
| -- <component>
| -- public header files
| -- share
| -- html
| -- <component>
| -- html documentation
| -- info
| -- <component>
| -- info files
| -- man
| -- <component>
| -- man pages
| -- doc
| -- <component>
| -- license files
| -- <component>
| -- samples
| -- architecture independent misc files
```
## Changes from earlier ROCm versions
The following table provides a brief overview of the new ROCm FHS layout, compared to the layout of earlier ROCm versions. Note that /opt/ is used to denote the default rocm-installation-path and should be replaced in case of a non-standard installation location of the ROCm distribution.
```none
______________________________________________________
| New ROCm Layout | Previous ROCm Layout |
|_____________________________|________________________|
| /opt/rocm-<ver> | /opt/rocm-<ver> |
| | -- bin | | -- bin |
| | -- lib | | -- lib |
| | -- cmake | | -- include |
| | -- libexec | | -- <component_1> |
| | -- include | | -- bin |
| | -- <component_1> | | -- cmake |
| | -- share | | -- doc |
| | -- html | | -- lib |
| | -- info | | -- include |
| | -- man | | -- samples |
| | -- doc | | -- <component_n> |
| | -- <component_1> | | -- bin |
| | -- samples | | -- cmake |
| | -- .. | | -- doc |
| | -- <component_n> | | -- lib |
| | -- samples | | -- include |
| | -- .. | | -- samples |
|______________________________________________________|
```
## ROCm FHS reorganization: backward compatibility
The FHS file organization for ROCm was first introduced in the release of ROCm 5.2 . Backward compatibility was implemented to make sure users could still run their ROCm applications while transitioning to the new FHS. ROCm has moved header files and libraries to their new locations as indicated in the above structure, and included symbolic-links and wrapper header files in their old location for backward compatibility. The following sections detail ROCm backward compatibility implementation for wrapper header files, executable files, library files and CMake config files.
### Wrapper header files
Wrapper header files are placed in the old location (
`/opt/rocm-<ver>/<component>/include`) with a warning message to include files
from the new location (`/opt/rocm-<ver>/include`) as shown in the example below.
```cpp
#pragma message "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip."
#include <hip/hip_runtime.h>
```
* Starting at ROCm 5.2 release, the deprecation for backward compatibility wrapper header files is: `#pragma` message announcing `#warning`.
* Starting from ROCm 6.0 (tentatively) backward compatibility for wrapper header files will be removed, and the `#pragma` message will be announcing `#error`.
### Executable files
Executable files are available in the `/opt/rocm-<ver>/bin` folder. For backward
compatibility, the old library location (`/opt/rocm-<ver>/<component>/bin`) has a
soft link to the library at the new location. Soft links will be removed in a
future release, tentatively ROCm v6.0.
```bash
$ ls -l /opt/rocm/hip/bin/
lrwxrwxrwx 1 root root 24 Jan 1 23:32 hipcc -> ../../bin/hipcc
```
### Library files
Library files are available in the `/opt/rocm-<ver>/lib` folder. For backward
compatibility, the old library location (`/opt/rocm-<ver>/<component>/lib`) has a
soft link to the library at the new location. Soft links will be removed in a
future release, tentatively ROCm v6.0.
```shell
$ ls -l /opt/rocm/hip/lib/
drwxr-xr-x 4 root root 4096 Jan 1 10:45 cmake
lrwxrwxrwx 1 root root 24 Jan 1 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
```
### CMake config files
All CMake configuration files are available in the
`/opt/rocm-<ver>/lib/cmake/<component>` folder. For backward compatibility, the
old CMake locations (`/opt/rocm-<ver>/<component>/lib/cmake`) consist of a soft
link to the new CMake config. Soft links will be removed in a future release,
tentatively ROCm v6.0.
```shell
$ ls -l /opt/rocm/hip/lib/cmake/hip/
lrwxrwxrwx 1 root root 42 Jan 1 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
```
## Changes required in applications using ROCm
Applications using ROCm are advised to use the new file paths. As the old files
will be deprecated in a future release. Applications have to make sure to include
correct header file and use correct search paths.
1. `#include<header_file.h>` needs to be changed to
`#include <component/header_file.h>`
For example: `#include <hip.h>` needs to change
to `#include <hip/hip.h>`
2. Any variable in CMake or Makefiles pointing to component folder needs to
changed.
For example: `VAR1=/opt/rocm/hip` needs to be changed to `VAR1=/opt/rocm`
`VAR2=/opt/rocm/hsa` needs to be changed to `VAR2=/opt/rocm`
3. Any reference to `/opt/rocm/<component>/bin` or `/opt/rocm/<component>/lib`
needs to be changed to `/opt/rocm/bin` and `/opt/rocm/lib/`, respectively.
## Changes in versioning specifications
In order to better manage ROCm dependencies specification and allow smoother releases of ROCm while avoiding dependency conflicts, ROCm software shall adhere to the following scheme when numbering and incrementing ROCm files versions:
rocm-\<ver\>, where \<ver\> = \<x.y.z\>
x.y.z denote: MAJOR.MINOR.PATCH
z: PATCH - increment z when implementing backward compatible bug fixes.
y: MINOR - increment y when implementing minor changes that add functionality but are still backward compatible.
x: MAJOR - increment x when implementing major changes that are not backward compatible.

View File

@@ -1,72 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="GPU architecture">
<meta name="keywords" content="GPU architecture, architecture support, MI200, MI250, RDNA,
MI100, AMD Instinct">
</head>
(gpu-arch-documentation)=
# GPU architecture documentation
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card}
**AMD Instinct MI300 series**
Review hardware aspects of the AMD Instinct™ MI300 series of GPU accelerators and the CDNA™ 3
architecture.
* [AMD Instinct™ MI300 microarchitecture](./gpu-arch/mi300.md)
* [AMD Instinct MI300/CDNA3 ISA](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf)
* [Performance counters](./gpu-arch/mi300-mi200-performance-counters.rst)
:::
:::{grid-item-card}
**AMD Instinct MI200 series**
Review hardware aspects of the AMD Instinct™ MI200 series of GPU accelerators and the CDNA™ 2
architecture.
* [AMD Instinct™ MI250 microarchitecture](./gpu-arch/mi250.md)
* [AMD Instinct MI200/CDNA2 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
* [White paper](https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf)
* [Performance counters](./gpu-arch/mi300-mi200-performance-counters.rst)
:::
:::{grid-item-card}
**AMD Instinct MI100**
Review hardware aspects of the AMD Instinct™ MI100 series of GPU accelerators and the CDNA™ 1
architecture.
* [AMD Instinct™ MI100 microarchitecture](./gpu-arch/mi100.md)
* [AMD Instinct MI100/CDNA1 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
* [White paper](https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna-white-paper.pdf)
:::
:::{grid-item-card}
**RDNA**
* [AMD RDNA4 ISA](https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna4-instruction-set-architecture.pdf)
* [AMD RDNA3 ISA](https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf)
* [AMD RDNA2 ISA](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
* [AMD RDNA ISA](https://www.amd.com/system/files/TechDocs/rdna-shader-instruction-set-architecture.pdf)
:::
:::{grid-item-card}
**Older architectures**
* [AMD Instinct MI50/Vega 7nm ISA](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
* [AMD Instinct MI25/Vega ISA](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
* [AMD GCN3 ISA](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
* [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)
:::
:::::

View File

@@ -1,95 +0,0 @@
---
myst:
html_meta:
"description lang=en": "Learn about the AMD Instinct MI100 series architecture."
"keywords": "Instinct, MI100, microarchitecture, AMD, ROCm"
---
# AMD Instinct™ MI100 microarchitecture
The following image shows the node-level architecture of a system that
comprises two AMD EPYC™ processors and (up to) eight AMD Instinct™ accelerators.
The two EPYC processors are connected to each other with the AMD Infinity™
fabric which provides a high-bandwidth (up to 18 GT/sec) and coherent links such
that each processor can access the available node memory as a single
shared-memory domain in a non-uniform memory architecture (NUMA) fashion. In a
2P, or dual-socket, configuration, three AMD Infinity™ fabric links are
available to connect the processors plus one PCIe Gen 4 x16 link per processor
can attach additional I/O devices such as the host adapters for the network
fabric.
![Structure of a single GCD in the AMD Instinct MI100 accelerator](../../data/conceptual/gpu-arch/image004.png "Node-level system architecture with two AMD EPYC™ processors and eight AMD Instinct™ accelerators.")
In a typical node configuration, each processor can host up to four AMD
Instinct™ accelerators that are attached using PCIe Gen 4 links at 16 GT/sec,
which corresponds to a peak bidirectional link bandwidth of 32 GB/sec. Each hive
of four accelerators can participate in a fully connected, coherent AMD
Instinct™ fabric that connects the four accelerators using 23 GT/sec AMD
Infinity fabric links that run at a higher frequency than the inter-processor
links. This inter-GPU link can be established in certified server systems if the
GPUs are mounted in neighboring PCIe slots by installing the AMD Infinity
Fabric™ bridge for the AMD Instinct™ accelerators.
## Microarchitecture
The microarchitecture of the AMD Instinct accelerators is based on the AMD CDNA
architecture, which targets compute applications such as high-performance
computing (HPC) and AI & machine learning (ML) that run on everything from
individual servers to the world's largest exascale supercomputers. The overall
system architecture is designed for extreme scalability and compute performance.
![Structure of the AMD Instinct accelerator (MI100 generation)](../../data/conceptual/gpu-arch/image005.png "Structure of the AMD Instinct accelerator (MI100 generation)")
The above image shows the AMD Instinct accelerator with its PCIe Gen 4 x16
link (16 GT/sec, at the bottom) that connects the GPU to (one of) the host
processor(s). It also shows the three AMD Infinity Fabric ports that provide
high-speed links (23 GT/sec, also at the bottom) to the other GPUs of the local
hive.
On the left and right of the floor plan, the High Bandwidth Memory (HBM)
attaches via the GPU memory controller. The MI100 generation of the AMD
Instinct accelerator offers four stacks of HBM generation 2 (HBM2) for a total
of 32GB with a 4,096bit-wide memory interface. The peak memory bandwidth of the
attached HBM2 is 1.228 TB/sec at a memory clock frequency of 1.2 GHz.
The execution units of the GPU are depicted in the above image as Compute
Units (CU). There are a total 120 compute units that are physically organized
into eight Shader Engines (SE) with fifteen compute units per shader engine.
Each compute unit is further sub-divided into four SIMD units that process SIMD
instructions of 16 data elements per instruction. This enables the CU to process
64 data elements (a so-called 'wavefront') at a peak clock frequency of 1.5 GHz.
Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS
(`4 [SIMD units] x 16 [elements per instruction] x 120 [CU] x 1.5 [GHz]`).
![Block diagram of an MI100 compute unit with detailed SIMD view of the AMD CDNA architecture](../../data/conceptual/gpu-arch/image006.png "An MI100 compute unit with detailed SIMD view of the AMD CDNA architecture")
The preceding image shows the block diagram of a single CU of an AMD Instinct™
MI100 accelerator and summarizes how instructions flow through the execution
engines. The CU fetches the instructions via a 32KB instruction cache and moves
them forward to execution via a dispatcher. The CU can handle up to ten
wavefronts at a time and feed their instructions into the execution unit. The
execution unit contains 256 vector general-purpose registers (VGPR) and 800
scalar general-purpose registers (SGPR). The VGPR and SGPR are dynamically
allocated to the executing wavefronts. A wavefront can access a maximum of 102
scalar registers. Excess scalar-register usage will cause register spilling and
thus may affect execution performance.
A wavefront can occupy any number of VGPRs from 0 to 256, directly affecting
occupancy; that is, the number of concurrently active wavefronts in the CU. For
instance, with 119 VGPRs used, only two wavefronts can be active in the CU at
the same time. With the instruction latency of four cycles per SIMD instruction,
the occupancy should be as high as possible such that the compute unit can
improve execution efficiency by scheduling instructions from multiple
wavefronts.
:::{table} Peak-performance capabilities of MI100 for different data types.
:name: mi100-perf
| Computation and Data Type | FLOPS/CLOCK/CU | Peak TFLOPS |
| :------------------------ | :------------: | ----------: |
| Vector FP64 | 64 | 11.5 |
| Matrix FP32 | 256 | 46.1 |
| Vector FP32 | 128 | 23.1 |
| Matrix FP16 | 1024 | 184.6 |
| Matrix BF16 | 512 | 92.3 |
:::

View File

@@ -1,134 +0,0 @@
---
myst:
html_meta:
"description lang=en": "Learn about the AMD Instinct MI250 series architecture."
"keywords": "Instinct, MI250, microarchitecture, AMD, ROCm"
---
# AMD Instinct™ MI250 microarchitecture
The microarchitecture of the AMD Instinct MI250 accelerators is based on the
AMD CDNA 2 architecture that targets compute applications such as HPC,
artificial intelligence (AI), and machine learning (ML) and that run on
everything from individual servers to the worlds largest exascale
supercomputers. The overall system architecture is designed for extreme
scalability and compute performance.
The following image shows the components of a single Graphics Compute Die (GCD) of the CDNA 2 architecture. On the top and the bottom are AMD Infinity Fabric™
interfaces and their physical links that are used to connect the GPU die to the
other system-level components of the node (see also Section 2.2). Both
interfaces can drive four AMD Infinity Fabric links. One of the AMD Infinity
Fabric links of the controller at the bottom can be configured as a PCIe link.
Each of the AMD Infinity Fabric links between GPUs can run at up to 25 GT/sec,
which correlates to a peak transfer bandwidth of 50 GB/sec for a 16-wide link (
two bytes per transaction). Section 2.2 has more details on the number of AMD
Infinity Fabric links and the resulting transfer rates between the system-level
components.
To the left and the right are memory controllers that attach the High Bandwidth
Memory (HBM) modules to the GCD. AMD Instinct MI250 GPUs use HBM2e, which offers
a peak memory bandwidth of 1.6 TB/sec per GCD.
The execution units of the GPU are depicted in the following image as Compute
Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
subdivided into four SIMD units that process SIMD instructions of 16 data
elements per instruction (for the FP64 data type). This enables the CU to
process 64 work items (a so-called “wavefront”) at a peak clock frequency of 1.7
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 22.6
TFLOPS for vector instructions. This equates to 45.3 TFLOPS for vector instructions for both GCDs together. The MI250 compute units also provide specialized
execution units (also called matrix cores), which are geared toward executing
matrix operations like matrix-matrix multiplications. For FP64, the peak
performance of these units amounts to 90.5 TFLOPS.
![Structure of a single GCD in the AMD Instinct MI250 accelerator.](../../data/conceptual/gpu-arch/image001.png "Structure of a single GCD in the AMD Instinct MI250 accelerator.")
```{list-table} Peak-performance capabilities of the MI250 OAM for different data types.
:header-rows: 1
:name: mi250-perf-table
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
*
- Matrix FP64
- 256
- 90.5
*
- Vector FP64
- 128
- 45.3
*
- Matrix FP32
- 256
- 90.5
*
- Packed FP32
- 256
- 90.5
*
- Vector FP32
- 128
- 45.3
*
- Matrix FP16
- 1024
- 362.1
*
- Matrix BF16
- 1024
- 362.1
*
- Matrix INT8
- 1024
- 362.1
```
The above table summarizes the aggregated peak performance of the AMD
Instinct MI250 OCP Open Accelerator Modules (OAM, OCP is short for Open Compute
Platform) and its two GCDs for different data types and execution units. The
middle column lists the peak performance (number of data elements processed in a
single instruction) of a single compute unit if a SIMD (or matrix) instruction
is being retired in each clock cycle. The third column lists the theoretical
peak performance of the OAM module. The theoretical aggregated peak memory
bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD).
![Dual-GCD architecture of the AMD Instinct MI250 accelerators](../../data/conceptual/gpu-arch/image002.png "Dual-GCD architecture of the AMD Instinct MI250 accelerators")
The following image shows the block diagram of an OAM package that consists
of two GCDs, each of which constitutes one GPU device in the system. The two
GCDs in the package are connected via four AMD Infinity Fabric links running at
a theoretical peak rate of 25 GT/sec, giving 200 GB/sec peak transfer bandwidth
between the two GCDs of an OAM, or a bidirectional peak transfer bandwidth of
400 GB/sec for the same.
## Node-level architecture
The following image shows the node-level architecture of a system that is
based on the AMD Instinct MI250 accelerator. The MI250 OAMs attach to the host
system via PCIe Gen 4 x16 links (yellow lines). Each GCD maintains its own PCIe
x16 link to the host part of the system. Depending on the server platform, the
GCD can attach to the AMD EPYC processor directly or via an optional PCIe switch
. Note that some platforms may offer an x8 interface to the GCDs, which reduces
the available host-to-GPU bandwidth.
![Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor](../../data/conceptual/gpu-arch/image003.png "Block diagram of AMD Instinct MI250 Accelerators with 3rd Generation AMD EPYC processor")
The preceding image shows the node-level architecture of a system with AMD
EPYC processors in a dual-socket configuration and four AMD Instinct MI250
accelerators. The MI250 OAMs attach to the host processors system via PCIe Gen 4
x16 links (yellow lines). Depending on the system design, a PCIe switch may
exist to make more PCIe lanes available for additional components like network
interfaces and/or storage devices. Each GCD maintains its own PCIe x16 link to
the host part of the system or to the PCIe switch. Please note, some platforms
may offer an x8 interface to the GCDs, which will reduce the available
host-to-GPU bandwidth.
Between the OAMs and their respective GCDs, a peer-to-peer (P2P) network allows
for direct data exchange between the GPU dies via AMD Infinity Fabric links (
black, green, and red lines). Each of these 16-wide links connects to one of the
two GPU dies in the MI250 OAM and operates at 25 GT/sec, which corresponds to a
theoretical peak transfer rate of 50 GB/sec per link (or 100 GB/sec
bidirectional peak transfer bandwidth). The GCD pairs 2 and 6 as well as GCDs 0
and 4 connect via two XGMI links, which is indicated by the thicker red line in
the preceding image.

View File

@@ -1,757 +0,0 @@
.. meta::
:description: MI300 and MI200 series performance counters and metrics
:keywords: MI300, MI200, performance counters, command processor counters
***************************************************************************************************
MI300 and MI200 series performance counters and metrics
***************************************************************************************************
This document lists and describes the hardware performance counters and derived metrics available
for the AMD Instinct™ MI300 and MI200 GPU. You can also access this information using the
:doc:`ROCprofiler-SDK <rocprofiler-sdk:how-to/using-rocprofv3>`.
MI300 and MI200 series performance counters
===============================================================
Series performance counters include the following categories:
* :ref:`command-processor-counters`
* :ref:`graphics-register-bus-manager-counters`
* :ref:`spi-counters`
* :ref:`compute-unit-counters`
* :ref:`l1i-and-sl1d-cache-counters`
* :ref:`vector-l1-cache-subsystem-counters`
* :ref:`l2-cache-access-counters`
The following sections provide additional details for each category.
.. note::
Preliminary validation of all MI300 and MI200 series performance counters is in progress. Those with
an asterisk (*) require further evaluation.
.. _command-processor-counters:
Command processor counters
---------------------------------------------------------------------------------------------------------------
Command processor counters are further classified into command processor-fetcher and command
processor-compute.
Command processor-fetcher counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``CPF_CMP_UTCL1_STALL_ON_TRANSLATION``", "Cycles", "Number of cycles one of the compute unified translation caches (L1) is stalled waiting on translation"
"``CPF_CPF_STAT_BUSY``", "Cycles", "Number of cycles command processor-fetcher is busy"
"``CPF_CPF_STAT_IDLE``", "Cycles", "Number of cycles command processor-fetcher is idle"
"``CPF_CPF_STAT_STALL``", "Cycles", "Number of cycles command processor-fetcher is stalled"
"``CPF_CPF_TCIU_BUSY``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is busy"
"``CPF_CPF_TCIU_IDLE``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is idle"
"``CPF_CPF_TCIU_STALL``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is stalled waiting on free tags"
The texture cache interface unit is the interface between the command processor and the memory
system.
Command processor-compute counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``CPC_ME1_BUSY_FOR_PACKET_DECODE``", "Cycles", "Number of cycles command processor-compute micro engine is busy decoding packets"
"``CPC_UTCL1_STALL_ON_TRANSLATION``", "Cycles", "Number of cycles one of the unified translation caches (L1) is stalled waiting on translation"
"``CPC_CPC_STAT_BUSY``", "Cycles", "Number of cycles command processor-compute is busy"
"``CPC_CPC_STAT_IDLE``", "Cycles", "Number of cycles command processor-compute is idle"
"``CPC_CPC_STAT_STALL``", "Cycles", "Number of cycles command processor-compute is stalled"
"``CPC_CPC_TCIU_BUSY``", "Cycles", "Number of cycles command processor-compute texture cache interface unit interface is busy"
"``CPC_CPC_TCIU_IDLE``", "Cycles", "Number of cycles command processor-compute texture cache interface unit interface is idle"
"``CPC_CPC_UTCL2IU_BUSY``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is busy"
"``CPC_CPC_UTCL2IU_IDLE``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is idle"
"``CPC_CPC_UTCL2IU_STALL``", "Cycles", "Number of cycles command processor-compute unified translation cache (L2) interface is stalled"
"``CPC_ME1_DC0_SPI_BUSY``", "Cycles", "Number of cycles command processor-compute micro engine processor is busy"
The micro engine runs packet-processing firmware on the command processor-compute counter.
.. _graphics-register-bus-manager-counters:
Graphics register bus manager counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``GRBM_COUNT``", "Cycles","Number of free-running GPU cycles"
"``GRBM_GUI_ACTIVE``", "Cycles", "Number of GPU active cycles"
"``GRBM_CP_BUSY``", "Cycles", "Number of cycles any of the command processor blocks are busy"
"``GRBM_SPI_BUSY``", "Cycles", "Number of cycles any of the shader processor input is busy in the shader engines"
"``GRBM_TA_BUSY``", "Cycles", "Number of cycles any of the texture addressing unit is busy in the shader engines"
"``GRBM_TC_BUSY``", "Cycles", "Number of cycles any of the texture cache blocks are busy"
"``GRBM_CPC_BUSY``", "Cycles", "Number of cycles the command processor-compute is busy"
"``GRBM_CPF_BUSY``", "Cycles", "Number of cycles the command processor-fetcher is busy"
"``GRBM_UTCL2_BUSY``", "Cycles", "Number of cycles the unified translation cache (Level 2 [L2]) block is busy"
"``GRBM_EA_BUSY``", "Cycles", "Number of cycles the efficiency arbiter block is busy"
Texture cache blocks include:
* Texture cache arbiter
* Texture cache per pipe, also known as vector Level 1 (L1) cache
* Texture cache per channel, also known as known as L2 cache
* Texture cache interface
.. _spi-counters:
Shader processor input counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SPI_CSN_BUSY``", "Cycles", "Number of cycles with outstanding waves"
"``SPI_CSN_WINDOW_VALID``", "Cycles", "Number of cycles enabled by ``perfcounter_start`` event"
"``SPI_CSN_NUM_THREADGROUPS``", "Workgroups", "Number of dispatched workgroups"
"``SPI_CSN_WAVE``", "Wavefronts", "Number of dispatched wavefronts"
"``SPI_RA_REQ_NO_ALLOC``", "Cycles", "Number of arbiter cycles with requests but no allocation"
"``SPI_RA_REQ_NO_ALLOC_CSN``", "Cycles", "Number of arbiter cycles with compute shader (n\ :sup:`th` pipe) requests but no compute shader (n\ :sup:`th` pipe) allocation"
"``SPI_RA_RES_STALL_CSN``", "Cycles", "Number of arbiter stall cycles due to shortage of compute shader (n\ :sup:`th` pipe) pipeline slots"
"``SPI_RA_TMP_STALL_CSN``", "Cycles", "Number of stall cycles due to shortage of temp space"
"``SPI_RA_WAVE_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of single instruction, multiple data (SIMD) per cycle affected by shortage of wave slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_VGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of vector general-purpose register (VGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_SGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of scalar general-purpose register (SGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_LDS_CU_FULL_CSN``", "CU", "Number of compute units affected by shortage of local data share (LDS) space for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_BAR_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting at a BARRIER"
"``SPI_RA_BULKY_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting for BULKY resource"
"``SPI_RA_TGLIM_CU_FULL_CSN``", "Cycles", "Number of compute shader (n\ :sup:`th` pipe) wave stall cycles due to restriction of ``tg_limit`` for thread group size"
"``SPI_RA_WVLIM_STALL_CSN``", "Cycles", "Number of cycles compute shader (n\ :sup:`th` pipe) is stalled due to ``WAVE_LIMIT``"
"``SPI_VWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize VGPRs when launching waves"
"``SPI_SWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize SGPRs when launching waves"
.. _compute-unit-counters:
Compute unit counters
---------------------------------------------------------------------------------------------------------------
The compute unit counters are further classified into instruction mix, matrix fused multiply-add (FMA)
operation counters, level counters, wavefront counters, wavefront cycle counters, and LDS counters.
Instruction mix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_INSTS``", "Instr", "Number of instructions issued"
"``SQ_INSTS_VALU``", "Instr", "Number of vector arithmetic logic unit (VALU) instructions including matrix FMA issued"
"``SQ_INSTS_VALU_ADD_F16``", "Instr", "Number of VALU half-precision floating-point (F16) ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F16``", "Instr", "Number of VALU F16 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F16``", "Instr", "Number of VALU F16 FMA or multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F16``", "Instr", "Number of VALU F16 Transcendental instructions issued"
"``SQ_INSTS_VALU_ADD_F32``", "Instr", "Number of VALU full-precision floating-point (F32) ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F32``", "Instr", "Number of VALU F32 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F32``", "Instr", "Number of VALU F32 FMAor multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F32``", "Instr", "Number of VALU F32 Transcendental instructions issued"
"``SQ_INSTS_VALU_ADD_F64``", "Instr", "Number of VALU F64 ``ADD`` or ``SUB`` instructions issued"
"``SQ_INSTS_VALU_MUL_F64``", "Instr", "Number of VALU F64 Multiply instructions issued"
"``SQ_INSTS_VALU_FMA_F64``", "Instr", "Number of VALU F64 FMA or multiply-add instructions issued"
"``SQ_INSTS_VALU_TRANS_F64``", "Instr", "Number of VALU F64 Transcendental instructions issued"
"``SQ_INSTS_VALU_INT32``", "Instr", "Number of VALU 32-bit integer instructions (signed or unsigned) issued"
"``SQ_INSTS_VALU_INT64``", "Instr", "Number of VALU 64-bit integer instructions (signed or unsigned) issued"
"``SQ_INSTS_VALU_CVT``", "Instr", "Number of VALU Conversion instructions issued"
"``SQ_INSTS_VALU_MFMA_I8``", "Instr", "Number of 8-bit Integer matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F16``", "Instr", "Number of F16 matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F32``", "Instr", "Number of F32 matrix FMA instructions issued"
"``SQ_INSTS_VALU_MFMA_F64``", "Instr", "Number of F64 matrix FMA instructions issued"
"``SQ_INSTS_MFMA``", "Instr", "Number of matrix FMA instructions issued"
"``SQ_INSTS_VMEM_WR``", "Instr", "Number of vector memory write instructions (including flat) issued"
"``SQ_INSTS_VMEM_RD``", "Instr", "Number of vector memory read instructions (including flat) issued"
"``SQ_INSTS_VMEM``", "Instr", "Number of vector memory instructions issued, including both flat and buffer instructions"
"``SQ_INSTS_SALU``", "Instr", "Number of scalar arithmetic logic unit (SALU) instructions issued"
"``SQ_INSTS_SMEM``", "Instr", "Number of scalar memory instructions issued"
"``SQ_INSTS_SMEM_NORM``", "Instr", "Number of scalar memory instructions normalized to match ``smem_level`` issued"
"``SQ_INSTS_FLAT``", "Instr", "Number of flat instructions issued"
"``SQ_INSTS_FLAT_LDS_ONLY``", "Instr", "**MI200 series only** Number of FLAT instructions that read/write only from/to LDS issued. Works only if ``EARLY_TA_DONE`` is enabled."
"``SQ_INSTS_LDS``", "Instr", "Number of LDS instructions issued **(MI200: includes flat; MI300: does not include flat)**"
"``SQ_INSTS_GDS``", "Instr", "Number of global data share instructions issued"
"``SQ_INSTS_EXP_GDS``", "Instr", "Number of EXP and global data share instructions excluding skipped export instructions issued"
"``SQ_INSTS_BRANCH``", "Instr", "Number of branch instructions issued"
"``SQ_INSTS_SENDMSG``", "Instr", "Number of ``SENDMSG`` instructions including ``s_endpgm`` issued"
"``SQ_INSTS_VSKIPPED``", "Instr", "Number of vector instructions skipped"
Flat instructions allow read, write, and atomic access to a generic memory address pointer that can
resolve to any of the following physical memories:
* Global Memory
* Scratch ("private")
* LDS ("shared")
* Invalid - ``MEM_VIOL`` TrapStatus
Matrix fused multiply-add operation counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_INSTS_VALU_MFMA_MOPS_I8``", "IOP", "Number of 8-bit integer matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F16``", "FLOP", "Number of F16 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_BF16``", "FLOP", "Number of BF16 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F32``", "FLOP", "Number of F32 floating matrix FMA ops in the unit of 512"
"``SQ_INSTS_VALU_MFMA_MOPS_F64``", "FLOP", "Number of F64 floating matrix FMA ops in the unit of 512"
Level counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
All level counters must be followed by ``SQ_ACCUM_PREV_HIRES`` counter to measure average latency.
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_ACCUM_PREV``", "Count", "Accumulated counter sample value where accumulation takes place once every four cycles"
"``SQ_ACCUM_PREV_HIRES``", "Count", "Accumulated counter sample value where accumulation takes place once every cycle"
"``SQ_LEVEL_WAVES``", "Waves", "Number of inflight waves"
"``SQ_INST_LEVEL_VMEM``", "Instr", "Number of inflight vector memory (including flat) instructions"
"``SQ_INST_LEVEL_SMEM``", "Instr", "Number of inflight scalar memory instructions"
"``SQ_INST_LEVEL_LDS``", "Instr", "Number of inflight LDS (including flat) instructions"
"``SQ_IFETCH_LEVEL``", "Instr", "Number of inflight instruction fetch requests from the cache"
Use the following formulae to calculate latencies:
* Vector memory latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_VMEM``
* Wave latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_WAVE``
* LDS latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_LDS``
* Scalar memory latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_INSTS_SMEM_NORM``
* Instruction fetch latency = ``SQ_ACCUM_PREV_HIRES`` divided by ``SQ_IFETCH``
Wavefront counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_WAVES``", "Waves", "Number of wavefronts dispatched to sequencers, including both new and restored wavefronts"
"``SQ_WAVES_SAVED``", "Waves", "Number of context-saved waves"
"``SQ_WAVES_RESTORED``", "Waves", "Number of context-restored waves sent to sequencers"
"``SQ_WAVES_EQ_64``", "Waves", "Number of wavefronts with exactly 64 active threads sent to sequencers"
"``SQ_WAVES_LT_64``", "Waves", "Number of wavefronts with less than 64 active threads sent to sequencers"
"``SQ_WAVES_LT_48``", "Waves", "Number of wavefronts with less than 48 active threads sent to sequencers"
"``SQ_WAVES_LT_32``", "Waves", "Number of wavefronts with less than 32 active threads sent to sequencers"
"``SQ_WAVES_LT_16``", "Waves", "Number of wavefronts with less than 16 active threads sent to sequencers"
Wavefront cycle counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_CYCLES``", "Cycles", "Clock cycles"
"``SQ_BUSY_CYCLES``", "Cycles", "Number of cycles while sequencers reports it to be busy"
"``SQ_BUSY_CU_CYCLES``", "Qcycles", "Number of quad-cycles each compute unit is busy"
"``SQ_VALU_MFMA_BUSY_CYCLES``", "Cycles", "Number of cycles the matrix FMA arithmetic logic unit (ALU) is busy"
"``SQ_WAVE_CYCLES``", "Qcycles", "Number of quad-cycles spent by waves in the compute units"
"``SQ_WAIT_ANY``", "Qcycles", "Number of quad-cycles spent waiting for anything"
"``SQ_WAIT_INST_ANY``", "Qcycles", "Number of quad-cycles spent waiting for any instruction to be issued"
"``SQ_ACTIVE_INST_ANY``", "Qcycles", "Number of quad-cycles spent by each wave to work on an instruction"
"``SQ_ACTIVE_INST_VMEM``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a vector memory instruction"
"``SQ_ACTIVE_INST_LDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an LDS instruction"
"``SQ_ACTIVE_INST_VALU``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a VALU instruction"
"``SQ_ACTIVE_INST_SCA``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a SALU or scalar memory instruction"
"``SQ_ACTIVE_INST_EXP_GDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an ``EXPORT`` or ``GDS`` instruction"
"``SQ_ACTIVE_INST_MISC``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a ``BRANCH`` or ``SENDMSG`` instruction"
"``SQ_ACTIVE_INST_FLAT``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a flat instruction"
"``SQ_INST_CYCLES_VMEM_WR``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory write instructions"
"``SQ_INST_CYCLES_VMEM_RD``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory read instructions"
"``SQ_INST_CYCLES_SMEM``", "Qcycles", "Number of quad-cycles spent to execute scalar memory reads"
"``SQ_INST_CYCLES_SALU``", "Qcycles", "Number of quad-cycles spent to execute non-memory read scalar operations"
"``SQ_THREAD_CYCLES_VALU``", "Qcycles", "Number of quad-cycles spent to execute VALU operations on active threads"
"``SQ_WAIT_INST_LDS``", "Qcycles", "Number of quad-cycles spent waiting for LDS instruction to be issued"
``SQ_THREAD_CYCLES_VALU`` is similar to ``INST_CYCLES_VALU``, but it's multiplied by the number of
active threads.
LDS counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_LDS_ATOMIC_RETURN``", "Cycles", "Number of atomic return cycles in LDS"
"``SQ_LDS_BANK_CONFLICT``", "Cycles", "Number of cycles LDS is stalled by bank conflicts"
"``SQ_LDS_ADDR_CONFLICT``", "Cycles", "Number of cycles LDS is stalled by address conflicts"
"``SQ_LDS_UNALIGNED_STALL``", "Cycles", "Number of cycles LDS is stalled processing flat unaligned load or store operations"
"``SQ_LDS_MEM_VIOLATIONS``", "Count", "Number of threads that have a memory violation in the LDS"
"``SQ_LDS_IDX_ACTIVE``", "Cycles", "Number of cycles LDS is used for indexed operations"
Miscellaneous counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQ_IFETCH``", "Count", "Number of instruction fetch requests from L1i, in 32-byte width"
"``SQ_ITEMS``", "Threads", "Number of valid items per wave"
.. _l1i-and-sl1d-cache-counters:
L1 instruction cache (L1i) and scalar L1 data cache (L1d) counters
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Unit", "Definition"
"``SQC_ICACHE_REQ``", "Req", "Number of L1 instruction (L1i) cache requests"
"``SQC_ICACHE_HITS``", "Count", "Number of L1i cache hits"
"``SQC_ICACHE_MISSES``", "Count", "Number of non-duplicate L1i cache misses including uncached requests"
"``SQC_ICACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate L1i cache misses whose previous lookup miss on the same cache line is not fulfilled yet"
"``SQC_DCACHE_REQ``", "Req", "Number of scalar L1d requests"
"``SQC_DCACHE_INPUT_VALID_READYB``", "Cycles", "Number of cycles while sequencer input is valid but scalar L1d is not ready"
"``SQC_DCACHE_HITS``", "Count", "Number of scalar L1d hits"
"``SQC_DCACHE_MISSES``", "Count", "Number of non-duplicate scalar L1d misses including uncached requests"
"``SQC_DCACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate scalar L1d misses"
"``SQC_DCACHE_REQ_READ_1``", "Req", "Number of constant cache read requests in a single 32-bit data word"
"``SQC_DCACHE_REQ_READ_2``", "Req", "Number of constant cache read requests in two 32-bit data words"
"``SQC_DCACHE_REQ_READ_4``", "Req", "Number of constant cache read requests in four 32-bit data words"
"``SQC_DCACHE_REQ_READ_8``", "Req", "Number of constant cache read requests in eight 32-bit data words"
"``SQC_DCACHE_REQ_READ_16``", "Req", "Number of constant cache read requests in 16 32-bit data words"
"``SQC_DCACHE_ATOMIC``", "Req", "Number of atomic requests"
"``SQC_TC_REQ``", "Req", "Number of texture cache requests that were issued by instruction and constant caches"
"``SQC_TC_INST_REQ``", "Req", "Number of instruction requests to the L2 cache"
"``SQC_TC_DATA_READ_REQ``", "Req", "Number of data Read requests to the L2 cache"
"``SQC_TC_DATA_WRITE_REQ``", "Req", "Number of data write requests to the L2 cache"
"``SQC_TC_DATA_ATOMIC_REQ``", "Req", "Number of data atomic requests to the L2 cache"
"``SQC_TC_STALL``", "Cycles", "Number of cycles while the valid requests to the L2 cache are stalled"
.. _vector-l1-cache-subsystem-counters:
Vector L1 cache subsystem counters
---------------------------------------------------------------------------------------------------------------
The vector L1 cache subsystem counters are further classified into texture addressing unit, texture data
unit, vector L1d or texture cache per pipe, and texture cache arbiter counters.
Texture addressing unit counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TA_TA_BUSY[n]``", "Cycles", "Texture addressing unit busy cycles", "0-15"
"``TA_TOTAL_WAVEFRONTS[n]``", "Instr", "Number of wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_WAVEFRONTS[n]``", "Instr", "Number of buffer wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_READ_WAVEFRONTS[n]``", "Instr", "Number of buffer read wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_WRITE_WAVEFRONTS[n]``", "Instr", "Number of buffer write wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_ATOMIC_WAVEFRONTS[n]``", "Instr", "Number of buffer atomic wavefronts processed by texture addressing unit", "0-15"
"``TA_BUFFER_TOTAL_CYCLES[n]``", "Cycles", "Number of buffer cycles (including read and write) issued to texture cache", "0-15"
"``TA_BUFFER_COALESCED_READ_CYCLES[n]``", "Cycles", "Number of coalesced buffer read cycles issued to texture cache", "0-15"
"``TA_BUFFER_COALESCED_WRITE_CYCLES[n]``", "Cycles", "Number of coalesced buffer write cycles issued to texture cache", "0-15"
"``TA_ADDR_STALLED_BY_TC_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit address path is stalled by texture cache", "0-15"
"``TA_DATA_STALLED_BY_TC_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit data path is stalled by texture cache", "0-15"
"``TA_ADDR_STALLED_BY_TD_CYCLES[n]``", "Cycles", "Number of cycles texture addressing unit address path is stalled by texture data unit", "0-15"
"``TA_FLAT_WAVEFRONTS[n]``", "Instr", "Number of flat opcode wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_READ_WAVEFRONTS[n]``", "Instr", "Number of flat opcode read wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_WRITE_WAVEFRONTS[n]``", "Instr", "Number of flat opcode write wavefronts processed by texture addressing unit", "0-15"
"``TA_FLAT_ATOMIC_WAVEFRONTS[n]``", "Instr", "Number of flat opcode atomic wavefronts processed by texture addressing unit", "0-15"
Texture data unit counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TD_TD_BUSY[n]``", "Cycle", "Texture data unit busy cycles while it is processing or waiting for data", "0-15"
"``TD_TC_STALL[n]``", "Cycle", "Number of cycles texture data unit is stalled waiting for texture cache data", "0-15"
"``TD_SPI_STALL[n]``", "Cycle", "Number of cycles texture data unit is stalled by shader processor input", "0-15"
"``TD_LOAD_WAVEFRONT[n]``", "Instr", "Number of wavefront instructions (read, write, atomic)", "0-15"
"``TD_STORE_WAVEFRONT[n]``", "Instr", "Number of write wavefront instructions", "0-15"
"``TD_ATOMIC_WAVEFRONT[n]``", "Instr", "Number of atomic wavefront instructions", "0-15"
"``TD_COALESCABLE_WAVEFRONT[n]``", "Instr", "Number of coalescable wavefronts according to texture addressing unit", "0-15"
Texture cache per pipe counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCP_GATE_EN1[n]``", "Cycles", "Number of cycles vector L1d interface clocks are turned on", "0-15"
"``TCP_GATE_EN2[n]``", "Cycles", "Number of cycles vector L1d core clocks are turned on", "0-15"
"``TCP_TD_TCP_STALL_CYCLES[n]``", "Cycles", "Number of cycles texture data unit stalls vector L1d", "0-15"
"``TCP_TCR_TCP_STALL_CYCLES[n]``", "Cycles", "Number of cycles texture cache router stalls vector L1d", "0-15"
"``TCP_READ_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on a read", "0-15"
"``TCP_WRITE_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on a write", "0-15"
"``TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES[n]``", "Cycles", "Number of cycles tag RAM conflict stalls on an atomic", "0-15"
"``TCP_PENDING_STALL_CYCLES[n]``", "Cycles", "Number of cycles vector L1d is stalled due to data pending from L2 Cache", "0-15"
"``TCP_TCP_TA_DATA_STALL_CYCLES``", "Cycles", "Number of cycles texture cache per pipe stalls texture addressing unit data interface", "NA"
"``TCP_TA_TCP_STATE_READ[n]``", "Req", "Number of state reads", "0-15"
"``TCP_VOLATILE[n]``", "Req", "Number of L1 volatile pixels or buffers from texture addressing unit", "0-15"
"``TCP_TOTAL_ACCESSES[n]``", "Req", "Number of vector L1d accesses. Equals ``TCP_PERF_SEL_TOTAL_READ`+`TCP_PERF_SEL_TOTAL_NONREAD``", "0-15"
"``TCP_TOTAL_READ[n]``", "Req", "Number of vector L1d read accesses", "0-15"
"``TCP_TOTAL_WRITE[n]``", "Req", "Number of vector L1d write accesses", "0-15"
"``TCP_TOTAL_ATOMIC_WITH_RET[n]``", "Req", "Number of vector L1d atomic requests with return", "0-15"
"``TCP_TOTAL_ATOMIC_WITHOUT_RET[n]``", "Req", "Number of vector L1d atomic without return", "0-15"
"``TCP_TOTAL_WRITEBACK_INVALIDATES[n]``", "Count", "Total number of vector L1d writebacks and invalidates", "0-15"
"``TCP_UTCL1_REQUEST[n]``", "Req", "Number of address translation requests to unified translation cache (L1)", "0-15"
"``TCP_UTCL1_TRANSLATION_HIT[n]``", "Req", "Number of unified translation cache (L1) translation hits", "0-15"
"``TCP_UTCL1_TRANSLATION_MISS[n]``", "Req", "Number of unified translation cache (L1) translation misses", "0-15"
"``TCP_UTCL1_PERMISSION_MISS[n]``", "Req", "Number of unified translation cache (L1) permission misses", "0-15"
"``TCP_TOTAL_CACHE_ACCESSES[n]``", "Req", "Number of vector L1d cache accesses including hits and misses", "0-15"
"``TCP_TCP_LATENCY[n]``", "Cycles", "**MI200 series only** Accumulated wave access latency to vL1D over all wavefronts", "0-15"
"``TCP_TCC_READ_REQ_LATENCY[n]``", "Cycles", "**MI200 series only** Total vL1D to L2 request latency over all wavefronts for reads and atomics with return", "0-15"
"``TCP_TCC_WRITE_REQ_LATENCY[n]``", "Cycles", "**MI200 series only** Total vL1D to L2 request latency over all wavefronts for writes and atomics without return", "0-15"
"``TCP_TCC_READ_REQ[n]``", "Req", "Number of read requests to L2 cache", "0-15"
"``TCP_TCC_WRITE_REQ[n]``", "Req", "Number of write requests to L2 cache", "0-15"
"``TCP_TCC_ATOMIC_WITH_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache with return", "0-15"
"``TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache without return", "0-15"
"``TCP_TCC_NC_READ_REQ[n]``", "Req", "Number of non-coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_UC_READ_REQ[n]``", "Req", "Number of uncached read requests to L2 cache", "0-15"
"``TCP_TCC_CC_READ_REQ[n]``", "Req", "Number of coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_RW_READ_REQ[n]``", "Req", "Number of coherently cached with write read requests to L2 cache", "0-15"
"``TCP_TCC_NC_WRITE_REQ[n]``", "Req", "Number of non-coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_UC_WRITE_REQ[n]``", "Req", "Number of uncached write requests to L2 cache", "0-15"
"``TCP_TCC_CC_WRITE_REQ[n]``", "Req", "Number of coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_RW_WRITE_REQ[n]``", "Req", "Number of coherently cached with write write requests to L2 cache", "0-15"
"``TCP_TCC_NC_ATOMIC_REQ[n]``", "Req", "Number of non-coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_UC_ATOMIC_REQ[n]``", "Req", "Number of uncached atomic requests to L2 cache", "0-15"
"``TCP_TCC_CC_ATOMIC_REQ[n]``", "Req", "Number of coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_RW_ATOMIC_REQ[n]``", "Req", "Number of coherently cached with write atomic requests to L2 cache", "0-15"
Note that:
* ``TCP_TOTAL_READ[n]`` = ``TCP_PERF_SEL_TOTAL_HIT_LRU_READ`` + ``TCP_PERF_SEL_TOTAL_MISS_LRU_READ`` + ``TCP_PERF_SEL_TOTAL_MISS_EVICT_READ``
* ``TCP_TOTAL_WRITE[n]`` = ``TCP_PERF_SEL_TOTAL_MISS_LRU_WRITE``+ ``TCP_PERF_SEL_TOTAL_MISS_EVICT_WRITE``
* ``TCP_TOTAL_WRITEBACK_INVALIDATES[n]`` = ``TCP_PERF_SEL_TOTAL_WBINVL1``+ ``TCP_PERF_SEL_TOTAL_WBINVL1_VOL``+ ``TCP_PERF_SEL_CP_TCP_INVALIDATE``+ ``TCP_PERF_SEL_SQ_TCP_INVALIDATE_VOL``
Texture cache arbiter counters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCA_CYCLE[n]``", "Cycles", "Number of texture cache arbiter cycles", "0-31"
"``TCA_BUSY[n]``", "Cycles", "Number of cycles texture cache arbiter has a pending request", "0-31"
.. _l2-cache-access-counters:
L2 cache access counters
---------------------------------------------------------------------------------------------------------------
L2 cache is also known as texture cache per channel.
.. tab-set::
.. tab-item:: MI300 hardware counter
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA0_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA0_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_WR_UNCACHED_32B[n]``", "Req", "Number of 32 or 64-byte write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic", "0-31"
"``TCC_EA0_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA0_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA0_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA0_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA0_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA0_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA0_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA0_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA0_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
.. tab-item:: MI200 hardware counter
.. csv-table::
:header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_WR_UNCACHED_32B[n]``", "Req", "Number of 32 write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic. A 64-byte request will be counted as 2", "0-31"
"``TCC_EA_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
Note the following:
* ``TCC_REQ[n]`` may be more than the number of requests arriving at the texture cache per channel,
but it's a good indication of the total amount of work that needs to be performed.
* For ``TCC_EA0_WRREQ[n]``, atomics may travel over the same interface and are generally classified as
write requests.
* CC mtypes can produce uncached requests, and those are included in
``TCC_EA0_WR_UNCACHED_32B[n]``
* ``TCC_EA0_WRREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter write latency.
* Average write latency = ``TCC_PERF_SEL_EA0_WRREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ``
* ``TCC_EA0_ATOMIC_LEVEL[n]`` is primarily intended to measure average efficiency arbiter atomic
latency
* Average atomic latency = ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC``
* ``TCC_EA0_RDREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter read latency.
* Average read latency = ``TCC_PERF_SEL_EA0_RDREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_RDREQ``
* Stalls can occur regardless of the need for a read to be performed
* Normally, stalls are measured exactly at one point in the pipeline however in the case of
``TCC_TAG_STALL[n]``, probes can stall the pipeline at a variety of places. There is no single point that
can accurately measure the total stalls
MI300 and MI200 series derived metrics list
==============================================================
.. csv-table::
:header: "Hardware counter", "Definition"
"``ALUStalledByLDS``", "Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready (value range: 0% (optimal) to 100%)"
"``FetchSize``", "Total kilobytes fetched from the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``FlatLDSInsts``", "Average number of flat instructions that read from or write to LDS, run per work item (affected by flow control)"
"``FlatVMemInsts``", "Average number of flat instructions that read from or write to the video memory, run per work item (affected by flow control). Includes flat instructions that read from or write to scratch"
"``GDSInsts``", "Average number of global data share read or write instructions run per work item (affected by flow control)"
"``GPUBusy``", "Percentage of time GPU is busy"
"``L2CacheHit``", "Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache (value range: 0% (no hit) to 100% (optimal))"
"``LDSBankConflict``", "Percentage of GPU time LDS is stalled by bank conflicts (value range: 0% (optimal) to 100%)"
"``LDSInsts``", "Average number of LDS read or write instructions run per work item (affected by flow control). Excludes flat instructions that read from or write to LDS."
"``MemUnitBusy``", "Percentage of GPU time the memory unit is active, which is measured with all extra fetches and writes and any cache or memory effects taken into account (value range: 0% to 100% (fetch-bound))"
"``MemUnitStalled``", "Percentage of GPU time the memory unit is stalled (value range: 0% (optimal) to 100%)"
"``MemWrites32B``", "Total number of effective 32B write transactions to the memory"
"``TCA_BUSY_sum``", "Total number of cycles texture cache arbiter has a pending request, over all texture cache arbiter instances"
"``TCA_CYCLE_sum``", "Total number of cycles over all texture cache arbiter instances"
"``SALUBusy``", "Percentage of GPU time scalar ALU instructions are processed (value range: 0% to 100% (optimal))"
"``SALUInsts``", "Average number of scalar ALU instructions run per work item (affected by flow control)"
"``SFetchInsts``", "Average number of scalar fetch instructions from the video memory run per work item (affected by flow control)"
"``VALUBusy``", "Percentage of GPU time vector ALU instructions are processed (value range: 0% to 100% (optimal))"
"``VALUInsts``", "Average number of vector ALU instructions run per work item (affected by flow control)"
"``VALUUtilization``", "Percentage of active vector ALU threads in a wave, where a lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64 (value range: 0%, 100% (optimal - no thread divergence))"
"``VFetchInsts``", "Average number of vector fetch instructions from the video memory run per work-item (affected by flow control); excludes flat instructions that fetch from video memory"
"``VWriteInsts``", "Average number of vector write instructions to the video memory run per work-item (affected by flow control); excludes flat instructions that write to video memory"
"``Wavefronts``", "Total wavefronts"
"``WRITE_REQ_32B``", "Total number of 32-byte effective memory writes"
"``WriteSize``", "Total kilobytes written to the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``WriteUnitStalled``", "Percentage of GPU time the write unit is stalled (value range: 0% (optimal) to 100%)"
You can lower ``ALUStalledByLDS`` by reducing LDS bank conflicts or number of LDS accesses.
You can lower ``MemUnitStalled`` by reducing the number or size of fetches and writes.
``MemUnitBusy`` includes the stall time (``MemUnitStalled``).
Hardware counters by and over all texture addressing unit instances
---------------------------------------------------------------------------------------------------------------
The following table shows the hardware counters *by* all texture addressing unit instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TA_BUFFER_WAVEFRONTS_sum``", "Total number of buffer wavefronts processed"
"``TA_BUFFER_READ_WAVEFRONTS_sum``", "Total number of buffer read wavefronts processed"
"``TA_BUFFER_WRITE_WAVEFRONTS_sum``", "Total number of buffer write wavefronts processed"
"``TA_BUFFER_ATOMIC_WAVEFRONTS_sum``", "Total number of buffer atomic wavefronts processed"
"``TA_BUFFER_TOTAL_CYCLES_sum``", "Total number of buffer cycles (including read and write) issued to texture cache"
"``TA_BUFFER_COALESCED_READ_CYCLES_sum``", "Total number of coalesced buffer read cycles issued to texture cache"
"``TA_BUFFER_COALESCED_WRITE_CYCLES_sum``", "Total number of coalesced buffer write cycles issued to texture cache"
"``TA_FLAT_READ_WAVEFRONTS_sum``", "Sum of flat opcode reads processed"
"``TA_FLAT_WRITE_WAVEFRONTS_sum``", "Sum of flat opcode writes processed"
"``TA_FLAT_WAVEFRONTS_sum``", "Total number of flat opcode wavefronts processed"
"``TA_FLAT_ATOMIC_WAVEFRONTS_sum``", "Total number of flat opcode atomic wavefronts processed"
"``TA_TOTAL_WAVEFRONTS_sum``", "Total number of wavefronts processed"
The following table shows the hardware counters *over* all texture addressing unit instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TA_ADDR_STALLED_BY_TC_CYCLES_sum``", "Total number of cycles texture addressing unit address path is stalled by texture cache"
"``TA_ADDR_STALLED_BY_TD_CYCLES_sum``", "Total number of cycles texture addressing unit address path is stalled by texture data unit"
"``TA_BUSY_avr``", "Average number of busy cycles"
"``TA_BUSY_max``", "Maximum number of texture addressing unit busy cycles"
"``TA_BUSY_min``", "Minimum number of texture addressing unit busy cycles"
"``TA_DATA_STALLED_BY_TC_CYCLES_sum``", "Total number of cycles texture addressing unit data path is stalled by texture cache"
"``TA_TA_BUSY_sum``", "Total number of texture addressing unit busy cycles"
Hardware counters over all texture cache per channel instances
---------------------------------------------------------------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCC_ALL_TC_OP_WB_WRITEBACK_sum``", "Total number of writebacks due to all ``TC_OP`` writeback requests."
"``TCC_ALL_TC_OP_INV_EVICT_sum``", "Total number of evictions due to all ``TC_OP`` invalidate requests."
"``TCC_ATOMIC_sum``", "Total number of L2 cache atomic requests of all types."
"``TCC_BUSY_avr``", "Average number of L2 cache busy cycles."
"``TCC_BUSY_sum``", "Total number of L2 cache busy cycles."
"``TCC_CC_REQ_sum``", "Total number of coherently cached requests."
"``TCC_CYCLE_sum``", "Total number of L2 cache free running clocks."
"``TCC_EA0_WRREQ_sum``", "Total number of 32-byte and 64-byte transactions going over the ``TC_EA0_wrreq`` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands."
"``TCC_EA0_WRREQ_64B_sum``", "Total number of 64-byte transactions (write or `CMPSWAP`) going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_WR_UNCACHED_32B_sum``", "Total Number of 32-byte write or atomic going over the ``TC_EA0_wrreq`` interface due to uncached traffic. Note that coherently cached mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2."
"``TCC_EA0_WRREQ_STALL_sum``", "Total Number of cycles a write request is stalled, over all instances."
"``TCC_EA0_WRREQ_IO_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of IO credits, over all instances."
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits, over all instances."
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits, over all instances."
"``TCC_EA0_WRREQ_LEVEL_sum``", "Total number of efficiency arbiter write requests in flight."
"``TCC_EA0_RDREQ_LEVEL_sum``", "Total number of efficiency arbiter read requests in flight."
"``TCC_EA0_ATOMIC_sum``", "Total Number of 32-byte or 64-byte atomic requests going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_ATOMIC_LEVEL_sum``", "Total number of efficiency arbiter atomic requests in flight."
"``TCC_EA0_RDREQ_sum``", "Total number of 32-byte or 64-byte read requests to efficiency arbiter."
"``TCC_EA0_RDREQ_32B_sum``", "Total number of 32-byte read requests to efficiency arbiter."
"``TCC_EA0_RD_UNCACHED_32B_sum``", "Total number of 32-byte efficiency arbiter reads due to uncached traffic."
"``TCC_EA0_RDREQ_IO_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of IO credits."
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of GMI credits."
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of DRAM credits."
"``TCC_EA0_RDREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter read requests to HBM."
"``TCC_EA0_WRREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter write requests to HBM."
"``TCC_HIT_sum``", "Total number of L2 cache hits."
"``TCC_MISS_sum``", "Total number of L2 cache misses."
"``TCC_NC_REQ_sum``", "Total number of non-coherently cached requests."
"``TCC_NORMAL_WRITEBACK_sum``", "Total number of writebacks due to requests that are not writeback requests."
"``TCC_NORMAL_EVICT_sum``", "Total number of evictions due to requests that are not invalidate or probe requests."
"``TCC_PROBE_sum``", "Total number of probe requests."
"``TCC_PROBE_ALL_sum``", "Total number of external probe requests with ``EA0_TCC_preq_all == 1``."
"``TCC_READ_sum``", "Total number of L2 cache read requests (including compressed reads but not metadata reads)."
"``TCC_REQ_sum``", "Total number of all types of L2 cache requests."
"``TCC_RW_REQ_sum``", "Total number of coherently cached with write requests."
"``TCC_STREAMING_REQ_sum``", "Total number of L2 cache streaming requests."
"``TCC_TAG_STALL_sum``", "Total number of cycles the normal request pipeline in the tag is stalled for any reason."
"``TCC_TOO_MANY_EA0_WRREQS_STALL_sum``", "Total number of cycles L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests."
"``TCC_UC_REQ_sum``", "Total number of uncached requests."
"``TCC_WRITE_sum``", "Total number of L2 cache write requests."
"``TCC_WRITEBACK_sum``", "Total number of lines written back to the main memory including writebacks of dirty lines and uncached write or atomic requests."
"``TCC_WRREQ_STALL_max``", "Maximum number of cycles a write request is stalled."
Hardware counters by, for, or over all texture cache per pipe instances
----------------------------------------------------------------------------------------------------------------
The following table shows the hardware counters *by* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_TA_TCP_STATE_READ_sum``", "Total number of state reads by ATCPPI"
"``TCP_TOTAL_CACHE_ACCESSES_sum``", "Total number of vector L1d accesses (including hits and misses)"
"``TCP_UTCL1_PERMISSION_MISS_sum``", "Total number of unified translation cache (L1) permission misses"
"``TCP_UTCL1_REQUEST_sum``", "Total number of address translation requests to unified translation cache (L1)"
"``TCP_UTCL1_TRANSLATION_MISS_sum``", "Total number of unified translation cache (L1) translation misses"
"``TCP_UTCL1_TRANSLATION_HIT_sum``", "Total number of unified translation cache (L1) translation hits"
The following table shows the hardware counters *for* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_TCC_READ_REQ_LATENCY_sum``", "Total vector L1d to L2 request latency over all wavefronts for reads and atomics with return"
"``TCP_TCC_WRITE_REQ_LATENCY_sum``", "Total vector L1d to L2 request latency over all wavefronts for writes and atomics without return"
"``TCP_TCP_LATENCY_sum``", "Total wave access latency to vector L1d over all wavefronts"
The following table shows the hardware counters *over* all texture cache per pipe instances.
.. csv-table::
:header: "Hardware counter", "Definition"
"``TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on an atomic"
"``TCP_GATE_EN1_sum``", "Total number of cycles vector L1d interface clocks are turned on"
"``TCP_GATE_EN2_sum``", "Total number of cycles vector L1d core clocks are turned on"
"``TCP_PENDING_STALL_CYCLES_sum``", "Total number of cycles vector L1d cache is stalled due to data pending from L2 Cache"
"``TCP_READ_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on a read"
"``TCP_TCC_ATOMIC_WITH_RET_REQ_sum``", "Total number of atomic requests to L2 cache with return"
"``TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum``", "Total number of atomic requests to L2 cache without return"
"``TCP_TCC_CC_READ_REQ_sum``", "Total number of coherently cached read requests to L2 cache"
"``TCP_TCC_CC_WRITE_REQ_sum``", "Total number of coherently cached write requests to L2 cache"
"``TCP_TCC_CC_ATOMIC_REQ_sum``", "Total number of coherently cached atomic requests to L2 cache"
"``TCP_TCC_NC_READ_REQ_sum``", "Total number of non-coherently cached read requests to L2 cache"
"``TCP_TCC_NC_WRITE_REQ_sum``", "Total number of non-coherently cached write requests to L2 cache"
"``TCP_TCC_NC_ATOMIC_REQ_sum``", "Total number of non-coherently cached atomic requests to L2 cache"
"``TCP_TCC_READ_REQ_sum``", "Total number of read requests to L2 cache"
"``TCP_TCC_RW_READ_REQ_sum``", "Total number of coherently cached with write read requests to L2 cache"
"``TCP_TCC_RW_WRITE_REQ_sum``", "Total number of coherently cached with write write requests to L2 cache"
"``TCP_TCC_RW_ATOMIC_REQ_sum``", "Total number of coherently cached with write atomic requests to L2 cache"
"``TCP_TCC_UC_READ_REQ_sum``", "Total number of uncached read requests to L2 cache"
"``TCP_TCC_UC_WRITE_REQ_sum``", "Total number of uncached write requests to L2 cache"
"``TCP_TCC_UC_ATOMIC_REQ_sum``", "Total number of uncached atomic requests to L2 cache"
"``TCP_TCC_WRITE_REQ_sum``", "Total number of write requests to L2 cache"
"``TCP_TCR_TCP_STALL_CYCLES_sum``", "Total number of cycles texture cache router stalls vector L1d"
"``TCP_TD_TCP_STALL_CYCLES_sum``", "Total number of cycles texture data unit stalls vector L1d"
"``TCP_TOTAL_ACCESSES_sum``", "Total number of vector L1d accesses"
"``TCP_TOTAL_READ_sum``", "Total number of vector L1d read accesses"
"``TCP_TOTAL_WRITE_sum``", "Total number of vector L1d write accesses"
"``TCP_TOTAL_ATOMIC_WITH_RET_sum``", "Total number of vector L1d atomic requests with return"
"``TCP_TOTAL_ATOMIC_WITHOUT_RET_sum``", "Total number of vector L1d atomic requests without return"
"``TCP_TOTAL_WRITEBACK_INVALIDATES_sum``", "Total number of vector L1d writebacks and invalidates"
"``TCP_VOLATILE_sum``", "Total number of L1 volatile pixels or buffers from texture addressing unit"
"``TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum``", "Total number of cycles tag RAM conflict stalls on a write"
Hardware counter over all texture data unit instances
--------------------------------------------------------
.. csv-table::
:header: "Hardware counter", "Definition"
"``TD_ATOMIC_WAVEFRONT_sum``", "Total number of atomic wavefront instructions"
"``TD_COALESCABLE_WAVEFRONT_sum``", "Total number of coalescable wavefronts according to texture addressing unit"
"``TD_LOAD_WAVEFRONT_sum``", "Total number of wavefront instructions (read, write, atomic)"
"``TD_SPI_STALL_sum``", "Total number of cycles texture data unit is stalled by shader processor input"
"``TD_STORE_WAVEFRONT_sum``", "Total number of write wavefront instructions"
"``TD_TC_STALL_sum``", "Total number of cycles texture data unit is stalled waiting for texture cache data"
"``TD_TD_BUSY_sum``", "Total number of texture data unit busy cycles while it is processing or waiting for data"

View File

@@ -1,129 +0,0 @@
---
myst:
html_meta:
"description lang=en": "Learn about the AMD Instinct MI300 series architecture."
"keywords": "Instinct, MI300X, MI300A, microarchitecture, AMD, ROCm"
---
# AMD Instinct™ MI300 series microarchitecture
The AMD Instinct MI300 series accelerators are based on the AMD CDNA 3
architecture which was designed to deliver leadership performance for HPC, artificial intelligence (AI), and machine
learning (ML) workloads. The AMD Instinct MI300 series accelerators are well-suited for extreme scalability and compute performance, running
on everything from individual servers to the worlds largest exascale supercomputers.
With the MI300 series, AMD is introducing the Accelerator Complex Die (XCD), which contains the
GPU computational elements of the processor along with the lower levels of the cache hierarchy.
The following image depicts the structure of a single XCD in the AMD Instinct MI300 accelerator series.
```{figure} ../../data/shared/xcd-sys-arch.png
---
name: mi300-xcd
align: center
---
XCD-level system architecture showing 40 Compute Units, each with 32 KB L1 cache, a Unified Compute System with 4 ACE Compute Accelerators, shared 4MB of L2 cache and an HWS Hardware Scheduler.
```
On the XCD, four Asynchronous Compute Engines (ACEs) send compute shader workgroups to the
Compute Units (CUs). The XCD has 40 CUs: 38 active CUs at the aggregate level and 2 disabled CUs for
yield management. The CUs all share a 4 MB L2 cache that serves to coalesce all memory traffic for the
die. With less than half of the CUs of the AMD Instinct MI200 Series compute die, the AMD CDNA™ 3
XCD die is a smaller building block. However, it uses more advanced packaging and the processor
can include 6 or 8 XCDs for up to 304 CUs, roughly 40% more than MI250X.
The MI300 Series integrate up to 8 vertically stacked XCDs, 8 stacks of
High-Bandwidth Memory 3 (HBM3) and 4 I/O dies (containing system
infrastructure) using the AMD Infinity Fabric™ technology as interconnect.
The Matrix Cores inside the CDNA 3 CUs have significant improvements, emphasizing AI and machine
learning, enhancing throughput of existing data types while adding support for new data types.
CDNA 2 Matrix Cores support FP16 and BF16, while offering INT8 for inference. Compared to MI250X
accelerators, CDNA 3 Matrix Cores triple the performance for FP16 and BF16, while providing a
performance gain of 6.8 times for INT8. FP8 has a performance gain of 16 times compared to FP32,
while TF32 has a gain of 4 times compared to FP32.
```{list-table} Peak-performance capabilities of the MI300X for different data types.
:header-rows: 1
:name: mi300x-perf-table
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
*
- Matrix FP64
- 256
- 163.4
*
- Vector FP64
- 128
- 81.7
*
- Matrix FP32
- 256
- 163.4
*
- Vector FP32
- 256
- 163.4
*
- Vector TF32
- 1024
- 653.7
*
- Matrix FP16
- 2048
- 1307.4
*
- Matrix BF16
- 2048
- 1307.4
*
- Matrix FP8
- 4096
- 2614.9
*
- Matrix INT8
- 4096
- 2614.9
```
The above table summarizes the aggregated peak performance of the AMD Instinct MI300X Open
Compute Platform (OCP) Open Accelerator Modules (OAMs) for different data types and command
processors. The middle column lists the peak performance (number of data elements processed in a
single instruction) of a single compute unit if a SIMD (or matrix) instruction is submitted in each clock
cycle. The third column lists the theoretical peak performance of the OAM. The theoretical aggregated
peak memory bandwidth of the GPU is 5.3 TB per second.
The following image shows the block diagram of the APU (left) and the OAM package (right) both
connected via AMD Infinity Fabric™ network on-chip.
```{figure} ../../data/conceptual/gpu-arch/image008.png
---
name: mi300-arch
alt:
align: center
---
MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, while the MI300X (right) has 8 XCDs.
```
## Node-level architecture
```{figure} ../../data/shared/mi300-node-level-arch.png
---
name: mi300-node
align: center
---
MI300 series node-level architecture showing 8 fully interconnected MI300X OAM modules connected to (optional) PCIEe switches via retimers and HGX connectors.
```
The image above shows the node-level architecture of a system with AMD EPYC processors in a
dual-socket configuration and eight AMD Instinct MI300X accelerators. The MI300X OAMs attach to the
host system via PCIe Gen 5 x16 links (yellow lines). The GPUs are using seven high-bandwidth,
low-latency AMD Infinity Fabric™ links (red lines) to form a fully connected 8-GPU system.
<!---
We need performance data about the P2P communication here.
-->

View File

@@ -1,116 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="GPU isolation techniques">
<meta name="keywords" content="GPU isolation techniques, UUID, universally unique identifier,
environment variables, virtual machines, AMD, ROCm">
</head>
# GPU isolation techniques
Restricting the access of applications to a subset of GPUs, aka isolating
GPUs allows users to hide GPU resources from programs. The programs by default
will only use the "exposed" GPUs ignoring other (hidden) GPUs in the system.
There are multiple ways to achieve isolation of GPUs in the ROCm software stack,
differing in which applications they apply to and the security they provide.
This page serves as an overview of the techniques.
## Environment variables
The runtimes in the ROCm software stack read these environment variables to
select the exposed or default device to present to applications using them.
Environment variables shouldn't be used for isolating untrusted applications,
as an application can reset them before initializing the runtime.
### `ROCR_VISIBLE_DEVICES`
A list of device indices or {abbr}`UUID (universally unique identifier)`s
that will be exposed to applications.
Runtime
: ROCm Software Runtime. Applies to all applications using the user mode ROCm
software stack.
```{code-block} shell
:caption: Example to expose the 1. device and a device based on UUID.
export ROCR_VISIBLE_DEVICES="0,GPU-DEADBEEFDEADBEEF"
```
### `GPU_DEVICE_ORDINAL`
Devices indices exposed to OpenCL and HIP applications.
Runtime
: ROCm Compute Language Runtime (`ROCclr`). Applies to applications and runtimes
using the `ROCclr` abstraction layer including HIP and OpenCL applications.
```{code-block} shell
:caption: Example to expose the 1. and 3. device in the system.
export GPU_DEVICE_ORDINAL="0,2"
```
(hip_visible_devices)=
### `HIP_VISIBLE_DEVICES`
Device indices exposed to HIP applications.
Runtime: HIP runtime. Applies only to applications using HIP on the AMD platform.
```{code-block} shell
:caption: Example to expose the 1. and 3. devices in the system.
export HIP_VISIBLE_DEVICES="0,2"
```
### `CUDA_VISIBLE_DEVICES`
Provided for CUDA compatibility, has the same effect as `HIP_VISIBLE_DEVICES`
on the AMD platform.
Runtime
: HIP or CUDA Runtime. Applies to HIP applications on the AMD or NVIDIA platform
and CUDA applications.
### `OMP_DEFAULT_DEVICE`
Default device used for OpenMP target offloading.
Runtime
: OpenMP Runtime. Applies only to applications using OpenMP offloading.
```{code-block} shell
:caption: Example on setting the default device to the third device.
export OMP_DEFAULT_DEVICE="2"
```
## Docker
Docker uses Linux kernel namespaces to provide isolated environments for
applications. This isolation applies to most devices by default, including
GPUs. To access them in containers explicit access must be granted, please see
{ref}`docker-access-gpus-in-container` for details.
Specifically refer to {ref}`docker-restrict-gpus` on exposing just a subset
of all GPUs.
Docker isolation is more secure than environment variables, and applies
to all programs that use the `amdgpu` kernel module interfaces.
Even programs that don't use the ROCm runtime, like graphics applications
using OpenGL or Vulkan, can only access the GPUs exposed to the container.
## GPU passthrough to virtual machines
Virtual machines achieve the highest level of isolation, because even the kernel
of the virtual machine is isolated from the host. Devices physically installed
in the host system can be passed to the virtual machine using PCIe passthrough.
This allows for using the GPU with a different operating systems like a Windows
guest from a Linux host.
Setting up PCIe passthrough is specific to the hypervisor used. ROCm officially
supports [VMware ESXi](https://www.vmware.com/products/esxi-and-esx.html)
for select GPUs.
<!--
TODO: This should link to a page about virtualization that explains
pass-through and SR-IOV and how-tos for maybe `libvirt` and `VMWare`
-->

View File

@@ -9,59 +9,85 @@ import shutil
import sys
from pathlib import Path
shutil.copy2("../RELEASE.md", "./about/release-notes.md")
shutil.copy2("../CHANGELOG.md", "./release/changelog.md")
ROCM_VERSION = "7.9.0"
GA_DATE = "2025-10-20"
DOCS_DIR = Path(__file__).parent.resolve()
ROOT_DIR = DOCS_DIR.parent
def copy_rtd_file(src_path: Path, dest_path: Path):
dest_path.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src_path, dest_path)
print(f"📁 Copied {src_path}{dest_path}")
# compat_matrix_src = DOCS_DIR / "compatibility" / "compatibility-matrix-historical-6.0.csv" # fmt: skip
# compat_matrix_dest = ROOT_DIR / "_readthedocs" / "html" / "downloads" / "compatibility-matrix-historical-6.0.csv" # fmt: skip
# copy_rtd_file(compat_matrix_src, compat_matrix_dest)
gh_release_path = ROOT_DIR / "RELEASE.md"
rtd_release_path = DOCS_DIR / "about" / "release-notes.md"
copy_rtd_file(gh_release_path, rtd_release_path)
# gh_changelog_path = ROOT_DIR / "CHANGELOG.md"
# rtd_changelog_path = DOCS_DIR / "release" / "changelog.md"
# copy_rtd_file(gh_changelog_path, rtd_changelog_path)
# Mark the consolidated changelog as orphan to prevent Sphinx from warning about missing toctree entries
with open("./release/changelog.md", "r+") as file:
content = file.read()
file.seek(0)
file.write(":orphan:\n" + content)
# with open(rtd_changelog_path, "r+", encoding="utf-8") as file:
# content = file.read()
# file.seek(0)
# file.write(":orphan:\n" + content)
#
# # Replace GitHub-style [!ADMONITION]s with Sphinx-compatible ```{admonition} blocks
# with open(rtd_changelog_path, "r", encoding="utf-8") as file:
# lines = file.readlines()
#
# modified_lines = []
# in_admonition_section = False
#
# # Map for matching the specific admonition type to its corresponding Sphinx markdown syntax
# admonition_types = {
# "> [!NOTE]": "```{note}",
# "> [!TIP]": "```{tip}",
# "> [!IMPORTANT]": "```{important}",
# "> [!WARNING]": "```{warning}",
# "> [!CAUTION]": "```{caution}",
# }
#
# for line in lines:
# if any(line.startswith(k) for k in admonition_types):
# for key in admonition_types:
# if line.startswith(key):
# modified_lines.append(admonition_types[key] + "\n")
# break
# in_admonition_section = True
# elif in_admonition_section:
# if line.strip() == "":
# # If we encounter an empty line, close the admonition section
# modified_lines.append("```\n\n") # Close the admonition block
# in_admonition_section = False
# else:
# modified_lines.append(line.lstrip("> "))
# else:
# modified_lines.append(line)
#
# # In case the file ended while still in a admonition section, close it
# if in_admonition_section:
# modified_lines.append("```")
#
# file.close()
#
# with open(rtd_changelog_path, "w", encoding="utf-8") as file:
# file.writelines(modified_lines)
# Replace GitHub-style [!ADMONITION]s with Sphinx-compatible ```{admonition} blocks
with open("./release/changelog.md", "r") as file:
lines = file.readlines()
modified_lines = []
in_admonition_section = False
# Map for matching the specific admonition type to its corresponding Sphinx markdown syntax
admonition_types = {
'> [!NOTE]': '```{note}',
'> [!TIP]': '```{tip}',
'> [!IMPORTANT]': '```{important}',
'> [!WARNING]': '```{warning}',
'> [!CAUTION]': '```{caution}'
}
for line in lines:
if any(line.startswith(k) for k in admonition_types):
for key in admonition_types:
if(line.startswith(key)):
modified_lines.append(admonition_types[key] + '\n')
break
in_admonition_section = True
elif in_admonition_section:
if line.strip() == '':
# If we encounter an empty line, close the admonition section
modified_lines.append('```\n\n') # Close the admonition block
in_admonition_section = False
else:
modified_lines.append(line.lstrip('> '))
else:
modified_lines.append(line)
# In case the file ended while still in a admonition section, close it
if in_admonition_section:
modified_lines.append('```')
file.close()
with open("./release/changelog.md", 'w') as file:
file.writelines(modified_lines)
os.system("mkdir -p ../_readthedocs/html/downloads")
os.system("cp compatibility/compatibility-matrix-historical-6.0.csv ../_readthedocs/html/downloads/compatibility-matrix-historical-6.0.csv")
# matrix_path = os.path.join("compatibility", "compatibility-matrix-historical-6.0.csv")
# rtd_path = os.path.join("..", "_readthedocs", "html", "downloads")
# if not os.path.exists(rtd_path):
# os.makedirs(rtd_path)
# shutil.copy2(matrix_path, rtd_path)
latex_engine = "xelatex"
latex_elements = {
@@ -74,138 +100,108 @@ latex_elements = {
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm.docs.amd.com")
html_context = {}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
project_path = os.path.abspath(".").replace("\\", "/")
project_path = str(DOCS_DIR).replace("\\", "/")
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = "6.4.2"
release = "6.4.2"
copyright = "Copyright (c) %Y Advanced Micro Devices, Inc. All rights reserved."
version = ROCM_VERSION
release = ROCM_VERSION
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux"], "date": "2025-07-21"},
{"file": "release/changelog", "os": ["linux"],},
{"file": "compatibility/compatibility-matrix", "os": ["linux"]},
{"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
{"file": "compatibility/ml-compatibility/tensorflow-compatibility", "os": ["linux"]},
{"file": "compatibility/ml-compatibility/jax-compatibility", "os": ["linux"]},
{"file": "how-to/deep-learning-rocm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/install", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/system-health-check", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/train-a-model", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/prerequisite-system-validation", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/scale-model-training", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/megatron-lm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v24.12-dev", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.5", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/pytorch-training", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.5", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/overview", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/fine-tuning-and-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/single-gpu-fine-tuning-and-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/multi-gpu-fine-tuning-and-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/hugging-face-models", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/llm-inference-frameworks", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/vllm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.4.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.6.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.6.6", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.7.3-20250325", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.3-20250415", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250513", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250521", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.9.0.1-20250605", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.9.0.1-20250702", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/deploy-your-model", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/model-quantization", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/optimizing-with-composable-kernel", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/optimizing-triton-kernel", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/profiling-and-debugging", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference-optimization/workload", "os": ["linux"]},
{"file": "how-to/system-optimization/index", "os": ["linux"]},
{"file": "how-to/system-optimization/mi300x", "os": ["linux"]},
{"file": "how-to/system-optimization/mi200", "os": ["linux"]},
{"file": "how-to/system-optimization/mi100", "os": ["linux"]},
{"file": "how-to/system-optimization/w6000-v620", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/index", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/system", "os": ["linux"]},
{"file": "how-to/tuning-guides/mi300x/workload", "os": ["linux"]},
{"file": "how-to/system-debugging", "os": ["linux"]},
{"file": "how-to/gpu-enabled-mpi", "os": ["linux"]},
{"file": "about/release-notes", "date": GA_DATE},
{"file": "rocm-for-ai/xdit-diffusion-inference", "os": ["linux"]},
]
external_toc_path = "./sphinx/_toc.yml"
# Add the _extensions directory to Python's search path
sys.path.append(str(Path(__file__).parent / 'extension'))
# Register Sphinx extensions and static assets
sys.path.append(str(DOCS_DIR / "extension"))
extensions = ["rocm_docs", "sphinx_reredirects", "sphinx_sitemap", "sphinxcontrib.datatemplates", "version-ref", "csv-to-list-table"]
# "sphinx/static/css", "extension/how-to/rocm-for-ai/inference"]
# html_css_files = [
# "rocm_custom.css",
# "rocm_rn.css",
# "dynamic_picker.css",
# "vllm-benchmark.css",
# ]
templates_path = ["extension/rocm_docs_custom/templates", "extension/templates"]
compatibility_matrix_file = str(Path(__file__).parent / 'compatibility/compatibility-matrix-historical-6.0.csv')
extensions = [
"rocm_docs",
"rocm_docs_custom.selector",
"rocm_docs_custom.table",
"rocm_docs_custom.icon",
"sphinxcontrib.datatemplates",
# "sphinx_reredirects",
# "sphinx_sitemap",
# "version-ref",
# "csv-to-list-table",
]
# compatibility_matrix_file = str(
# DOCS_DIR / "compatibility/compatibility-matrix-historical-6.0.csv"
# )
external_projects_current_project = "rocm"
# Uncomment if facing rate limit exceed issue with local build
# external_projects_remote_repository = ""
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")
html_context = {}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}
html_static_path = ["sphinx/static/css", "extension/how-to/rocm-for-ai/inference"]
html_css_files = ["rocm_custom.css", "rocm_rn.css", "vllm-benchmark.css"]
html_theme_options = {
"announcement": "This is ROCm 7.9.0 technology preview release documentation. For the latest production stream release, refer to <a id='rocm-banner' href='https://rocm.docs.amd.com/en/latest/'>ROCm documentation</a>.",
"flavor": "generic",
"header_title": "ROCm™ 7.9.0 Preview",
"header_link": "https://rocm.docs.amd.com/en/7.9.0-preview/index.html",
"version_list_link": "https://rocm.docs.amd.com/en/7.10.0-preview/release/versions.html",
"nav_secondary_items": {
"GitHub": "https://github.com/ROCm/ROCm",
"Community": "https://github.com/ROCm/ROCm/discussions",
"Blogs": "https://rocm.blogs.amd.com/",
"Instinct™ Docs": "https://instinct.docs.amd.com/",
"Support": "https://github.com/ROCm/ROCm/issues/new/choose",
},
"link_main_doc": False,
"secondary_sidebar_items": {
"**": ["page-toc"],
"install/rocm": ["selector-toc2"],
"compatibility/compatibility-matrix": ["selector-toc2"],
}
}
html_title = f"AMD ROCm {ROCM_VERSION} preview"
html_static_path = ["sphinx/static/css", "sphinx/static/js"]
html_css_files = ["vllm-benchmark.css"]
html_js_files = ["vllm-benchmark.js"]
html_title = "ROCm Documentation"
html_theme_options = {"link_main_doc": False}
redirects = {"reference/openmp/openmp": "../../about/compatibility/openmp.html"}
numfig = False
suppress_warnings = ["autosectionlabel.*"]
html_context = {
"project_path" : {project_path},
"gpu_type" : [('AMD Instinct accelerators', 'intrinsic'), ('AMD gfx families', 'gfx'), ('NVIDIA families', 'nvidia') ],
"atomics_type" : [('HW atomics', 'hw-atomics'), ('CAS emulation', 'cas-atomics')],
"pcie_type" : [('No PCIe atomics', 'nopcie'), ('PCIe atomics', 'pcie')],
"memory_type" : [('Device DRAM', 'device-dram'), ('Migratable Host DRAM', 'migratable-host-dram'), ('Pinned Host DRAM', 'pinned-host-dram')],
"granularity_type" : [('Coarse-grained', 'coarse-grained'), ('Fine-grained', 'fine-grained')],
"scope_type" : [('Device', 'device'), ('System', 'system')]
}
# html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")
# html_context = {
# "project_path": {project_path},
# "gpu_type": [
# ("AMD Instinct accelerators", "intrinsic"),
# ("AMD gfx families", "gfx"),
# ("NVIDIA families", "nvidia"),
# ],
# "atomics_type": [("HW atomics", "hw-atomics"), ("CAS emulation", "cas-atomics")],
# "pcie_type": [("No PCIe atomics", "nopcie"), ("PCIe atomics", "pcie")],
# "memory_type": [
# ("Device DRAM", "device-dram"),
# ("Migratable Host DRAM", "migratable-host-dram"),
# ("Pinned Host DRAM", "pinned-host-dram"),
# ],
# "granularity_type": [
# ("Coarse-grained", "coarse-grained"),
# ("Fine-grained", "fine-grained"),
# ],
# "scope_type": [("Device", "device"), ("System", "system")],
# }
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
# temporary settings to speed up docs build for faster iteration
# external_projects_remote_repository = ""
# external_toc_exclude_missing = True

View File

@@ -1,150 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Building ROCm documentation">
<meta name="keywords" content="documentation, Visual Studio Code, GitHub, command line,
AMD, ROCm">
</head>
# Building documentation
## GitHub
If you open a pull request and scroll down to the summary panel,
there is a commit status section. Next to the line
`docs/readthedocs.com:advanced-micro-devices-demo`, there is a `Details` link.
If you click this, it takes you to the Read the Docs build for your pull request.
![GitHub PR commit status](../data/contribute/commit-status.png)
If you don't see this line, click `Show all checks` to get an itemized view.
## Command line
You can build our documentation via the command line using Python.
See the `build.tools.python` setting in the [Read the Docs configuration file](https://github.com/ROCm/ROCm/blob/develop/.readthedocs.yaml) for the Python version used by Read the Docs to build documentation.
See the [Python requirements file](https://github.com/ROCm/ROCm/blob/develop/docs/sphinx/requirements.txt) for Python packages needed to build the documentation.
Use the Python Virtual Environment (`venv`) and run the following commands from the project root:
```sh
python3 -mvenv .venv
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
```
Navigate to `_build/html/index.html` and open this file in a web browser.
## Visual Studio Code
With the help of a few extensions, you can create a productive environment to author and test
documentation locally using Visual Studio (VS) Code. Follow these steps to configure VS Code:
1. Install the required extensions:
* Python: `(ms-python.python)`
* Live Server: `(ritwickdey.LiveServer)`
2. Add the following entries to `.vscode/settings.json`.
```json
{
"liveServer.settings.root": "/.vscode/build/html",
"liveServer.settings.wait": 1000,
"python.terminal.activateEnvInCurrentTerminal": true
}
```
* `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
alongside the `tasks.json` command.
* `liveServer.settings.wait`: Tells the live server to wait with the update in order to give Sphinx time to
regenerate the site contents and not refresh before the build is complete.
* `python.terminal.activateEnvInCurrentTerminal`: Activates the automatic virtual environment, so you
can build the site from the integrated terminal.
3. Add the following tasks to `.vscode/tasks.json`.
```json
{
"version": "2.0.0",
"tasks": [
{
"label": "Build Docs",
"type": "process",
"windows": {
"command": "${workspaceFolder}/.venv/Scripts/python.exe"
},
"command": "${workspaceFolder}/.venv/bin/python3",
"args": [
"-m",
"sphinx",
"-j",
"auto",
"-T",
"-b",
"html",
"-d",
"${workspaceFolder}/.vscode/build/doctrees",
"-D",
"language=en",
"${workspaceFolder}/docs",
"${workspaceFolder}/.vscode/build/html"
],
"problemMatcher": [
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"line": 2,
"severity": 3,
"message": 4
}
},
{
"owner": "sphinx",
"fileLocation": "absolute",
"pattern": {
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
"file": 1,
"severity": 2,
"message": 3
}
}
],
"group": {
"kind": "build",
"isDefault": true
}
}
]
}
```
> Implementation detail: two problem matchers were needed to be defined,
> because VS Code doesn't tolerate some problem information being potentially
> absent. While a single regex could match all types of errors, if a capture
> group remains empty (the line number doesn't show up in all warning/error
> messages) but the `pattern` references said empty capture group, VS Code
> discards the message completely.
4. Configure the Python virtual environment (`venv`).
From the Command Palette, run `Python: Create Environment`. Select `venv` environment and
`docs/sphinx/requirements.txt`.
5. Build the docs.
Launch the default build task using one of the following options:
* A hotkey (the default is `Ctrl+Shift+B`)
* Issuing the `Tasks: Run Build Task` from the Command Palette
6. Open the live preview.
Navigate to the site output within VS Code: right-click on `.vscode/build/html/index.html` and
select `Open with Live Server`. The contents should update on every rebuild without having to
refresh the browser.

View File

@@ -1,77 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Contributing to ROCm">
<meta name="keywords" content="ROCm, contributing, contribute, maintainer, contributor">
</head>
# Contributing to the ROCm documentation
The ROCm documentation, like all of ROCm, is open source and available on GitHub. You can contribute to the ROCm documentation by forking the appropriate repository, making your changes, and opening a pull request.
To provide feedback on the ROCm documentation, including submitting an issue or suggesting a feature, see [Providing feedback about the ROCm documentation](./feedback.md).
## The ROCm repositories
The repositories for ROCm and all ROCm components are available on GitHub.
| Module | Documentation location |
| --- | --- |
| ROCm framework | [https://github.com/ROCm/ROCm/tree/develop/docs](https://github.com/ROCm/ROCm/tree/develop/docs) |
| ROCm installation for Linux | [https://github.com/ROCm/rocm-install-on-linux/tree/develop/docs](https://github.com/ROCm/rocm-install-on-linux/tree/develop/docs) |
| ROCm HIP SDK installation for Windows | [https://github.com/ROCm/rocm-install-on-windows/tree/develop/docs](https://github.com/ROCm/rocm-install-on-windows/tree/develop/docs) |
Individual components have their own repositories with their own documentation in their own `docs` folders.
The sub-folders within the `docs` folders across ROCm are typically structured as follows:
| Sub-folder name | Documentation type |
|-------|----------|
| `install` | Installation instructions, build instructions, and prerequisites |
| `conceptual` | Important concepts |
| `how-to` | How to implement specific use cases |
| `tutorials` | Tutorials |
| `reference` | API references and other reference resources |
## Editing and adding to the documentation
ROCm documentation follows the [Google developer documentation style guide](https://developers.google.com/style/highlights).
Most topics in the ROCm documentation are written in [reStructuredText (rst)](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html), with some topics written in Markdown. Only use reStructuredText when adding new topics. Only use Markdown if the topic you are editing is already in Markdown.
To edit or add to the documentation:
1. Fork the repository you want to add to or edit.
2. Clone your fork locally.
3. Create a new local branch cut from the `develop` branch of the repository.
4. Make your changes to the documentation.
5. Optionally, build the documentation locally before creating a pull request by running the following commands from within the `docs` folder:
```bash
pip3 install -r sphinx/requirements.txt # You only need to run this command once
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
The output files will be located in the `docs/_build` folder. Open `docs/_build/html/index.html` to view the documentation.
For more information on ROCm build tools, see [Documentation toolchain](toolchain.md).
6. Push your changes. A GitHub link will be returned in the output of the `git push` command. Open this link in a browser to create the pull request.
The documentation is built as part of the checks on pull request, along with spell checking and linting. Scroll to the bottom of your pull request to view all the checks.
Verify that the linting and spell checking have passed, and that the documentation was built successfully. New words or acronyms can be added to the [wordlist file](https://github.com/ROCm/rocm-docs-core/blob/develop/.wordlist.txt). The wordlist is subject to approval by the ROCm documentation team.
The Read The Docs build of your pull request can be accessed by clicking on the Details link next to the Read The Docs build check. Verify that your changes are in the build and look as expected.
![The GitHub checks are collapsed by default and can be accessed by clicking on "Show All Checks".](../data/contribute/GitHubCheck-Highlight.png)
![The Read The Docs Build is accessed from the Details link in the Read The Docs check.](../data/contribute/GitHub-ReadThe-Docs-Highlight.png)
Your pull request will be reviewed by a member of the ROCm documentation team.
See the [GitHub documentation](https://docs.github.com/en) for information on how to fork and clone a repository, and how to create and push a local branch.
```{important}
By creating a pull request (PR), you agree to allow your contribution to be licensed under the terms of the
LICENSE.txt file in the corresponding repository. Different repositories can use different licenses.
```

View File

@@ -1,27 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Providing feedback for ROCm documentation">
<meta name="keywords" content="documentation, pull request, GitHub, AMD, ROCm">
</head>
# Providing feedback about the ROCm documentation
Feedback about the ROCm documentation is welcome. You can provide feedback about the ROCm documentation either through GitHub Discussions or GitHub Issues.
## Participating in discussions through GitHub Discussions
You can ask questions, view announcements, suggest new features, and communicate with other members of the community through [GitHub Discussions](https://github.com/ROCm/ROCm/discussions).
## Submitting issues through GitHub Issues
You can submit issues through [GitHub Issues](https://github.com/ROCm/ROCm/issues).
When creating a new issue, follow the following guidelines:
1. Always do a search to see if the same issue already exists. If the issue already exists, upvote it, and comment or post to provide any additional details you might have.
2. If you find an issue that is similar to your issue, log your issue, then add a comment that includes a link to the similar issue, as well as its issue number.
3. Always provide as much information as possible. This helps reduce the time required to reproduce the issue.
After creating your issue, make sure to check it regularly for any requests for additional information.
For information about contributing content to the ROCm documentation, see [Contributing to the ROCm documentation](./contributing.md).

View File

@@ -1,46 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm documentation toolchain">
<meta name="keywords" content="documentation, toolchain, Sphinx, Doxygen, MyST, AMD, ROCm">
</head>
# ROCm documentation toolchain
The ROCm documentation relies on several open source toolchains and sites.
## rocm-docs-core
[rocm-docs-core](https://github.com/ROCm/rocm-docs-core) is an AMD-maintained
project that applies customizations for the ROCm documentation. This project is the tool most ROCm repositories use as part of their documentation build pipeline. It is available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
See the user and developer guides for rocm-docs-core at
{doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator originally used for Python. It is now widely used in the open source community.
### Sphinx External ToC
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html) is a Sphinx extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
based on a YAML file (`_toc.yml.in`) that contains the table of contents.
### Sphinx-book-theme
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme that defines the base appearance for ROCm documentation. ROCm documentation applies some customization, such as a custom header and footer, on top of the Sphinx Book Theme.
### Sphinx Design
[Sphinx design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design functionality. ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts information from in-code comments. It is used for API documentation.
## Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin for integrating Doxygen content.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds and hosts the HTML version of the ROCm documentation.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 81 KiB

After

Width:  |  Height:  |  Size: 114 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 167 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 47 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 778 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 39 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 171 KiB

View File

@@ -0,0 +1,91 @@
vllm_benchmark:
unified_docker:
latest:
pull_tag: rocm/vllm:rocm6.4.1_vllm_0.10.0_20250812
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.10.0_20250812/images/sha256-4c277ad39af3a8c9feac9b30bf78d439c74d9b4728e788a419d3f1d0c30cacaa
rocm_version: 6.4.1
vllm_version: 0.10.0 (0.10.1.dev395+g340ea86df.rocm641)
pytorch_version: 2.7.0+gitf717b2a
hipblaslt_version: 0.15
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.1 8B
mad_tag: pyt_vllm_llama-3.1-8b
model_repo: meta-llama/Llama-3.1-8B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: float16
- model: Llama 3.1 70B
mad_tag: pyt_vllm_llama-3.1-70b
model_repo: meta-llama/Llama-3.1-70B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
precision: float16
- model: Llama 3.1 405B
mad_tag: pyt_vllm_llama-3.1-405b
model_repo: meta-llama/Llama-3.1-405B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct
precision: float16
- model: Llama 2 70B
mad_tag: pyt_vllm_llama-2-70b
model_repo: meta-llama/Llama-2-70b-chat-hf
url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
precision: float16
- model: Llama 3.1 8B FP8
mad_tag: pyt_vllm_llama-3.1-8b_fp8
model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
precision: float8
- model: Llama 3.1 70B FP8
mad_tag: pyt_vllm_llama-3.1-70b_fp8
model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV
precision: float8
- model: Llama 3.1 405B FP8
mad_tag: pyt_vllm_llama-3.1-405b_fp8
model_repo: amd/Llama-3.1-405B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV
precision: float8
- group: Mistral AI
tag: mistral
models:
- model: Mixtral MoE 8x7B
mad_tag: pyt_vllm_mixtral-8x7b
model_repo: mistralai/Mixtral-8x7B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
precision: float16
- model: Mixtral MoE 8x22B
mad_tag: pyt_vllm_mixtral-8x22b
model_repo: mistralai/Mixtral-8x22B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
precision: float16
- model: Mixtral MoE 8x7B FP8
mad_tag: pyt_vllm_mixtral-8x7b_fp8
model_repo: amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
precision: float8
- model: Mixtral MoE 8x22B FP8
mad_tag: pyt_vllm_mixtral-8x22b_fp8
model_repo: amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
precision: float8
- group: Qwen
tag: qwen
models:
- model: QwQ-32B
mad_tag: pyt_vllm_qwq-32b
model_repo: Qwen/QwQ-32B
url: https://huggingface.co/Qwen/QwQ-32B
precision: float16
- model: Qwen3 30B A3B
mad_tag: pyt_vllm_qwen3-30b-a3b
model_repo: Qwen/Qwen3-30B-A3B
url: https://huggingface.co/Qwen/Qwen3-30B-A3B
precision: float16
- group: Microsoft Phi
tag: phi
models:
- model: Phi-4
mad_tag: pyt_vllm_phi-4
model_repo: microsoft/phi-4
url: https://huggingface.co/microsoft/phi-4

View File

@@ -0,0 +1,188 @@
dockers:
- pull_tag: rocm/vllm:rocm6.4.1_vllm_0.10.1_20250909
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.10.1_20250909/images/sha256-1113268572e26d59b205792047bea0e61e018e79aeadceba118b7bf23cb3715c
components:
ROCm: 6.4.1
vLLM: 0.10.1 (0.10.1rc2.dev409+g0b6bf6691.rocm641)
PyTorch: 2.7.0+gitf717b2a
hipBLASLt: 0.15
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.1 8B
mad_tag: pyt_vllm_llama-3.1-8b
model_repo: meta-llama/Llama-3.1-8B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: float16
config:
tp: 1
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Llama 3.1 70B
mad_tag: pyt_vllm_llama-3.1-70b
model_repo: meta-llama/Llama-3.1-70B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
precision: float16
config:
tp: 8
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Llama 3.1 405B
mad_tag: pyt_vllm_llama-3.1-405b
model_repo: meta-llama/Llama-3.1-405B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct
precision: float16
config:
tp: 8
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Llama 2 70B
mad_tag: pyt_vllm_llama-2-70b
model_repo: meta-llama/Llama-2-70b-chat-hf
url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
precision: float16
config:
tp: 8
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 4096
max_num_batched_tokens: 4096
max_model_len: 4096
- model: Llama 3.1 8B FP8
mad_tag: pyt_vllm_llama-3.1-8b_fp8
model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
precision: float8
config:
tp: 1
dtype: auto
kv_cache_dtype: fp8
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Llama 3.1 70B FP8
mad_tag: pyt_vllm_llama-3.1-70b_fp8
model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV
precision: float8
config:
tp: 8
dtype: auto
kv_cache_dtype: fp8
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Llama 3.1 405B FP8
mad_tag: pyt_vllm_llama-3.1-405b_fp8
model_repo: amd/Llama-3.1-405B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV
precision: float8
config:
tp: 8
dtype: auto
kv_cache_dtype: fp8
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- group: Mistral AI
tag: mistral
models:
- model: Mixtral MoE 8x7B
mad_tag: pyt_vllm_mixtral-8x7b
model_repo: mistralai/Mixtral-8x7B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
precision: float16
config:
tp: 8
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 32768
max_num_batched_tokens: 32768
max_model_len: 8192
- model: Mixtral MoE 8x22B
mad_tag: pyt_vllm_mixtral-8x22b
model_repo: mistralai/Mixtral-8x22B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
precision: float16
config:
tp: 8
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 65536
max_num_batched_tokens: 65536
max_model_len: 8192
- model: Mixtral MoE 8x7B FP8
mad_tag: pyt_vllm_mixtral-8x7b_fp8
model_repo: amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
precision: float8
config:
tp: 8
dtype: auto
kv_cache_dtype: fp8
max_seq_len_to_capture: 32768
max_num_batched_tokens: 32768
max_model_len: 8192
- model: Mixtral MoE 8x22B FP8
mad_tag: pyt_vllm_mixtral-8x22b_fp8
model_repo: amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
precision: float8
config:
tp: 8
dtype: auto
kv_cache_dtype: fp8
max_seq_len_to_capture: 65536
max_num_batched_tokens: 65536
max_model_len: 8192
- group: Qwen
tag: qwen
models:
- model: QwQ-32B
mad_tag: pyt_vllm_qwq-32b
model_repo: Qwen/QwQ-32B
url: https://huggingface.co/Qwen/QwQ-32B
precision: float16
config:
tp: 1
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 131072
max_num_batched_tokens: 131072
max_model_len: 8192
- model: Qwen3 30B A3B
mad_tag: pyt_vllm_qwen3-30b-a3b
model_repo: Qwen/Qwen3-30B-A3B
url: https://huggingface.co/Qwen/Qwen3-30B-A3B
precision: float16
config:
tp: 1
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 32768
max_num_batched_tokens: 32768
max_model_len: 8192
- group: Microsoft Phi
tag: phi
models:
- model: Phi-4
mad_tag: pyt_vllm_phi-4
model_repo: microsoft/phi-4
url: https://huggingface.co/microsoft/phi-4
config:
tp: 1
dtype: auto
kv_cache_dtype: auto
max_seq_len_to_capture: 16384
max_num_batched_tokens: 16384
max_model_len: 8192

Some files were not shown because too many files have changed in this diff Show More