Compare commits

49 Commits

Author SHA1 Message Date
Pratik Basyal
845520ff77 LLVM- project pointed to github repo (#4505)
* LLVM- project pointed to latest docs

* Replaced docs link with github repo link
2025-03-17 15:50:42 -04:00
Pratik Basyal
8bf394e998 Revert "rocAL added to ROCm 6.1.5 (#4497)" (#4501)
This reverts commit d1d35bd2d7.
2025-03-14 10:32:44 -04:00
alexxu-amd
dee5470f15 Remove Quick Start and Radeon from 6.1.5 landing page (#4500)
* bump rocm-docs-core version to 1.18.1

* remove unnecessary entries from landing page

* revert rocm-docs-core version update to see if it undo the mess in index page

* bump rocm-docs-core to 1.18.1 again
2025-03-14 10:31:38 -04:00
Pratik Basyal
6c6bbd1460 link fixed for ROCr-runtime (#4498) 2025-03-13 18:48:30 -04:00
Pratik Basyal
d1d35bd2d7 rocAL added to ROCm 6.1.5 (#4497)
* rocAL added

* github link updated

* Github release tag removed
2025-03-13 18:48:20 -04:00
alexxu-amd
9ae2f510ff Update versions.md 2025-03-13 16:04:46 -04:00
Pratik Basyal
bf2f24581e ROCm 6.1.5 Release notes and compatibility matrix update (#318)
* 6.1.5 changes updated

* Lint error fixed

* Review feedback added

* Version tabel udpated

* OS note updated

* Initial review feedback incorporated

* Quick fixes

* Minor FIx

* Table Zebra pattern fixed

* Table CSS updated

* Minor update

* CentOS 7.9 support removed

* Quick update

* Native package installation added

* Leo's feedback added

* Link reset to 6.1.5 pre GA
2025-03-13 15:34:33 -04:00
Sam Wu
bee91034ef Update documentation requirements 2024-09-16 10:13:17 -08:00
Jeffrey Novotny
f97066f7af Merge pull request #3566 from amd-jnovotny/peak-tflops-typo-docs612
Fix typo for TFLOPs metric in MI250 architecture page: cherry pick to docs/6.1.2
2024-08-12 13:18:12 -04:00
Jeffrey Novotny
60ed13b1b0 Fix typo for TFLOPs metric in MI250 architecture page 2024-08-12 10:18:38 -04:00
Jeffrey Novotny
0af66d73e8 Merge pull request #3530 from amd-jnovotny/update-llama-link-612
Fix link to meta-llama finetuning recipes
2024-08-07 12:42:18 -04:00
Jeffrey Novotny
8c0b2dede9 Fix link to rocr debug agent (#3535) 2024-08-06 16:43:09 -06:00
Jeffrey Novotny
fd4366cdd3 Fix link to meta-llama finetuning recipes 2024-08-06 15:39:57 -04:00
spolifroni-amd
02b8dc3eb3 Cherry picking removal of email feedback into 6.1.2 (#3491)
* removed all references to the feedback email

* making the linter happy
2024-08-02 11:58:48 -06:00
Peter Park
0dcf8be892 Merge pull request #3450 from peterjunpark/docs/6.1.2
Remove unused pages in /how-to
2024-07-23 02:51:48 -04:00
Peter Jun Park
8cf3ff1936 remove unused pages 2024-07-22 18:07:32 -04:00
Peter Park
d1b9a04ee9 Merge pull request #3449 from peterjunpark/docs/6.1.2
Merge remote-tracking branch 'upstream/roc-6.1.x' into docs/6.1.2
2024-07-22 18:00:41 -04:00
Peter Jun Park
2bd30f8b91 Merge remote-tracking branch 'upstream/roc-6.1.x' into HEAD 2024-07-22 17:48:50 -04:00
Sam Wu
a1518ffa94 Merge develop into roc-6.1.x (#3440)
* Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx (#3396)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2024.2.2 to 2024.7.4 in /docs/sphinx (#3399)

Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4.
- [Commits](https://github.com/certifi/python-certifi/compare/2024.02.02...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* External CI: build hipBLASLt external dependencies (#3405)

* External CI: Increase composable_kernel pipeline time limit (#3407)

* [Changelog/release notes] Fix and add custom templates for autotag script (#3408)

* Update custom templates

* Add custom templates

* Fix custom template for hipfort

* Fix custom template for hipify

* Fix custom template for rvs

* External CI: Change composable_kernel pipeline to build for specific GPUs with tests and examples (#3412)

* increase task time limit

* test building CK for multiple architectures

* Update composable_kernel.yml

* Update composable_kernel.yml

* gfx90a build

* gfx941;gfx1100;gfx1030 build

* hipTensor gfx941 build

* hipTensor gfx941 build

* reduce CK timeout to 100 minutes

* change all gfx90a targets to gfx942

* Bump sphinx-reredirects from 0.1.4 to 0.1.5 in /docs/sphinx (#3419)

Bumps [sphinx-reredirects](https://github.com/documatt/sphinx-reredirects) from 0.1.4 to 0.1.5.
- [Commits](https://github.com/documatt/sphinx-reredirects/compare/v0.1.4...v0.1.5)

---
updated-dependencies:
- dependency-name: sphinx-reredirects
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Removed TransferBench from the tools list (#3421)

* update AI framework image (#3406)

* update AI framework image

* remove old image

* Update system optimization guides headings (#3422)

* update headings to system optimization

* update index

* conv tuning-guides.md to rst

* shorten system optimization landing page

* update conf.py

update toc order

add space

* Update docs/how-to/tuning-guides.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* update keywords

* update intro

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* External CI: move hipBLASLt build directory to ephemeral storage (#3433)

* build hipblaslt in /mnt instead

* rm checkoutref

* remove debug step

* Update using-gpu-sanitizer.md with new known issues (#3423)

* External CI: move hipBLASLt to new large disk pool

* Remove unused custom template for ck (#3438)

* External CI: ROCm nightly builds (#3435)

* ROCm nightly builds

* remove branch trigger, enable develop

* Remove unused configurations in conf.py (#3444)

* External CI: Switch all pipeline GPU_TARGETS to gfx942 (#3443)

* Switch all pipeline gpu targets to gfx942

* Change more pipelines target to gfx942

* set variables for manual testing

* Switch all pipeline gpu targets to gfx942

* Change more pipelines target to gfx942

* set variables for manual testing

* add test pipeline id

* revert test changes

* correct gpu target name

* remove unused flags; change hipSPARSELt target to be gfx942

* Add MI300X tuning guides (#3448)

* Add MI300X tuning guides

Add mi300x doc (pandoc conversion)

fix headings

add metadata

move images to shared/

move images to shared/

convert tuning-guides.md to rst using pandoc

add mi300x to tuning-guides.rst landing page

update h1s, toc, and landing page

fix spelling

fix fmt

format code blocks

add tensilelite imgs

fix formatting

fix formatting some more

fix formatting

more formatting

spelling

remove --enforce-eager note

satisfy spellcheck linter

more spelling

add fixes from hongxia

fix env var in D5

add fixes to PyTorch inductor section

fix

fix

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update docs/how-to/tuning-guides/mi300x.rst

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update 'torch_compile_debug' suggestion based on Hongxia's feedback

fix PyTorch inductor env vars

minor formatting fixes

Apply suggestions from code review

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

Update vllm path

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

disable numfig in Sphinx configuration

fix formatting and capitalization

add words to wordlist

update index

update wordlist

update optimizing-triton-kernel

convert cards to table

fix link in index.md

add @lpaoletti's feedback

Add system tuning guide

add images

add system section

add os settings and sys management

remove pcie=noats recommendation

reorg

add blurb to developer section

impr formatting

remove windows os from tuning guides pages in conf.py

add suggestions from review

fix typo and link

remove os windows from relevant pages in conf

mi300x

add suggestions from review

fix toc

fix index links

reorg

update vLLM vars

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

update vLLM vars

Co-authored-by: Hongxia Yang
<62075498+hongxiayang@users.noreply.github.com>

reorganize

add warnings

add text to system tuning

add filler text on index pages

reorg tuning pages

fix links

fix vars

* rm old pages

fix toc

* add suggestions from review

small change

add more suggestions

rewrite intro

* add 'workload tuning philosophy'

* refactor

* fix broken links

* black format conf.py

* simplify cmd and update doc structure

* add higher-level heading for consistency (mi300x.rst)

* add fixes from review

fix url

add fixes

fix formatting

fix fmt

fix hipBLASLt section

change words

fix tensilelite section

fix

fix

fix fmt

* style guide

* fix some formatting

* satisfy spellcheck linter

* update wordlist

* fix bad conflict resolution

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: danielsu-amd <danielsu@amd.com>
Co-authored-by: alexxu-amd <159800977+alexxu-amd@users.noreply.github.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: b-sumner <brian.sumner@amd.com>
2024-07-22 15:39:48 -06:00
randyh62
f45fdd5d83 Update using-gpu-sanitizer.md with new known issues (#3423) (#3437)
Co-authored-by: b-sumner <brian.sumner@amd.com>
2024-07-18 20:42:36 -07:00
spolifroni-amd
7fb9c6de51 Merge pull request #3424 from spolifroni-amd/sp-cherry-pick-612
Cherry pick into 6.1.2
2024-07-16 16:46:09 -04:00
Peter Park
c77c3fec23 Update system optimization guides headings (#3422)
* update headings to system optimization

* update index

* conv tuning-guides.md to rst

* shorten system optimization landing page

* update conf.py

update toc order

add space

* Update docs/how-to/tuning-guides.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* update keywords

* update intro

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2024-07-16 16:15:17 -04:00
spolifroni-amd
dc1a141468 Removed TransferBench from the tools list (#3421) 2024-07-16 16:15:16 -04:00
Sam Wu
747b672b04 Merge pull request #3394 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-07-03 15:54:52 -06:00
Sam Wu
31ffa6428f Merge pull request #3374 from ROCm/roc-6.1.x
Merg roc-6.1.x into docs/6.1.2
2024-07-02 10:44:43 -06:00
randyh62
086104bb9f remove Magma (#3361) (#3381)
* remove Magma

* missed one
2024-07-02 07:08:33 -07:00
Peter Park
e19b8ee2eb Merge pull request #3369 from peterjunpark/docs/6.1.2
Add fixes to vLLM install and triton kernel optimization (#3366)
2024-06-27 11:45:47 -07:00
Peter Park
ca33838d0c Add fixes to vLLM install and triton kernel optimization (#3366)
* Add fixes to vLLM install and triton kernel optimization

* Update TGI how-to

remove extra step in TGI
2024-06-27 14:32:45 -04:00
randyh62
c66ddc55b9 added ROCm Core and AMD SMI (#3348) (#3349)
* added ROCm Core and AMD SMI

* fix URLs
2024-06-21 17:11:16 -07:00
Peter Park
1281e5b145 Merge pull request #3347 from peterjunpark/docs/6.1.2
reorder toc (#3346)
2024-06-21 16:10:23 -07:00
Peter Park
c706f689a0 reorder toc (#3346) 2024-06-21 18:54:44 -04:00
Peter Park
feaacde707 Merge pull request #3344 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-21 15:38:22 -07:00
randyh62
35f6429d1a license information updated (#3339) (#3340)
* license information updated

* Young's comments

* Sam's comment
2024-06-21 09:45:00 -07:00
Peter Park
01bcf5e82b Merge pull request #3336 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-19 12:52:35 -07:00
Peter Park
8c0ecf7dfd Merge pull request #3330 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-18 19:22:15 -07:00
randyh62
500c455094 remove nvcc (#3313) (#3320)
* remove nvcc

* Update CHANGELOG to match 6.0.0 template

---------

Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-06-18 17:25:51 -07:00
Peter Park
e34d49bea5 Merge pull request #3319 from peterjunpark/docs/6.1.2
Add Radeon PRO dual slot to hw specs (#3318)
2024-06-18 12:34:34 -07:00
Peter Park
40674aac9c Add Radeon PRO dual slot to hw specs (#3318) 2024-06-18 15:28:30 -04:00
Peter Park
3c6b9df117 Merge pull request #3310 from peterjunpark/docs/6.1.2
Update link to ROCr Debug Agent to docs portal (#3303)
2024-06-17 10:34:30 -07:00
Peter Park
e6b9ed6dca Update link to ROCr Debug Agent to docs portal (#3303)
* Fix link to debug agent in what-is-rocm

* ROCm --> ROCR

add index

* ROCR --> ROCr

* Change ROCm Debug Agent to ROCr Debug Agent in docs
2024-06-17 11:53:47 -04:00
srawat
f3d6e6b561 Merge pull request #3294 from SwRaw/SR_6.1.2
Update link to command-line argument reference (#3270)
2024-06-13 22:28:04 +05:30
Sam Wu
8e701689d2 Merge pull request #3267 from ROCm/roc-6.1.x
Merge roc-6.1.x into docs/6.1.2
2024-06-13 10:05:13 -06:00
Jeffrey Novotny
cb3dee5d07 Merge pull request #3296 from amd-jnovotny/port-aomp-fix
Port aomp fix
2024-06-13 11:37:35 -04:00
Jeffrey Novotny
c61662dadc Remove AOMP from compatibility matrix (#3289) 2024-06-13 11:30:42 -04:00
srawat
bbe495867e Update link to command-line argument reference (#3270)
* Added deleted sections to openmp.md and other improvements

* Update openmp.md
2024-06-13 15:31:30 +05:30
randyh62
c08af3190f update quarantine (#3284) 2024-06-12 09:34:49 -07:00
Istvan Kiss
b69a9c7b97 Update docs/conceptual/setting-cus.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2024-06-12 17:42:18 +02:00
Peter Park
bcae17a4b5 Merge pull request #3283 from peterjunpark/docs/6.1.2
Remove aomp from What is ROCm? page (#3282)
2024-06-11 11:05:41 -07:00
Peter Park
9140ae5bee Remove aomp from What is ROCm? page (#3282) 2024-06-11 11:48:08 -04:00
162 changed files with 3339 additions and 4191 deletions

View File

@@ -19,7 +19,6 @@ parameters:
type: object
default:
- clr
- hipRAND
- llvm-project
- rocBLAS
- rocm-cmake
@@ -27,7 +26,6 @@ parameters:
- rocminfo
- rocprofiler-register
- ROCR-Runtime
- rocRAND
- ROCT-Thunk-Interface
jobs:

View File

@@ -31,7 +31,7 @@ parameters:
jobs:
- job: hipBLASLt
timeoutInMinutes: 300
timeoutInMinutes: 100
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -95,8 +95,12 @@ jobs:
inputs:
targetType: inline
script: ln -s $(Agent.BuildDirectory)/rocm/llvm $(Agent.BuildDirectory)/rocm/lib/llvm
- script: sudo chmod 777 /mnt
displayName: 'Set permissions for /mnt'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
cmakeBuildDir: /mnt/build
cmakeSourceDir: $(Pipeline.Workspace)/s
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++

View File

@@ -11,7 +11,6 @@ parameters:
- cmake
- ninja-build
- libboost-program-options-dev
- libdrm-dev
- libgtest-dev
- libfftw3-dev
- python3-pip

View File

@@ -37,12 +37,6 @@ jobs:
vmImage: ${{ variables.BASE_BUILD_POOL }}
workspace:
clean: all
strategy:
matrix:
gfx942:
JOB_GPU_TARGET: gfx942
gfx90a:
JOB_GPU_TARGET: gfx90a
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
@@ -71,8 +65,6 @@ jobs:
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_MODULE_PATH=$(Agent.BuildDirectory)/rocm/lib/cmake/hip
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DGPU_TARGETS=$(JOB_GPU_TARGET)
-DGPU_TARGETS=gfx942
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
gpuTarget: $(JOB_GPU_TARGET)

View File

@@ -1,252 +0,0 @@
parameters:
# ubuntu near-equivalent list of yum installs in https://github.com/ROCm/ROCm-docker/blob/master/dev/Dockerfile-centos-7-complete
# plus additional packages found through iterative testing of pipeline
- name: aptPackages
type: object
default:
- build-essential
- git
- ninja-build
- openjdk-8-jdk
- ca-certificates
- bc
- bridge-utils
- cmake
- devscripts
- dkms
- doxygen
- libdpkg-dev
- libdpkg-perl
- libelf-dev
- python3-dev
- python3-pip
- python3-venv
- wget
- ncurses-base
- libncurses-dev
- numactl
- libnuma-dev
- libssh-dev
- libunwind-dev
- llvm-dev
- libpth-dev
- qemu-kvm
- re2c
- subversion
- fakeroot
- autoconf
- libgomp1
- libtinfo-dev
- libcholmod3
- libsuitesparseconfig5
- libstdc++-12-dev
- python-is-python3
- gfortran
- libgfortran5
- liblapack3
- libblas3
- libquadmath0
- libmetis5
- libamd2
- libcamd2
- libcolamd2
- libccolamd2
- libdrm-amdgpu1
- ccache
- zip
- name: pipModules
type: object
default:
- astunparse
- expecttest!=0.2.0
- hypothesis
- numpy
- psutil
- pyyaml
- requests
- setuptools
- types-dataclasses
- typing-extensions>=4.8.0
- sympy>=1.13.0
- filelock
- networkx
- jinja2
- fsspec
- lintrunner
- ninja
- packaging
- optree>=0.12.0
# list from https://github.com/pytorch/builder/blob/main/manywheel/build_rocm.sh
- name: rocmDependencies
type: object
default:
- rocminfo
- MIOpen
- clr
- hipBLAS
- hipFFT
- hipRAND
- hipSOLVER
- hipSPARSE
- ROCR-Runtime
- llvm-project
- rccl
- rocBLAS
- rocFFT
- rocm_smi_lib
- rocRAND
- rocSOLVER
- rocSPARSE
- roctracer
- hipBLASLt
- rocprofiler-register
- rocm-core
- rocPRIM
# below are additional dependencies not called out by build script, but throw errors during cmake
- hipCUB
- rocThrust
trigger: none
pr: none
schedules:
- cron: '30 7 * * *'
displayName: nightly pytorch
branches:
include:
- develop
always: true
jobs:
- job: pytorch
timeoutInMinutes: 120
variables:
- group: common
- template: /.azuredevops/variables-global.yml
# various flags/parameters expected by bash scripts in pytorch builder repo
- name: ROCM_VERSION
value: 6.3.0
- name: PYTORCH_ROCM_ARCH
value: gfx942
- name: GPU_TARGET
value: gfx942
- name: ROCM_PATH
value: /opt/rocm
- name: DESIRED_CUDA
value: 6.3.0
- name: MKLROOT
value: /opt/intel
- name: AOTRITON_INSTALLED_PREFIX
value: /opt/rocm/aotriton
- name: DESIRED_PYTHON
value: 3.10
- name: PYTORCH_ROOT
value: $(Build.SourcesDirectory)/pytorch
- name: CMAKE_ARGS
value: -GNinja
- name: DESIRED_DEVTOOLSET
value: cxx11-abi
pool: ${{ variables.ULTRA_BUILD_POOL }}
workspace:
clean: all
steps:
# copy environment setup from https://github.com/pytorch/builder/blob/main/manywheel/Dockerfile
# but instead of centos, use ubuntu environment
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
# wheel install location different on azure agent compared to where wheel is assumed to be installed on upstream script
- task: Bash@3
displayName: wheel install path symlink
inputs:
targetType: inline
script: |
sudo mkdir -p /opt/python/cp310-cp310/lib/python3.10
sudo ln -s /usr/local/lib/python3.10/dist-packages /opt/python/cp310-cp310/lib/python3.10/site-packages
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
dependencyList: ${{ parameters.rocmDependencies }}
dependencySource: staging
- task: Bash@3
displayName: ROCm symbolic links
inputs:
targetType: inline
script: |
sudo ln -s $(Agent.BuildDirectory)/rocm /opt/rocm
sudo ln -s $(Agent.BuildDirectory)/rocm/llvm $(Agent.BuildDirectory)/rocm/lib/llvm
- checkout: self
- task: Bash@3
displayName: git clone pytorch builder
inputs:
targetType: inline
script: git clone https://github.com/pytorch/builder.git --depth=1 --recurse-submodules
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: git clone upstream pytorch
inputs:
targetType: inline
script: git clone https://github.com/pytorch/pytorch.git --depth=1 --recurse-submodules
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: Patch out forced GPU testing block in pytorch build script
inputs:
targetType: inline
script: git apply $(Build.SourcesDirectory)/.azuredevops/patches/pytorch/0001-ROCm-CI-patches.patch
workingDirectory: $(Build.SourcesDirectory)/builder
- task: Bash@3
displayName: Install patchelf
inputs:
targetType: inline
script: |
sudo bash pytorch/.ci/docker/common/install_patchelf.sh
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: Install mkl dependency for magma
inputs:
targetType: inline
script: |
sudo bash pytorch/.ci/docker/common/install_mkl.sh
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: Install rocm drm
inputs:
targetType: inline
script: |
sudo bash pytorch/.ci/docker/common/install_rocm_drm.sh
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: Install rocm magma
inputs:
targetType: inline
script: |
sudo PYTORCH_ROCM_ARCH=$(PYTORCH_ROCM_ARCH) MKLROOT=$(MKLROOT) bash pytorch/.ci/docker/common/install_rocm_magma.sh
workingDirectory: $(Build.SourcesDirectory)
- task: Bash@3
displayName: Install AOTriton Shared Library
inputs:
targetType: inline
script: |
sudo bash ./install_aotriton.sh /opt/rocm
workingDirectory: $(Build.SourcesDirectory)/pytorch/.ci/docker/common
- task: Bash@3
displayName: Run ROCm Build Script
inputs:
targetType: inline
script: >-
sudo
DESIRED_CUDA=$(DESIRED_CUDA)
PYTORCH_ROCM_ARCH=$(PYTORCH_ROCM_ARCH)
DESIRED_PYTHON=$(DESIRED_PYTHON)
PYTORCH_ROOT=$(PYTORCH_ROOT)
CMAKE_ARGS=$(CMAKE_ARGS)
AOTRITON_INSTALLED_PREFIX=$(AOTRITON_INSTALLED_PREFIX)
DESIRED_DEVTOOLSET=$(DESIRED_DEVTOOLSET)
bash ./manywheel/build_rocm.sh
workingDirectory: $(Build.SourcesDirectory)/builder
- task: PublishPipelineArtifact@1
displayName: 'ROCm pytorch wheel file Publish'
retryCountOnTaskFailure: 3
inputs:
targetPath: /remote/wheelhouserocm$(ROCM_VERSION)

View File

@@ -1,40 +0,0 @@
From b2d3c88f7a8b179e814e72f76f27e25c82659200 Mon Sep 17 00:00:00 2001
From: Joseph Macaranas <Joseph.Macaranas@amd.com>
Date: Tue, 30 Jul 2024 05:43:06 -0400
Subject: [PATCH] ROCm CI patches
---
manywheel/build_common.sh | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/manywheel/build_common.sh b/manywheel/build_common.sh
index 08ca924..52c468f 100644
--- a/manywheel/build_common.sh
+++ b/manywheel/build_common.sh
@@ -475,11 +475,9 @@ if [[ -z "$BUILD_PYTHONLESS" ]]; then
fi
pip uninstall -y "$TORCH_PACKAGE_NAME"
-
if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
pip install "$TORCH_NO_PYTHON_PACKAGE_NAME" --no-index -f /$WHEELHOUSE_DIR --no-dependencies -v
fi
-
pip install "$TORCH_PACKAGE_NAME" --no-index -f /$WHEELHOUSE_DIR --no-dependencies -v
# Print info on the libraries installed in this wheel
@@ -491,11 +489,4 @@ if [[ -z "$BUILD_PYTHONLESS" ]]; then
ldd "$installed_lib" || true
done
- # Run the tests
- echo "$(date) :: Running tests"
- pushd "$PYTORCH_ROOT"
- LD_LIBRARY_PATH=/usr/local/nvidia/lib64 \
- "${SOURCE_DIR}/../run_tests.sh" manywheel "${py_majmin}" "$DESIRED_CUDA"
- popd
- echo "$(date) :: Finished tests"
fi
--
2.44.0.windows.1

View File

@@ -9,15 +9,9 @@ parameters:
- name: useDefaultBranch
type: boolean
default: true
- name: latestFromBranch
type: boolean
default: true
- name: extractToMnt
type: boolean
default: false
- name: fileFilter
type: string
default: ''
- name: defaultBranchList
type: object
default:
@@ -27,7 +21,7 @@ parameters:
aomp: aomp-dev
clr: develop
composable_kernel: develop
half: rocm
half: master
HIP: develop
hipBLAS: develop
hipBLASLt: develop
@@ -48,10 +42,10 @@ parameters:
rocAL: develop
rocALUTION: develop
rocBLAS: develop
ROCdbgapi : amd-staging
ROCdbgapi : amd-master
rocDecode: develop
rocFFT: develop
ROCgdb: amd-staging
rocgdb: amd-staging
rocm-cmake: develop
rocm-core: master
rocm-examples: develop
@@ -73,6 +67,10 @@ parameters:
roctracer: amd-master
rocWMMA: develop
rpp: master
- name: componentsFailureOkay
type: object
default:
- rocm-cmake
# BELOW REQUIRED IF useDefaultBranch false
- name: branchName
type: string
@@ -86,15 +84,11 @@ steps:
project: ROCm-CI
definition: ${{ parameters.pipelineId }}
specificBuildWithTriggering: true
itemPattern: '**/*${{ parameters.fileFilter }}*'
${{ if eq(parameters.latestFromBranch, true) }}:
${{ if notIn(parameters.componentName, 'aomp', 'clr', 'rocMLIR') }}: # remove this once these pipelines are functional + up-to-date
buildVersionToDownload: latestFromBranch # default is 'latest'
${{ if eq(parameters.useDefaultBranch, true) }}:
branchName: refs/heads/${{ parameters.defaultBranchList[parameters.componentName] }}
branchName: ${{ parameters.defaultBranchList[parameters.componentName] }}
${{ else }}:
branchName: ${{ parameters.branchName }}
${{ if in(parameters.componentName, 'rocm-cmake') }}:
${{ if in(parameters.componentName, parameters.componentsFailureOkay) }}:
allowPartiallySucceededBuilds: true
targetPath: '$(Pipeline.Workspace)/d'
- task: ExtractFiles@1

View File

@@ -9,9 +9,6 @@ parameters:
- name: publish
type: boolean
default: true
- name: gpuTarget
type: string
default: ''
steps:
- task: ArchiveFiles@2
@@ -20,7 +17,7 @@ steps:
includeRootFolder: false
archiveType: 'tar'
tarCompression: 'gz'
archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.DefinitionName)_$(Build.SourceBranchName)_$(Build.BuildId)_$(Build.BuildNumber)_ubuntu2204_${{ parameters.artifactName }}_${{ parameters.gpuTarget }}.tar.gz'
archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.DefinitionName)_$(Build.SourceBranchName)_$(Build.BuildId)_$(Build.BuildNumber)_ubuntu2204_${{ parameters.artifactName }}.tar.gz'
- task: DeleteFiles@1
displayName: 'Cleanup Staging Area'
inputs:

View File

@@ -21,9 +21,6 @@ parameters:
- name: fixedComponentName
type: string
default: ''
- name: latestFromBranch
type: boolean
default: true
# match case of the repo in this object for the left side of the maps
# should not need to replace these parameters
- name: stagingPipelineIdentifiers
@@ -144,51 +141,25 @@ parameters:
steps:
# assuming artifact-download.yml template file in same directory
# for the case where rocm dependency item in list has a colon (:)
# assume it is of the format of componentName:fileFilter
# fileFilter could contain both a subcomponent name or gpu name separated by asterisks
# e.g., gfx942 to only download artifacts from component for this gpu if applicable
- ${{ each dependency in parameters.dependencyList }}:
- ${{ if contains(dependency, ':') }}:
- ${{ if eq(parameters.dependencySource, 'staging') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ split(dependency, ':')[0] }}
pipelineId: ${{ parameters.stagingPipelineIdentifiers[split(dependency, ':')[0]] }}
fileFilter: ${{ split(dependency, ':')[1] }}
latestFromBranch: ${{ parameters.latestFromBranch }}
extractToMnt: ${{ parameters.extractToMnt }}
- ${{ if eq(parameters.dependencySource, 'tag-builds') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ split(dependency, ':')[0] }}
pipelineId: ${{ parameters.taggedPipelineIdentifiers[split(dependency, ':')[0]] }}
fileFilter: ${{ split(dependency, ':')[1] }}
latestFromBranch: false
extractToMnt: ${{ parameters.extractToMnt }}
# no colon (:) found in this item in the list
- ${{ else }}:
- ${{ if eq(parameters.dependencySource, 'staging') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.stagingPipelineIdentifiers[dependency] }}
latestFromBranch: ${{ parameters.latestFromBranch }}
extractToMnt: ${{ parameters.extractToMnt }}
- ${{ if eq(parameters.dependencySource, 'tag-builds') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.taggedPipelineIdentifiers[dependency] }}
latestFromBranch: false
extractToMnt: ${{ parameters.extractToMnt }}
- ${{ if eq(parameters.dependencySource, 'staging') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.stagingPipelineIdentifiers[dependency] }}
extractToMnt: ${{ parameters.extractToMnt }}
- ${{ if eq(parameters.dependencySource, 'tag-builds') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ dependency }}
pipelineId: ${{ parameters.taggedPipelineIdentifiers[dependency] }}
extractToMnt: ${{ parameters.extractToMnt }}
# fixed case only accepts one component at a time, so no array input
- ${{ if eq(parameters.dependencySource, 'fixed') }}:
- template: artifact-download.yml
parameters:
componentName: ${{ parameters.fixedComponentName }}
pipelineId: ${{ parameters.fixedPipelineIdentifier }}
latestFromBranch: false
extractToMnt: ${{ parameters.extractToMnt }}
- task: Bash@3
displayName: 'list downloaded ROCm files'
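
The colon-handling lines in the hunk above treat a dependency entry as `componentName:fileFilter`, and the `itemPattern` line in the artifact-download.yml hunk wraps the filter in a glob. As a rough illustration only (this is not code from the repository), that splitting could be sketched in Python as follows. The function names and the sample entry `rocBLAS:gfx942` are hypothetical and chosen for this example.

```python
from typing import Tuple


def split_dependency(item: str) -> Tuple[str, str]:
    """Return (component_name, file_filter); file_filter is '' when no colon is present."""
    parts = item.split(":")
    component_name = parts[0]
    # Mirrors split(dependency, ':')[1] in the template: only the second segment is used.
    file_filter = parts[1] if len(parts) > 1 else ""
    return component_name, file_filter


def item_pattern(file_filter: str) -> str:
    """Build a download glob shaped like '**/*<fileFilter>*'."""
    return f"**/*{file_filter}*"


if __name__ == "__main__":
    # Hypothetical dependency entries, one without and one with a filter.
    for entry in ["clr", "rocBLAS:gfx942"]:
        name, file_filter = split_dependency(entry)
        print(name, item_pattern(file_filter))
```

An entry without a colon yields an empty filter, so the glob matches every artifact file for that component. An entry with a colon narrows the download to files whose names contain the filter text.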

View File

@@ -12,7 +12,7 @@ steps:
inputs:
targetType: inline
script: python3 --version
- script: pip list -v
- script: pip list
displayName: 'list python packages'
- task: DeleteFiles@1
displayName: 'Cleanup checkout space'

183
.gitmodules vendored
View File

@@ -1,183 +0,0 @@
[submodule "libs/ROCK-Kernel-Driver"]
path = libs/ROCK-Kernel-Driver
url = ../ROCK-Kernel-Driver
[submodule "libs/ROCT-Thunk-Interface"]
path = libs/ROCT-Thunk-Interface
url = ../ROCT-Thunk-Interface
[submodule "libs/ROCR-Runtime"]
path = libs/ROCR-Runtime
url = ../ROCR-Runtime
[submodule "libs/amdsmi"]
path = libs/amdsmi
url = ../amdsmi
[submodule "libs/rocm_smi_lib"]
path = libs/rocm_smi_lib
url = ../rocm_smi_lib
[submodule "libs/rocm-core"]
path = libs/rocm-core
url = ../rocm-core
[submodule "libs/rocm-cmake"]
path = libs/rocm-cmake
url = ../rocm-cmake
[submodule "libs/rocminfo"]
path = libs/rocminfo
url = ../rocminfo
[submodule "libs/rocm_bandwidth_test"]
path = libs/rocm_bandwidth_test
url = ../rocm_bandwidth_test
[submodule "libs/rocprofiler"]
path = libs/rocprofiler
url = ../rocprofiler
[submodule "libs/roctracer"]
path = libs/roctracer
url = ../roctracer
[submodule "libs/rdc"]
path = libs/rdc
url = ../rdc
[submodule "libs/HIP"]
path = libs/HIP
url = ../HIP
[submodule "libs/clr"]
path = libs/clr
url = ../clr
[submodule "libs/hipother"]
path = libs/hipother
url = ../hipother
[submodule "libs/HIPIFY"]
path = libs/HIPIFY
url = ../HIPIFY
[submodule "libs/HIPCC"]
path = libs/HIPCC
url = ../HIPCC
[submodule "libs/llvm-project"]
path = libs/llvm-project
url = ../llvm-project
[submodule "libs/ROCm-Device-Libs"]
path = libs/ROCm-Device-Libs
url = ../ROCm-Device-Libs
[submodule "libs/ROCm-CompilerSupport"]
path = libs/ROCm-CompilerSupport
url = ../ROCm-CompilerSupport
[submodule "libs/half"]
path = libs/half
url = ../half
[submodule "libs/ROCgdb"]
path = libs/ROCgdb
url = ../ROCgdb
[submodule "libs/ROCdbgapi"]
path = libs/ROCdbgapi
url = ../ROCdbgapi
[submodule "libs/rocr_debug_agent"]
path = libs/rocr_debug_agent
url = ../rocr_debug_agent
[submodule "libs/rocBLAS"]
path = libs/rocBLAS
url = ../rocBLAS
[submodule "libs/Tensile"]
path = libs/Tensile
url = ../Tensile
[submodule "libs/hipTensor"]
path = libs/hipTensor
url = ../hipTensor
[submodule "libs/hipBLAS"]
path = libs/hipBLAS
url = ../hipBLAS
[submodule "libs/hipBLASLt"]
path = libs/hipBLASLt
url = ../hipBLASLt
[submodule "libs/rocFFT"]
path = libs/rocFFT
url = ../rocFFT
[submodule "libs/hipFFT"]
path = libs/hipFFT
url = ../hipFFT
[submodule "libs/rocRAND"]
path = libs/rocRAND
url = ../rocRAND
[submodule "libs/hipRAND"]
path = libs/hipRAND
url = ../hipRAND
[submodule "libs/rocSPARSE"]
path = libs/rocSPARSE
url = ../rocSPARSE
[submodule "libs/hipSPARSELt"]
path = libs/hipSPARSELt
url = ../hipSPARSELt
[submodule "libs/rocSOLVER"]
path = libs/rocSOLVER
url = ../rocSOLVER
[submodule "libs/hipSOLVER"]
path = libs/hipSOLVER
url = ../hipSOLVER
[submodule "libs/hipSPARSE"]
path = libs/hipSPARSE
url = ../hipSPARSE
[submodule "libs/rocALUTION"]
path = libs/rocALUTION
url = ../rocALUTION
[submodule "libs/rocThrust"]
path = libs/rocThrust
url = ../rocThrust
[submodule "libs/hipCUB"]
path = libs/hipCUB
url = ../hipCUB
[submodule "libs/rocPRIM"]
path = libs/rocPRIM
url = ../rocPRIM
[submodule "libs/rocWMMA"]
path = libs/rocWMMA
url = ../rocWMMA
[submodule "libs/rccl"]
path = libs/rccl
url = ../rccl
[submodule "libs/MIOpen"]
path = libs/MIOpen
url = ../MIOpen
[submodule "libs/composable_kernel"]
path = libs/composable_kernel
url = ../composable_kernel
[submodule "libs/MIVisionX"]
path = libs/MIVisionX
url = ../MIVisionX
[submodule "libs/rpp"]
path = libs/rpp
url = ../rpp
[submodule "libs/hipfort"]
path = libs/hipfort
url = ../hipfort
[submodule "libs/AMDMIGraphX"]
path = libs/AMDMIGraphX
url = ../AMDMIGraphX
[submodule "libs/ROCmValidationSuite"]
path = libs/ROCmValidationSuite
url = ../ROCmValidationSuite
[submodule "libs/openmp-extras/aomp"]
path = libs/openmp-extras/aomp
url = ../aomp
[submodule "libs/openmp-extras/aomp-extras"]
path = libs/openmp-extras/aomp-extras
url = ../aomp-extras
[submodule "libs/openmp-extras/flang"]
path = libs/openmp-extras/flang
url = ../flang
[submodule "libs/rocDecode"]
path = libs/rocDecode
url = ../rocDecode
[submodule "libs/omnitrace"]
path = libs/omnitrace
url = ../omnitrace
[submodule "libs/omniperf"]
path = libs/omniperf
url = ../omniperf
[submodule "libs/rocprofiler-sdk"]
path = libs/rocprofiler-sdk
url = ../rocprofiler-sdk
[submodule "libs/rocm-examples"]
path = libs/rocm-examples
url = ../rocm-examples
[submodule "libs/rocPyDecode"]
path = libs/rocPyDecode
url = ../rocPyDecode
[submodule "libs/rocAL"]
path = libs/rocAL
url = ../rocAL

@@ -3,20 +3,19 @@
version: 2
sphinx:
configuration: docs/conf.py
formats: [htmlzip]
python:
install:
- requirements: docs/sphinx/requirements.txt
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "gfortran" # For pre-processing fortran sources
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
sphinx:
configuration: docs/conf.py
formats: []

@@ -26,12 +26,11 @@ ATI
AddressSanitizer
AlexNet
Arb
Autocast
BARs
BLAS
BMC
BitCode
Blit
Blockwise
Bluefield
Bootloader
CCD
@@ -68,7 +67,6 @@ CommonMark
Concretized
Conda
ConnectX
CuPy
DDR
DF
DGEMM
@@ -88,7 +86,6 @@ DataLoader
DataParallel
DeepSpeed
Dependabot
Deprecations
DevCap
Dockerfile
Doxygen
@@ -96,7 +93,6 @@ ELMo
ENDPGM
EPYC
ESXi
EoS
FFT
FFTs
FFmpeg
@@ -147,7 +143,6 @@ HPCG
HPE
HPL
HSA
HW
HWE
HWS
Haswell
@@ -171,14 +166,12 @@ ImageNet
InfiniBand
Inlines
IntelliSense
Interop
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KFDTest
KiB
KV
KVM
@@ -215,7 +208,6 @@ MVFFR
Makefile
Makefiles
Matplotlib
Matrox
Megatrends
Megatron
Mellanox
@@ -250,7 +242,6 @@ OAMs
OCP
OEM
OFED
OMM
OMP
OMPI
OMPT
@@ -266,7 +257,6 @@ OpenMP
OpenMPI
OpenSSL
OpenVX
OpenXLA
PCC
PCI
PCIe
@@ -283,21 +273,16 @@ PerfDb
Perfetto
PipelineParallel
PnP
PowerEdge
PowerShell
PyPi
PyTorch
Qcycles
RAII
RAS
RCCL
RDC
RDMA
RDNA
README
RHEL
RNN
RNNs
ROC
ROCProfiler
ROCTracer
@@ -309,7 +294,6 @@ ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCprofiler
ROCr
RST
RW
@@ -384,7 +368,6 @@ UC
UCC
UCX
UIF
UMC
USM
UTCL
UTIL
@@ -444,10 +427,8 @@ backends
benchmarking
bfloat
bilinear
bitcode
bitsandbytes
blit
bootloader
boson
bosons
buildable
@@ -470,10 +451,8 @@ composable
concretization
config
conformant
constructible
convolutional
convolves
copyable
cpp
csn
cuBLAS
@@ -495,8 +474,6 @@ denoise
denoised
denoises
denormalize
dequantization
dequantizes
deserializers
detections
dev
@@ -508,9 +485,8 @@ distro
el
embeddings
enablement
encodings
endpgm
enqueue
encodings
env
epilog
etcetera
@@ -521,7 +497,6 @@ ffmpeg
filesystem
fortran
fp
gRPC
galb
gcc
gdb
@@ -529,7 +504,6 @@ gfortran
gfx
githooks
github
globals
gnupg
grayscale
gzip
@@ -559,7 +533,6 @@ hpp
hsa
hsakmt
hyperparameter
iDRAC
ib_core
inband
incrementing
@@ -570,7 +543,6 @@ init
initializer
inlining
installable
interop
interprocedural
intra
invariants
@@ -598,7 +570,6 @@ mivisionx
mkdir
mlirmiopen
mtypes
mutex
mvffr
namespace
namespaces
@@ -621,35 +592,25 @@ pragma
pre
prebuilt
precompiled
preconditioner
preconfigured
prefetch
prefetchable
prefill
prefills
preloaded
preprocess
preprocessed
preprocessing
preprocessor
prequantized
prerequisites
profiler
profilers
protobuf
pseudorandom
py
quantile
quantizer
quasirandom
queueing
rccl
rdc
reStructuredText
redirections
refactorization
reformats
repo
repos
representativeness
req
@@ -661,12 +622,10 @@ roc
rocAL
rocALUTION
rocBLAS
rocDecode
rocFFT
rocLIB
rocMLIR
rocPRIM
rocPyDecode
rocRAND
rocSOLVER
rocSPARSE
@@ -704,14 +663,11 @@ spack
src
stochastically
strided
subcommand
subdirectory
subexpression
subfolder
subfolders
submodule
supercomputing
symlink
td
tensorfloat
th
@@ -731,7 +687,6 @@ txt
uarch
uncached
uncorrectable
unhandled
uninstallation
unsqueeze
unstacking
@@ -753,13 +708,10 @@ vectorized
vectorizer
vectorizes
vjxb
voxel
walkthrough
walkthroughs
watchpoints
wavefront
wavefronts
whitespace
whitespaces
workgroup
workgroups

@@ -21,7 +21,19 @@ source software compilers, debuggers, and libraries. ROCm is fully integrated in
## Getting the ROCm Source Code
AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains submodules that point to the correct versions of each of the ROCm components. They can be found in the `/libs` directory of the ROCm repository.
AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains a repo manifest file called [default.xml](./default.xml). You can use this manifest file to download the source code for ROCm software.
### Installing the repo tool
The repo tool from Google allows you to manage multiple git repositories simultaneously. Run the following commands to install the repo tool:
```bash
mkdir -p ~/bin/
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo
```
**Note:** The ```~/bin/``` folder is used as an example. You can specify a different folder to install the repo tool into if you desire.
### Installing git-lfs
@@ -33,12 +45,17 @@ sudo apt-get install git-lfs
### Downloading the ROCm source code
The following example shows how to download the ROCm source from this repository.
The following example shows how to use the repo tool to download the ROCm source code. If you choose a directory other than ~/bin/ to install the repo tool, you must use that chosen directory in the code as shown below:
```bash
git clone https://github.com/ROCm/ROCm -b amd/dgaliffi/submodules-6-2-0 --recurse-submodules
mkdir -p ~/ROCm/
cd ~/ROCm/
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.0.x
~/bin/repo sync
```
**Note:** Using this sample code will cause the repo tool to download the open source code associated with the specified ROCm release. Ensure that you have ssh-keys configured on your machine for your GitHub ID prior to the download as explained at [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh).
## Building the ROCm source code
Each ROCm component repository contains directions for building that component, such as the rocSPARSE documentation [Installation and Building for Linux](https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/install/Linux_Install_Guide.html). Refer to the specific component documentation for instructions on building the repository.
@@ -59,8 +76,9 @@ The Build time will reduce significantly if we limit the GPU Architecture/s agai
mkdir -p ~/WORKSPACE/ # Or any folder name other than WORKSPACE
cd ~/WORKSPACE/
export ROCM_VERSION=6.2.0 # or 6.1.1 6.1.2
git clone https://github.com/ROCm/ROCm -b amd/dgaliffi/submodules-${ROCM_VERSION} --recurse-submodules
export ROCM_VERSION=6.1.0 # or 6.1.1 6.1.2
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.1.x -m tools/rocm-build/rocm-${ROCM_VERSION}.xml
~/bin/repo sync
# --------------------------------------
# Step 2: Prepare build environment
@@ -137,6 +155,12 @@ Note: [Overview for ROCm.mk](tools/rocm-build/README.md)
## ROCm documentation
This repository contains the [manifest file](https://gerrit.googlesource.com/git-repo/+/HEAD/docs/manifest-format.md)
for ROCm releases, changelogs, and release information.
The `default.xml` file contains information for all repositories and the associated commit used to build
the current ROCm release; `default.xml` uses the [Manifest Format repository](https://gerrit.googlesource.com/git-repo/).
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
`develop` branch of our repositories contains content for the next ROCm release.

1969
RELEASE.md

File diff suppressed because it is too large

@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-6.2.0"
<default revision="refs/tags/rocm-6.1.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />
@@ -10,8 +10,7 @@
<project name="ROCR-Runtime" />
<project name="ROCT-Thunk-Interface" />
<project name="amdsmi" />
<project name="omniperf" />
<project name="omnitrace" />
<project name="clang-ocl" />
<project name="rdc" />
<project name="rocm_bandwidth_test" />
<project name="rocm_smi_lib" />
@@ -19,7 +18,6 @@
<project name="rocminfo" />
<project name="rocprofiler" />
<project name="rocprofiler-register" />
<project name="rocprofiler-sdk" />
<project name="roctracer" />
<!--HIP Projects-->
<project name="HIP" />
@@ -53,11 +51,9 @@
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipfort" />
<project groups="mathlibs" name="rccl" />
<project groups="mathlibs" name="rocAL" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="rocDecode" />
<project groups="mathlibs" name="rocPyDecode" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocRAND" />

@@ -0,0 +1,482 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="OpenMP support in ROCm">
<meta name="keywords" content="OpenMP, LLVM, OpenMP toolchain">
</head>
# OpenMP support in ROCm
## Introduction
The ROCm™ installation includes an LLVM-based implementation that fully supports
the OpenMP 4.5 standard and a subset of OpenMP 5.0, 5.1, and 5.2 standards.
Fortran, C/C++ compilers, and corresponding runtime libraries are included.
Along with host APIs, the OpenMP compilers support offloading code and data onto
GPU devices. This document briefly describes the installation location of the
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
with OpenMP applications. The GPUs supported are the same as those supported by
this ROCm release. See the list of supported GPUs for {doc}`Linux<rocm-install-on-linux:reference/system-requirements>` and
{doc}`Windows<rocm-install-on-windows:reference/system-requirements>`.
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
The following image illustrates the internal steps taken to translate a user's application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
![OpenMP toolchain](../../data/reference/openmp/openmp-toolchain.svg "OpenMP toolchain")
### Installation
The OpenMP toolchain is automatically installed as part of the standard ROCm
installation and is available under `/opt/rocm-{version}/llvm`. The
sub-directories are:
* bin: Compilers (`flang` and `clang`) and other binaries.
* examples: The usage section below shows how to compile and run these programs.
* include: Header files.
* lib: Libraries including those required for target offload.
* lib-debug: Debug versions of the above libraries.
## OpenMP: usage
The example programs can be compiled and run by pointing the environment
variable `ROCM_PATH` to the ROCm install directory.
**Example:**
```bash
export ROCM_PATH=/opt/rocm-{version}
cd $ROCM_PATH/share/openmp-extras/examples/openmp/veccopy
sudo make run
```
:::{note}
`sudo` is required since we are building inside the `/opt` directory.
Alternatively, copy the files to your home directory first.
:::
The above invocation of Make compiles and runs the program. Note the options
that are required for target offload from an OpenMP program:
```bash
-fopenmp --offload-arch=<gpu-arch>
```
:::{note}
The compiler also accepts the alternative offloading notation:
```bash
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
```
:::
Obtain the value of `gpu-arch` by running the following command:
```bash
% /opt/rocm-{version}/bin/rocminfo | grep gfx
```
[//]: # (dated link below, needs updating)
See the complete list of [compiler command-line references](https://github.com/ROCm/llvm-project/blob/amd-staging/openmp/docs/CommandLineArgumentReference.rst).
### Using `rocprof` with OpenMP
The following steps describe a typical workflow for using `rocprof` with OpenMP
code compiled with AOMP:
1. Run `rocprof` with the program command line:
```bash
% rocprof <application> <args>
```
This produces a `results.csv` file in the user's current directory that
shows basic stats such as kernel names, grid size, and number of registers
used. The user can specify a preferred output file name with the `-o`
option.
2. Add options for a detailed result:
```bash
% rocprof --stats <application> <args>
```
The `--stats` option produces timestamps for the kernels. Look in the output
CSV file for the `DurationNs` field, which is useful for identifying the
critical kernels in the code.
Apart from `--stats`, the option `--timestamp on` also produces timestamps
for the kernels.
3. After learning about the required kernels, the user can take a detailed look
at each one of them. `rocprof` has support for hardware counters: a set of
basic and a set of derived ones. See the complete list of counters using the
options `--list-basic` and `--list-derived`. `rocprof` accepts either a text
or an XML file as an input.
For more details on `rocprof`, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Using tracing options
**Prerequisite:** When using the `--sys-trace` option, compile the OpenMP
program with:
```bash
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
```
The following tracing options are widely used to generate useful information:
* **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
API execution traces and a flat profile in a CSV file.
* **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
Since this option results in loading ``libamdhip64.so``, follow the
prerequisite as mentioned above.
A CSV and a JSON file are produced by the above trace options. The CSV file
presents the data in a tabular format, and the JSON file can be visualized using
Google Chrome at chrome://tracing/ or [Perfetto](https://perfetto.dev/).
Navigate to Chrome or Perfetto and load the JSON file to see the timeline of the
HSA calls.
For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
### Environment variables
:::{table}
:widths: auto
| Environment Variable | Purpose |
| --------------------------- | ---------------------------- |
| `OMP_NUM_TEAMS` | To set the number of teams for kernel launch, which is otherwise chosen by the implementation by default. You can set this number (subject to implementation limits) for performance tuning. |
| `LIBOMPTARGET_KERNEL_TRACE` | To print useful statistics for device operations. Setting it to 1 and running the program emits the name of every kernel launched, the number of teams and threads used, and the corresponding register usage. Setting it to 2 additionally emits timing information for kernel launches and data transfer operations between the host and the device. |
| `LIBOMPTARGET_INFO` | To print informational messages from the device runtime as the program executes. Setting it to a value of 1 or higher prints fine-grain information, and setting it to -1 prints complete information. |
| `LIBOMPTARGET_DEBUG` | To get detailed debugging information about data transfer operations and kernel launch when using a debug version of the device library. Set this environment variable to 1 to get the detailed information from the library. |
| `GPU_MAX_HW_QUEUES` | To set the number of HSA queues in the OpenMP runtime. The HSA queues are created on demand up to the maximum value as supplied here. The queue creation starts with a single initialized queue to avoid unnecessary allocation of resources. The provided value is capped if it exceeds the recommended, device-specific value. |
| `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` | To set the threshold size up to which data transfers are initiated asynchronously. The default threshold size is `1*1024*1024` bytes (1 MB). |
| `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, i.e., wait for an operation to complete immediately. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
:::
## OpenMP: features
The OpenMP programming model is greatly enhanced with the following new features
implemented in the past releases.
(openmp_usm)=
### Asynchronous behavior in OpenMP target regions
* Controlling Asynchronous Behavior
The OpenMP offloading runtime executes in an asynchronous fashion by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred is larger than the default threshold of 1 MB, the runtime falls back to a synchronous data transfer. Transfers involving buffers that have already been locked are always executed asynchronously.
You can overrule this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
* Multithreaded Offloading on the Same Device
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
* Parallel Memory Copy Invocations
Implicit asynchronous execution of single target region enables parallel memory copy invocations.
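To make the multithreaded offloading case concrete, the following is a minimal sketch in which two host threads each issue their own target region on the same device. The function name `increment_halves` is hypothetical and not part of ROCm, and whether the two regions actually overlap depends on the runtime and the available HSA queues. Without `-fopenmp` the pragmas are ignored and the loops run serially on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: two host threads each offload one half of the array.
void increment_halves(std::vector<int>& v) {
  assert(v.size() % 2 == 0);
  const int half = static_cast<int>(v.size() / 2);
  int* pv = v.data();

  // Two host threads, one loop iteration each.
#pragma omp parallel for num_threads(2)
  for (int t = 0; t < 2; ++t) {
    int* part = pv + t * half;
    // Each host thread issues its own target region on the same device.
#pragma omp target teams distribute parallel for map(tofrom: part[0:half])
    for (int i = 0; i < half; ++i) {
      part[i] += 1;
    }
  }
}
```

Setting `OMPX_FORCE_SYNC_REGIONS`, as described in the environment variables table, is one way to compare this against fully synchronous execution while debugging.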
### Unified shared memory
Unified Shared Memory (USM) provides a pointer-based approach to memory
management. To implement USM, fulfill the following system requirements along
with Xnack capability.
#### Prerequisites
* Linux Kernel versions above 5.14
* Latest KFD driver packaged in ROCm stack
* Xnack, as USM support can only be tested with applications compiled with Xnack
capability
#### Xnack capability
When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
enabled both at compile- and run-time. To enable Xnack support at compile-time,
use:
```bash
--offload-arch=gfx908:xnack+
```
Or use another functionally equivalent option Xnack-any:
```bash
--offload-arch=gfx908
```
To enable Xnack functionality at runtime on a per-application basis,
use environment variable:
```bash
HSA_XNACK=1
```
When Xnack support is not needed:
* Build the applications to maximize resource utilization using:
```bash
--offload-arch=gfx908:xnack-
```
* At runtime, set the `HSA_XNACK` environment variable to 0.
#### Unified shared memory pragma
This OpenMP pragma is available on MI200 through `xnack+` support.
```bash
#pragma omp requires unified_shared_memory
```
As stated in the OpenMP specifications, this pragma makes the map clause on
target constructs optional. By default, on MI200, all memory allocated on the
host is fine grain. Using the map clause on a target construct is still
allowed, and it changes the access semantics of the associated memory to
coarse grain.
A simple program demonstrating the use of this feature is:
```bash
$ cat parallel_for.cpp
#include <stdlib.h>
#include <stdio.h>
#define N 64
#pragma omp requires unified_shared_memory
int main() {
int n = N;
int *a = new int[n];
int *b = new int[n];
for(int i = 0; i < n; i++)
b[i] = i;
#pragma omp target parallel for map(to:b[:n])
for(int i = 0; i < n; i++)
a[i] = b[i];
for(int i = 0; i < n; i++)
if(a[i] != i)
printf("error at %d: expected %d, got %d\n", i, i, a[i]);
return 0;
}
$ clang++ -O2 -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:xnack+ parallel_for.cpp
$ HSA_XNACK=1 ./a.out
```
In the above code example, pointer “a” is not mapped in the target region, while
pointer “b” is. Both are valid pointers on the GPU device and are passed by
value to the kernel implementing the target region. This means the pointer
values on the host and the device are the same.
The difference between the memory pages pointed to by these two variables is
that the pages pointed to by “a” are in fine-grain memory, while the pages
pointed to by “b” are in coarse-grain memory during and after the execution of
the target region. This is accomplished in the OpenMP runtime library with calls
to the ROCr runtime to set the pages pointed to by “b” as coarse grain.
### OMPT target support
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as
described in the OpenMP specification document. These APIs allow first-party
tools to examine the profile and kernel traces that execute on a device. A tool
can register callbacks for data transfer and kernel dispatch entry points or use
APIs to start and stop tracing for device-related activities such as data
transfer and kernel dispatch timings and associated metadata. If device tracing
is enabled, trace records for device activities are collected during program
execution and returned to the tool using the APIs described in the
specification.
The following example demonstrates how a tool uses the supported OMPT target
APIs. The `README` in `/opt/rocm/llvm/examples/tools/ompt` outlines the steps to
be followed, and the provided example can be run as shown below:
```bash
cd $ROCM_PATH/share/openmp-extras/examples/tools/ompt/veccopy-ompt-target-tracing
sudo make run
```
The file `veccopy-ompt-target-tracing.c` simulates how a tool initiates device
activity tracing. The file `callbacks.h` shows the callbacks registered and
implemented by the tool.
### Floating point atomic operations
The MI200-series GPUs support the generation of hardware floating-point atomics
using the OpenMP atomic pragma. The support includes single- and
double-precision floating-point atomic operations. The programmer must ensure
that the memory subjected to the atomic operation is in coarse-grain memory by
mapping it explicitly with the help of map clauses when not implicitly mapped by
the compiler as per the [OpenMP
specifications](https://www.openmp.org/specifications/). This makes these
hardware floating-point atomic instructions “fast,” as they are faster than
using a default compare-and-swap loop scheme, but at the same time “unsafe,” as
they are not supported on fine-grain memory. The operation in
`unified_shared_memory` mode also requires programmers to map the memory
explicitly when not implicitly mapped by the compiler.
To request fast floating-point atomic instructions at the file level, use
compiler flag `-munsafe-fp-atomics` or a hint clause on a specific pragma:
```bash
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;
```
:::{note}
`AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
`AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
:::
To disable the generation of fast floating-point atomic instructions at the file
level, build using the option `-msafe-fp-atomics` or use a hint clause on a
specific pragma:
```bash
double a = 0.0;
#pragma omp atomic hint(AMD_safe_fp_atomics)
a = a + 1.0;
```
The hint clause value always takes precedence over the compiler flag, which
allows programmers to create atomic constructs with a different behavior than
the rest of the file.
See the example below, where the user builds the program using
`-msafe-fp-atomics` to select a file-wide “safe atomic” compilation. However,
the fast atomics hint clause over variable “a” takes precedence and operates on
“a” using a fast/unsafe floating-point atomic, while the variable “b” in the
absence of a hint clause is operated upon using safe floating-point atomics as
per the compiler flag.
```bash
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;
double b = 0.0;
#pragma omp atomic
b = b + 1.0;
```
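The fragments above show only the atomic construct. The following minimal sketch places one inside an offloaded loop and maps the accumulator explicitly, as this section requires. The function name `atomic_sum` is hypothetical, and the sketch uses only the standard `atomic update` form with no AMD-specific hint, so the compiler flag alone decides whether fast or safe atomics are generated.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: accumulate all values into one double with atomics.
double atomic_sum(const std::vector<double>& x) {
  assert(!x.empty());
  const int n = static_cast<int>(x.size());
  const double* px = x.data();
  double total = 0.0;

  // total is mapped explicitly, as the text above requires for memory that
  // is the target of a hardware floating-point atomic.
#pragma omp target teams distribute parallel for \
    map(to: px[0:n]) map(tofrom: total)
  for (int i = 0; i < n; ++i) {
#pragma omp atomic update
    total += px[i];
  }
  return total;
}
```

Building this file with `-munsafe-fp-atomics` requests the fast instructions for the construct, while `-msafe-fp-atomics` selects the compare-and-swap loop. A reduction clause is usually a better fit for a plain sum; the atomic form is used here only to illustrate the mapping requirement.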
### AddressSanitizer tool
AddressSanitizer (ASan) is a memory error detector tool utilized by applications to
detect various errors ranging from spatial issues such as out-of-bound access to
temporal issues such as use-after-free. The AOMP compiler supports ASan for AMD
GPUs with applications written in both HIP and OpenMP.
**Features supported on host platform (Target x86_64):**
* Use-after-free
* Buffer overflows
* Heap buffer overflow
* Stack buffer overflow
* Global buffer overflow
* Use-after-return
* Use-after-scope
* Initialization order bugs
**Features supported on AMDGPU platform (`amdgcn-amd-amdhsa`):**
* Heap buffer overflow
* Global buffer overflow
**Software (kernel/OS) requirements:** Unified Shared Memory support with Xnack
capability. See the section on [Unified Shared Memory](#unified-shared-memory)
for prerequisites and details on Xnack.
**Example:**
* Heap buffer overflow
```bash
int main() {
....... // Some program statements
....... // Some program statements
#pragma omp target map(to : A[0:N], B[0:N]) map(from: C[0:N])
{
#pragma omp parallel for
for(int i =0 ; i < N; i++){
C[i+10] = A[i] + B[i];
} // end of for loop
}
....... // Some program statements
}// end of main
```
See the complete sample code for heap buffer overflow
[here](https://github.com/ROCm/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
* Global buffer overflow
```bash
#pragma omp declare target
int A[N],B[N],C[N];
#pragma omp end declare target
int main(){
...... // some program statements
...... // some program statements
#pragma omp target data map(to:A[0:N],B[0:N]) map(from: C[0:N])
{
#pragma omp target update to(A,B)
#pragma omp target parallel for
for(int i=0; i<N; i++){
C[i]=A[i*100]+B[i+22];
} // end of for loop
#pragma omp target update from(C)
}
........ // some program statements
} // end of main
```
See the complete sample code for global buffer overflow
[here](https://github.com/ROCm/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
### Clang compiler option for kernel optimization
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
* `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels, including no-loop and cross-team reduction kernels.
* `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
* `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
* `-O3` if no `-O*` is specified by the user.
### Specialized kernels
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
* No-loop
* Big-jump-loop
* Cross-team reductions
To enable the generation of specialized kernels, follow these guidelines:
* Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the no-loop kernel. If the `num_teams` clause is required, clang tries to generate the big-jump-loop kernel instead of the no-loop kernel.
* Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
* To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
#### No-loop kernel generation
The no-loop kernel generation feature improves the performance of the generated code by producing a specialized kernel for certain OpenMP target constructs, such as `target teams distribute parallel for`. The specialized kernel assumes that every thread executes a single iteration of the user loop, so the runtime launches a total number of GPU threads equal to or greater than the iteration space size of the target region loop. This allows the compiler to generate code for the loop body without an enclosing loop, resulting in reduced control-flow complexity and potentially better performance.
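As an illustration, the loop below has the shape this optimization targets: a combined `target teams distribute parallel for` construct with no `num_teams` clause. This is a minimal sketch. The function name `vector_add` is hypothetical, and whether clang actually emits a no-loop kernel for it depends on the compiler version and the options listed in the guidelines above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: element-wise c = a + b. Offloaded when built with
// -fopenmp --offload-arch=<gpu-arch>; otherwise the loop runs on the host.
void vector_add(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  const int n = static_cast<int>(a.size());
  const float* pa = a.data();
  const float* pb = b.data();
  float* pc = c.data();

  // No num_teams, thread_limit, or schedule clause: the launch geometry is
  // left to the runtime, which is a precondition for the no-loop kernel.
#pragma omp target teams distribute parallel for \
    map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
  for (int i = 0; i < n; ++i) {
    pc[i] = pa[i] + pb[i];
  }
}
```

Adding `-fopenmp-target-fast` to the compile line requests the specialized kernel generation described in this section.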
#### Big-jump-loop kernel generation
A no-loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the big-jump-loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
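For comparison, the sketch below adds a `num_teams` clause to the same kind of loop, which is the case where the compiler may try the big-jump-loop kernel instead. The function name `scale_in_place` and its `teams` parameter are hypothetical, and the kernel variant actually chosen is up to the compiler.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: x[i] *= factor, launched with a fixed number of teams.
void scale_in_place(std::vector<double>& x, double factor, int teams) {
  assert(teams > 0);
  const int n = static_cast<int>(x.size());
  double* px = x.data();

  // num_teams fixes the grid size, so a thread may have to process more
  // than one iteration when n exceeds the number of launched threads.
#pragma omp target teams distribute parallel for num_teams(teams) \
    map(tofrom: px[0:n])
  for (int i = 0; i < n; ++i) {
    px[i] *= factor;
  }
}
```

Calling `scale_in_place(x, 2.0, 2)` doubles every element of `x` regardless of which kernel variant is generated; only the launch geometry and the generated control flow differ.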
#### Cross-team optimized reduction kernel generation
If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient cross-team communication. New APIs for cross-team reduction are implemented in the device runtime and are automatically generated by clang.
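A minimal sketch of a construct with a reduction clause follows. The function name `sum_array` is hypothetical; the explicit `map(tofrom: sum)` is added here only to make the data movement of the reduction variable visible and is an assumption of this example, not a requirement stated above.

```c
#include <stddef.h>

/* Hypothetical example: sum an array on the device.
 * The reduction clause asks the compiler to combine partial sums from all
 * threads and teams into the single variable `sum`. */
double sum_array(const double *x, size_t n)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for reduction(+:sum) \
        map(to: x[0:n]) map(tofrom: sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}
```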


@@ -25,69 +25,66 @@ additional licenses. Please review individual repositories for more information.
<!-- spellcheck-disable -->
| Component | License |
|:---------------------|:-------------------------|
| [HIP](https://github.com/ROCm/HIP/) | [MIT](https://github.com/ROCm/HIP/blob/develop/LICENSE.txt) |
| [HIPCC](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc) | [MIT](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/hipcc/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm/HIPIFY/) | [MIT](https://github.com/ROCm/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [AMDMIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/develop/LICENSE.txt) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/develop/LICENSE.txt) |
| [AMD Common Language Runtime (CLR)](https://github.com/ROCm/clr) | [MIT](https://github.com/ROCm/clr/blob/develop/LICENCE) |
| [AMD SMI](https://github.com/ROCm/amdsmi) | [MIT](https://github.com/ROCm/amdsmi/blob/develop/LICENSE) |
| [ROCm-Core](https://github.com/ROCm/rocm-core) | [MIT](https://github.com/ROCm/rocm-core/blob/master/copyright) |
| [hipamd](https://github.com/ROCm/clr/tree/develop/hipamd) | [MIT](https://github.com/ROCm/clr/blob/develop/hipamd/LICENSE.txt) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/clr/tree/develop/opencl) | [MIT](https://github.com/ROCm/clr/blob/develop/opencl/LICENSE.txt) |
| [Tensile](https://github.com/ROCm/Tensile/) | [MIT](https://github.com/ROCm/Tensile/blob/develop/LICENSE.md) |
| [aomp](https://github.com/ROCm/aomp/) | [Apache 2.0](https://github.com/ROCm/aomp/blob/aomp-dev/LICENSE) |
| [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/amd-staging/LICENSE.TXT) |
| [llvm-project/flang](https://github.com/ROCm/llvm-project/tree/amd-staging/flang) | [Apache 2.0](https://github.com/ROCm/llvm-project/blob/amd-staging/flang/LICENSE.TXT) |
| [Code Object Manager (Comgr)](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/comgr) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/LICENSE.TXT) |
| [clang-ocl](https://github.com/ROCm/clang-ocl/) | [MIT](https://github.com/ROCm/clang-ocl/blob/master/LICENSE) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCR-Runtime](https://github.com/ROCm/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCR Debug Agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/amd-staging/LICENSE.txt) |
| [Composable Kernel](https://github.com/ROCm/composable_kernel) | [MIT](https://github.com/ROCm/composable_kernel/blob/develop/LICENSE) |
| [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/rocm/LICENSE.txt) |
| [HIP](https://github.com/ROCm/HIP/) | [MIT](https://github.com/ROCm/HIP/blob/develop/LICENSE.txt) |
| [hipamd](https://github.com/ROCm/clr/tree/develop/hipamd) | [MIT](https://github.com/ROCm/clr/blob/develop/hipamd/LICENSE.txt) |
| [hipBLAS](https://github.com/ROCm/hipBLAS/) | [MIT](https://github.com/ROCm/hipBLAS/blob/develop/LICENSE.md) |
| [hipBLASLt](https://github.com/ROCm/hipBLASLt/) | [MIT](https://github.com/ROCm/hipBLASLt/blob/develop/LICENSE.md) |
| [HIPCC](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc) | [MIT](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/hipcc/LICENSE.txt) |
| [hipCUB](https://github.com/ROCm/hipCUB/) | [Custom](https://github.com/ROCm/hipCUB/blob/develop/LICENSE.txt) |
| [hipFFT](https://github.com/ROCm/hipFFT/) | [MIT](https://github.com/ROCm/hipFFT/blob/develop/LICENSE.md) |
| [hipfort](https://github.com/ROCm/hipfort/) | [MIT](https://github.com/ROCm/hipfort/blob/develop/LICENSE) |
| [HIPIFY](https://github.com/ROCm/HIPIFY/) | [MIT](https://github.com/ROCm/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [hipFORT](https://github.com/ROCm/hipfort/) | [MIT](https://github.com/ROCm/hipfort/blob/develop/LICENSE) |
| [hipRAND](https://github.com/ROCm/hipRAND/) | [MIT](https://github.com/ROCm/hipRAND/blob/develop/LICENSE.txt) |
| [hipSOLVER](https://github.com/ROCm/hipSOLVER/) | [MIT](https://github.com/ROCm/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCm/hipSPARSE/) | [MIT](https://github.com/ROCm/hipSPARSE/blob/develop/LICENSE.md) |
| [hipSPARSELt](https://github.com/ROCm/hipSPARSELt/) | [MIT](https://github.com/ROCm/hipSPARSELt/blob/develop/LICENSE.md) |
| [hipTensor](https://github.com/ROCm/hipTensor) | [MIT](https://github.com/ROCm/hipTensor/blob/develop/LICENSE) |
| hsa-amd-aqlprofile | [AMD Software EULA](https://www.amd.com/en/legal/eula/amd-software-eula.html) |
| [llvm-project](https://github.com/ROCm/llvm-project/) | [Apache](https://github.com/ROCm/llvm-project/blob/amd-staging/LICENSE.TXT) |
| [llvm-project/flang](https://github.com/ROCm/llvm-project/tree/amd-staging/flang) | [Apache 2.0](https://github.com/ROCm/llvm-project/blob/amd-staging/flang/LICENSE.TXT) |
| [MIGraphX](https://github.com/ROCm/AMDMIGraphX/) | [MIT](https://github.com/ROCm/AMDMIGraphX/blob/develop/LICENSE) |
| [MIOpen](https://github.com/ROCm/MIOpen/) | [MIT](https://github.com/ROCm/MIOpen/blob/develop/LICENSE.txt) |
| [MIVisionX](https://github.com/ROCm/MIVisionX/) | [MIT](https://github.com/ROCm/MIVisionX/blob/develop/LICENSE.txt) |
| [Omniperf](https://github.com/ROCm/omniperf) | [MIT](https://github.com/ROCm/omniperf/blob/main/LICENSE) |
| [Omnitrace](https://github.com/ROCm/omnitrace) | [MIT](https://github.com/ROCm/omnitrace/blob/main/LICENSE) |
| [rocAL](https://github.com/ROCm/rocAL) | [MIT](https://github.com/ROCm/rocAL/blob/develop/LICENSE.txt) |
| [rocALUTION](https://github.com/ROCm/rocALUTION/) | [MIT](https://github.com/ROCm/rocALUTION/blob/develop/LICENSE.md) |
| [rocBLAS](https://github.com/ROCm/rocBLAS/) | [MIT](https://github.com/ROCm/rocBLAS/blob/develop/LICENSE.md) |
| [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-staging/LICENSE.txt) |
| [rocDecode](https://github.com/ROCm/rocDecode) | [MIT](https://github.com/ROCm/rocDecode/blob/develop/LICENSE) |
| [rocFFT](https://github.com/ROCm/rocFFT/) | [MIT](https://github.com/ROCm/rocFFT/blob/develop/LICENSE.md) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/amd-staging/License.txt) |
| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [ROCm CMake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
| [ROCm Communication Collectives Library (RCCL)](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [ROCm-Core](https://github.com/ROCm/rocm-core) | [MIT](https://github.com/ROCm/rocm-core/blob/master/copyright) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/develop/LICENSE) |
| [ROCm-Device-Libs](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/LICENSE.TXT) |
| [ROCm-OpenCL-Runtime](https://github.com/ROCm/clr/tree/develop/opencl) | [MIT](https://github.com/ROCm/clr/blob/develop/opencl/LICENSE.txt) |
| [ROCm Performance Primitives (RPP)](https://github.com/ROCm/rpp) | [MIT](https://github.com/ROCm/rpp/blob/develop/LICENSE) |
| [ROCm SMI Lib](https://github.com/ROCm/rocm_smi_lib/) | [MIT](https://github.com/ROCm/rocm_smi_lib/blob/develop/License.txt) |
| [ROCm Validation Suite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| [rocPRIM](https://github.com/ROCm/rocPRIM/) | [MIT](https://github.com/ROCm/rocPRIM/blob/develop/LICENSE.txt) |
| [ROCProfiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-master/LICENSE) |
| [ROCprofiler-SDK](https://github.com/ROCm/rocprofiler-sdk) | [MIT](https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/LICENSE) |
| [rocPyDecode](https://github.com/ROCm/rocPyDecode) | [MIT](https://github.com/ROCm/rocPyDecode/blob/develop/LICENSE) |
| [ROCm Performance Primitives (RPP)](https://github.com/ROCm/rpp) | [MIT](https://github.com/ROCm/rpp/blob/develop/LICENSE) |
| [rocRAND](https://github.com/ROCm/rocRAND/) | [MIT](https://github.com/ROCm/rocRAND/blob/develop/LICENSE.txt) |
| [ROCr Debug Agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/amd-staging/LICENSE.txt) |
| [ROCR-Runtime](https://github.com/ROCm/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/ROCR-Runtime/blob/master/LICENSE.txt) |
| [rocSOLVER](https://github.com/ROCm/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCm/rocSOLVER/blob/develop/LICENSE.md) |
| [rocSPARSE](https://github.com/ROCm/rocSPARSE/) | [MIT](https://github.com/ROCm/rocSPARSE/blob/develop/LICENSE.md) |
| [rocThrust](https://github.com/ROCm/rocThrust/) | [Apache 2.0](https://github.com/ROCm/rocThrust/blob/develop/LICENSE) |
| [ROCTracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| [ROCT-Thunk-Interface](https://github.com/ROCm/ROCT-Thunk-Interface/) | [MIT](https://github.com/ROCm/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [rocWMMA](https://github.com/ROCm/rocWMMA/) | [MIT](https://github.com/ROCm/rocWMMA/blob/develop/LICENSE.md) |
| [Tensile](https://github.com/ROCm/Tensile/) | [MIT](https://github.com/ROCm/Tensile/blob/develop/LICENSE.md) |
| [ROCm Communication Collectives Library (RCCL)](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rdc/) | [MIT](https://github.com/ROCm/rdc/blob/develop/LICENSE) |
| [ROCm CMake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
| [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-staging/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm/ROCgdb/blob/amd-master/COPYING) |
| [ROCm SMI Lib](https://github.com/ROCm/rocm_smi_lib/) | [MIT](https://github.com/ROCm/rocm_smi_lib/blob/develop/License.txt) |
| [AMD SMI](https://github.com/ROCm/amdsmi) | [MIT](https://github.com/ROCm/amdsmi/blob/develop/LICENSE) |
| [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/amd-staging/License.txt) |
| [ROCProfiler](https://github.com/ROCm/rocprofiler/) | [MIT](https://github.com/ROCm/rocprofiler/blob/amd-master/LICENSE) |
| [ROCTracer](https://github.com/ROCm/roctracer/) | [MIT](https://github.com/ROCm/roctracer/blob/amd-master/LICENSE) |
| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [TransferBench](https://github.com/ROCm/TransferBench) | [MIT](https://github.com/ROCm/TransferBench/blob/develop/LICENSE.md) |
| [ROCmValidationSuite](https://github.com/ROCm/ROCmValidationSuite/) | [MIT](https://github.com/ROCm/ROCmValidationSuite/blob/master/LICENSE) |
| hsa-amd-aqlprofile | [AMD Software EULA](https://www.amd.com/en/legal/eula/amd-software-eula.html) |
Open source ROCm components are released via public GitHub
repositories, packages on [https://repo.radeon.com](https://repo.radeon.com), and other distribution channels.


@@ -8,169 +8,121 @@ Compatibility matrix
Use this matrix to view the ROCm compatibility across successive major and minor releases.
You can also refer to the :ref:`past versions of ROCm compatibility matrix<past-rocm-compatibility-matrix>`.
.. container:: format-big-table
.. csv-table::
:header: "ROCm Version", "6.2.0", "6.1.2", "6.0.0"
:header: "ROCm Version", "6.1.5", "6.1.2", "6.0.0"
:stub-columns: 1
:doc:`Operating Systems <rocm-install-on-linux:reference/system-requirements>`, "Ubuntu 24.04","",""
,"Ubuntu 22.04.5 [#Ubuntu220405]_, 22.04.4","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3"
,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.4, 9.3","RHEL 9.4 [#red-hat94]_, 9.3, 9.2","RHEL 9.3, 9.2"
,"RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
:doc:`Operating Systems <rocm-install-on-linux:reference/system-requirements>`,"Ubuntu 22.04.5 [#Ubuntu220405]_, 22.04.4, 22.04.3","Ubuntu 22.04.5 [#Ubuntu220405]_, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3"
,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.4 [#red-hat94]_, 9.3, 9.2","RHEL 9.4 [#red-hat94]_, 9.3, 9.2","RHEL 9.3, 9.2"
,"RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,CentOS 7.9,CentOS 7.9
,"Oracle Linux 8.9 [#oracle89]_","Oracle Linux 8.9 [#oracle89]_",""
,".. _architecture-support-compatibility-matrix:",,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3
,Oracle Linux 8.9 [#oracle89]_,Oracle Linux 8.9 [#oracle89]_,
,.. _architecture-support-compatibility-matrix,,
:doc:`GFX Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA
,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2
,".. _gpu-support-compatibility-matrix:",,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1100,gfx1100,gfx1100
,.. _gpu-support-compatibility-matrix,,
:doc:`GFX Card <rocm-install-on-linux:reference/system-requirements>`,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030
,gfx942 [#mi300_620]_, gfx942 [#mi300_612]_, gfx942 [#mi300_600]_
, gfx942 [#mi300_612]_, gfx942 [#mi300_612]_, gfx942 [#mi300_600]_
,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908
,,,
FRAMEWORK SUPPORT,".. _framework-support-compatibility-matrix:",,
:doc:`PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>`,"2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.16.1, 2.15.1, 2.14.1","2.15, 2.14, 2.13","2.14, 2.13, 2.12"
ECOSYSTEM SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>`,"2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.14.1
,,,
THIRD PARTY COMMS,".. _thirdpartycomms-support-compatibility-matrix:",,
`UCC <https://github.com/ROCm/ucc>`_,>=1.2.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.14.1,>=1.14.1
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.14.1,>=1.14.1,>=1.14.1
,,,
THIRD PARTY ALGORITHM,".. _thirdpartyalgorithm-support-compatibility-matrix:",,
Thrust,2.2.0,2.1.0,2.0.1
CUB,2.2.0,2.1.0,2.0.1
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix:,,
Thrust,2.1.0,2.1.0,2.0.1
CUB,2.1.0,2.1.0,2.0.1
,,,
ML & COMPUTER VISION,".. _mllibs-support-compatibility-matrix:",,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.10.0,2.9.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.2.0,3.1.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.0.0,2.5.0,2.5.0
:doc:`MIGraphX <amdmigraphx:index>`,2.9.0,2.9.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.1.0,3.1.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,2.5.0,2.5.0,2.5.0
:doc:`rocDecode <rocdecode:index>`,0.6.0,0.6.0,N/A
:doc:`RPP <rpp:index>`,1.8.0,1.5.0,1.4.0
:doc:`rocPyDecode <rocpydecode:index>`,0.1.0,N/A,N/A
:doc:`RPP <rpp:index>`,1.5.0,1.5.0,1.4.0
,,,
COMMUNICATION,".. _commlibs-support-compatibility-matrix:",,
:doc:`RCCL <rccl:index>`,2.20.5,2.18.6,2.18.3
COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
:doc:`RCCL <rccl:index>`,2.18.6,2.18.6,2.18.3
,,,
MATH LIBS,".. _mathlibs-support-compatibility-matrix:",,
MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.2.0,2.1.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.8.0,0.7.0,0.6.0
:doc:`hipBLAS <hipblas:index>`,2.1.0,2.1.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.7.0,0.7.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.14,1.0.14,1.0.13
:doc:`hipFORT <hipfort:index>`,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.11.0,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.2.0,2.1.1,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.1.1,3.0.1,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.1,0.2.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.2.0,3.1.1,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.2.0,4.1.2,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.28,1.0.27,1.0.23
:doc:`rocRAND <rocrand:index>`,3.1.0,3.0.1,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.26.0,3.25.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.2.0,3.1.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.5.0,1.4.0,1.3.0
:doc:`hipRAND <hiprand:index>`,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.1.1,2.1.1,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.0.1,3.0.1,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.0,0.2.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.1.1,3.1.1,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.1.2,4.1.2,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.27,1.0.27,1.0.23
:doc:`rocRAND <rocrand:index>`,3.0.1,3.0.1,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.25.0,3.25.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.1.2,3.1.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.4.0,1.4.0,1.3.0
`Tensile <https://github.com/ROCm/Tensile>`_,4.40.0,4.40.0,4.39.0
,,,
PRIMITIVES,".. _primitivelibs-support-compatibility-matrix:",,
:doc:`hipCUB <hipcub:index>`,3.2.0,3.1.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.3.0,1.2.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.2.0,3.1.0,3.0.0
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix:,,
:doc:`hipCUB <hipcub:index>`,3.1.0,3.1.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.2.0,1.2.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.1.0,3.1.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.0.1,3.0.1,3.0.0
,,,
SUPPORT LIBS,,,
`hipother <https://github.com/ROCm/hipother>`_,6.2.41133,6.1.40093,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.2.0,6.1.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,20240607.1.4246,20240125.5.08,20231016.2.245
`hipother <https://github.com/ROCm/hipother>`_,6.1.40093,6.1.40093,6.1.32830
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.12.0,0.12.0,0.11.0
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.1.5,6.1.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,20240125.5.08,20240125.5.08,20231016.2.245
,,,
SYSTEM MGMT TOOLS,".. _tools-support-compatibility-matrix:",,
:doc:`AMD SMI <amdsmi:index>`,24.6.2,24.5.1,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,1.0.0,0.3.0,0.3.0
TOOLS,.. _tools-support-compatibility-matrix:,,
:doc:`AMD SMI <amdsmi:index>`,24.5.1,24.5.1,23.4.2
:doc:`HIPIFY <hipify:index>`,17.0.0.24193,17.0.0.24193,17.0.0.23483
:doc:`ROCdbgapi <rocdbgapi:index>`,0.71.0,0.71.0,0.71.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.3.0,7.2.0,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.2.0,rocm-6.1.2,rocm-6.0.0
,,,
PERFORMANCE TOOLS,,,
:doc:`Omniperf <omniperf:index>`,2.0.1,N/A,N/A
:doc:`Omnitrace <omnitrace:index>`,1.11.2,N/A,N/A
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60105,2.0.60102,2.0.60000
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.3.0,0.3.0,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60105,4.1.60102,4.1.60000
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60200,2.0.60102,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.4.0,N/A,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60200,4.1.60102,4.1.60000
,,,
DEVELOPMENT TOOLS,,,
:doc:`HIPIFY <hipify:index>`,18.0.0.24232,17.0.0.24193,17.0.0.23483
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.13.0,0.12.0,0.11.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.76.0,0.71.0,0.71.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,14.2.0,14.1.0,13.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.3.0,N/A
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,14.1.0,14.1.0,13.2.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.2.0,7.2.0,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.1.5,rocm-6.1.2,rocm-6.0.0
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.3,2.0.3,2.0.3
,,,
COMPILERS,".. _compilers-support-compatibility-matrix:",,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,0.5.0,0.5.0
`Flang <https://github.com/ROCm/flang>`_,18.0.0.24232,17.0.0.24193,17.0.0.23483
`llvm-project <https://github.com/ROCm/llvm-project>`_,18.0.0.24232,17.0.0.24193,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,18.0.0.24232,17.0.0.24193,17.0.0.23483
COMPILERS,.. _compilers-support-compatibility-matrix:,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,0.5.0,0.5.0,0.5.0
:doc:`hipCC <hipcc:index>`,1.0.0,1.0.0,1.0.0
`Flang <https://github.com/ROCm/flang>`_,17.0.0.24193,17.0.0.24193,17.0.0.23483
:doc:`llvm-project <llvm-project:index>`,17.0.0.24193,17.0.0.24193,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,17.0.0.24193,17.0.0.24193,17.0.0.23483
,,,
RUNTIMES,".. _runtime-support-compatibility-matrix:",,
:doc:`HIP <hip:index>`,6.2.41133,6.1.40093,6.1.32830
RUNTIMES,.. _runtime-support-compatibility-matrix:,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.1.40093,6.1.40093,6.1.32830
:doc:`HIP <hip:index>`,6.1.40093,6.1.40093,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0
:doc:`ROCR-Runtime <rocr-runtime:index>`,1.13.0,1.13.0,1.12.0
.. rubric:: Footnotes
.. [#Ubuntu220405] Preview support of Ubuntu 22.04.5 only
.. [#red-hat94] RHEL 9.4 is supported only on AMD Instinct MI300A.
.. [#oracle89] Oracle Linux is supported only on AMD Instinct MI300X.
.. [#mi300_620] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_612] **For ROCm 6.1.2** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4 and Oracle Linux.
.. [#mi300_600] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
..
Footnotes and ref anchors in the historical tables below should be appended with "-past-60" to differentiate them from the
footnote references in the latest compatibility matrix above. This also makes them easy to find and replace.
An easy way to work is to download the historical .csv file and open it in Excel. When the content is ready,
delete the columns you don't need to build the current compatibility matrix for the table above. Find and replace all
instances of "-past-60" to make it ready for the table above.
.. _past-rocm-compatibility-matrix:
Past versions of ROCm compatibility matrix
***************************************************
Expand for full historical view of:
.. dropdown:: ROCm 6.0 - Present
You can `download the entire .csv <../downloads/compatibility-matrix-historical-6.0.csv>`_ for offline reference.
.. csv-table::
:file: ../data/reference/compatibility-matrix-historical-6.0.csv
:widths: 20,10,10,10,10,10,10
:header-rows: 1
:stub-columns: 1
.. rubric:: Footnotes
.. [#Ubuntu220405-past-60] Preview support of Ubuntu 22.04.5 only
.. [#red-hat94-past-60] RHEL 9.4 is supported only on AMD Instinct MI300A.
.. [#oracle89-past-60] Oracle Linux is supported only on AMD Instinct MI300X.
.. [#mi300_620-past-60] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_612-past-60] **For ROCm 6.1.2** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4 and Oracle Linux.
.. [#mi300_611-past-60] **For ROCm 6.1.1** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4 and Oracle Linux.
.. [#mi300_610-past-60] **For ROCm 6.1.0** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_602-past-60] **For ROCm 6.0.2** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#Ubuntu220405] Preview support of Ubuntu 22.04.5 only.
.. [#red-hat94] **For ROCm 6.1** - RHEL 9.4 is supported only on AMD Instinct MI300A.
.. [#oracle89] **For ROCm 6.1.1** - Oracle Linux is supported only on AMD Instinct MI300X.
.. [#mi300_612] **For ROCm 6.1** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_600] **For ROCm 6.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9 and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.


@@ -65,7 +65,7 @@ This example is adapted from the PyTorch research hub page on [Inception V3](htt
Follow these steps:
1. Run the PyTorch ROCm-based Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
1. Run the PyTorch ROCm-based Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```dockerfile
docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
@@ -155,7 +155,7 @@ The previous section focused on downloading and using the Inception V3 model for
Follow these steps:
1. Run the PyTorch ROCm Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
1. Run the PyTorch ROCm Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```dockerfile
docker pull rocm/pytorch:latest


@@ -0,0 +1,21 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="ROCm compilers disambiguation">
<meta name="keywords" content="compilers, compiler naming, AMD, ROCm">
</head>
# ROCm compilers disambiguation
ROCm ships multiple compilers of varying origins and purposes. This article
disambiguates compiler naming used throughout the documentation.
## Compiler terms
| Term | Description |
| - | - |
| `amdclang++` | Clang/LLVM-based compiler that is part of the `rocm-llvm` package. The source code is available at <a href="https://github.com/ROCm/llvm-project" target="_blank">https://github.com/ROCm/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. For details, see <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler. |
| HIPIFY | Tools, including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm/HIPIFY" target="_blank">https://github.com/ROCm/HIPIFY</a>. |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm/HIPCC" target="_blank">https://github.com/ROCm/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC itself is not a binary but refers to the overall compiler. |


@@ -9,6 +9,6 @@
The following topics describe using specific features of the compilation tools:
* [ROCm compiler infrastructure](https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html)
* [Using AddressSanitizer](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/using-gpu-sanitizer.html)
* [OpenMP support](https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/openmp.html)
* [Using AddressSanitizer](./using-gpu-sanitizer.md)
* [Compiler disambiguation](./compiler-disambiguation.md)
* [OpenMP support in ROCm](../about/compatibility/openmp.md)


@@ -33,8 +33,8 @@ Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
subdivided into four SIMD units that process SIMD instructions of 16 data
elements per instruction (for the FP64 data type). This enables the CU to
process 64 work items (a so-called “wavefront”) at a peak clock frequency of 1.7
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 45.3
TFLOPS for vector instructions. The MI250 compute units also provide specialized
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 22.6
TFLOPS for vector instructions. This equates to 45.3 TFLOPS for vector instructions for both GCDs together. The MI250 compute units also provide specialized
execution units (also called matrix cores), which are geared toward executing
matrix operations like matrix-matrix multiplications. For FP64, the peak
performance of these units amounts to 90.5 TFLOPS.
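
The per-GCD vector figure can be reproduced from the numbers quoted in this paragraph. The sketch below is only a sanity check: it assumes each lane retires one fused multiply-add per cycle, counted as two floating-point operations. That factor is an assumption and is not stated in the text above.

```python
# Reproduce the MI250 FP64 vector peak from the figures quoted above.
CUS_PER_GCD = 104      # active compute units per GCD
SIMDS_PER_CU = 4       # SIMD units per compute unit
LANES_PER_SIMD = 16    # FP64 data elements per SIMD instruction
FLOPS_PER_LANE = 2     # ASSUMPTION: one fused multiply-add counts as 2 FLOPs
CLOCK_HZ = 1.7e9       # peak clock frequency


def peak_tflops(gcds=1):
    """Theoretical FP64 vector peak in TFLOPS for the given number of GCDs."""
    flops = (gcds * CUS_PER_GCD * SIMDS_PER_CU * LANES_PER_SIMD
             * FLOPS_PER_LANE * CLOCK_HZ)
    return flops / 1e12


print(f"{SIMDS_PER_CU * LANES_PER_SIMD} work items per wavefront")
print(f"{peak_tflops(1):.1f} TFLOPS per GCD")
print(f"{peak_tflops(2):.1f} TFLOPS for both GCDs")
```

Under that assumption the result is 22.6 TFLOPS per GCD and 45.3 TFLOPS for both GCDs together.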

View File

@@ -0,0 +1,431 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Using the LLVM ASan on a GPU">
<meta name="keywords" content="LLVM, ASan, address sanitizer, AddressSanitizer, instrumented
libraries, instrumented applications, AMD, ROCm">
</head>
# Using the AddressSanitizer on a GPU (beta release)
The LLVM AddressSanitizer (ASan) provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications exactly like pure CPU applications. However, this simplicity has not been achieved yet.
This document describes how to use ROCm ASan.
For information about LLVM ASan, see the [LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
:::{note}
The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
:::
## Compiling for ASan
The ASan process begins by compiling the application of interest with the ASan instrumentation.
Recommendations for doing this are:
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
* Add the following options to the existing compiler and linker options:
* `-fsanitize=address` - enables instrumentation
* `-shared-libsan` - use shared version of runtime
* `-g` - add debug info for improved reporting
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
:::{tip}
It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
:::
:::{note}
When compiling OpenMP programs with ASan instrumentation, it is currently necessary to set the environment variable `LIBRARY_PATH` to `/opt/rocm-<version>/lib/llvm/lib/asan:/opt/rocm-<version>/lib/asan`. At runtime, it may be necessary to add `/opt/rocm-<version>/lib/llvm/lib/asan` to `LD_LIBRARY_PATH`.
:::
### About compilation time
When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.
There are a few options if the compile time becomes unacceptable:
* Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the ASan process.
* Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
* Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
## Installing ROCm GPU ASan packages
For a complete ROCm GPU Sanitizer installation, including packages, instrumented HSA and HIP runtimes, tools, and math libraries, run the following command:
```bash
sudo apt-get install rocm-ml-sdk-asan
```
## Using AMD-supplied ASan instrumented libraries
ROCm releases have optional packages that contain additional ASan instrumented builds of the ROCm libraries (usually found in `/opt/rocm-<version>/lib`). The instrumented libraries have identical names to the regular uninstrumented libraries, and are located in `/opt/rocm-<version>/lib/asan`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with `g++`. The preexisting build options are used but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
These additional libraries avoid additional developer effort to locate repositories, identify the correct branch, check out the correct tags, and other efforts needed to build the libraries from the source. And they extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.
## Running ASan instrumented applications
### Preparing to run an instrumented application
Here are a few recommendations to consider before running an ASan instrumented heterogeneous application.
* Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
* Ensure XNACK is enabled
* For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
* For `gfx906` (MI-50) or `gfx908` (MI-100), set the environment variable `HSA_XNACK=1` and also ensure the amdgpu kernel module is loaded with module argument `noretry=0`.
This requirement is due to the fact that the XNACK setting for these GPUs is system-wide.
* Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
* Ensure that the application depends on the ASan runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
If it does not appear, when executed the application will quickly output an ASan error that looks like:
```bash
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
* Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
There is an environment variable, `ASAN_OPTIONS`, that can be used to adjust the runtime behavior of the ASan runtime itself. There are more than a hundred "flags" that can be adjusted (see an old list at [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)) but the default settings are correct and should be used in most cases. It must be noted that these options only affect the host ASan runtime. The device runtime only currently supports the default settings for the few relevant options.
There are three `ASAN_OPTIONS` flags of note.
* `halt_on_error=0/1 default 1`.
This tells the ASan runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional ASan instrumented libraries are not compiled with this option and if an error is detected within one of them, but halt_on_error is set to 0, more undefined behavior will occur.
* `detect_leaks=0/1 default 1`.
This option directs the ASan runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSan). For heterogeneous applications, this default results in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSan suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
* `quarantine_size_mb=N default 256`
This option defines the number of megabytes (MB) `N` of memory that the ASan runtime will hold after it is `freed` to detect use-after-free situations. This memory is unavailable for other purposes. The default of 256 MB may be too small to detect some use-after-free situations, especially given that the large size of many GPU memory allocations may push `freed` allocations out of quarantine before the attempted use.
:::{note}
Setting the value of `quarantine_size_mb` larger may enable more problematic uses to be detected, but at the cost of reducing memory available for other purposes.
:::
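
As an illustration of how these flags are combined, the following sketch builds an `ASAN_OPTIONS` value and an environment for launching an instrumented application. The helper names, the application path `./my_app`, and the value `512` are hypothetical; the flags themselves are the three described above, joined with `:`.

```python
import os


def asan_options(**flags):
    """Join flag=value pairs with ':' for use as an ASAN_OPTIONS value."""
    return ":".join(f"{name}={value}" for name, value in flags.items())


def asan_env(base=None, **flags):
    """Copy an environment (default: the current one) and set ASAN_OPTIONS."""
    env = dict(os.environ if base is None else base)
    env["ASAN_OPTIONS"] = asan_options(**flags)
    return env


# Illustrative values only. halt_on_error=0 also expects the application
# to have been built with -fsanitize-recover=address.
env = asan_env(halt_on_error=0, detect_leaks=0, quarantine_size_mb=512)
print(env["ASAN_OPTIONS"])

# To launch a (hypothetical) instrumented binary with these settings:
#   import subprocess
#   subprocess.run(["./my_app"], env=env)
```

Remember that these settings affect only the host ASan runtime.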
## Runtime overhead
Running an ASan instrumented application incurs
overheads which may result in unacceptably long runtimes
or failure to run at all.
### Higher execution time
ASan detection works by checking each address at runtime
before the address is actually accessed by a load, store, or atomic
instruction.
This checking involves an additional load to "shadow" memory which
records whether the address is "poisoned" or not, and additional logic
that decides whether to produce a detection report or not.
This extra runtime work can cause the application to slow down by
a factor of three or more, depending on how many memory accesses are
executed.
For heterogeneous applications, the shadow memory must be accessible by all devices
and this can mean that shadow accesses from some devices may be more costly
than non-shadow accesses.
### Higher memory use
The address checking described above relies on the compiler to surround
each program variable with a red zone and on ASan
runtime to surround each runtime memory allocation with a red zone and
fill the shadow corresponding to each red zone with poison.
The added memory for the red zones is additional overhead on top
of the 13% overhead for the shadow memory itself.
Applications which consume most of one or more available memory pools when
run normally are likely to encounter allocation failures when run with
instrumentation.
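
The shadow encoding can be made concrete with a little arithmetic. The sketch below assumes the encoding given in the shadow byte legend of the reports later in this document: one shadow byte covers 8 application bytes, `00` means fully addressable, and `01` to `07` give the number of addressable bytes in a partially addressable group. The function name is hypothetical, and the region is assumed to start on an 8-byte boundary.

```python
SHADOW_GRANULARITY = 8  # application bytes covered by one shadow byte


def shadow_bytes(region_size):
    """Shadow values for an allocation that starts on an 8-byte boundary."""
    full, tail = divmod(region_size, SHADOW_GRANULARITY)
    values = [0x00] * full      # fully addressable groups
    if tail:
        values.append(tail)     # partially addressable last group
    return values


for size in (400, 396):
    shadow = shadow_bytes(size)
    print(f"{size}-byte region: {len(shadow)} shadow bytes, last = {shadow[-1]:02x}")

print(f"shadow overhead: {1 / SHADOW_GRANULARITY:.1%}")
```

A 400-byte region is covered by 50 shadow bytes that are all `00`, while a 396-byte region ends in a `04` shadow byte. The one-in-eight ratio is 12.5%, which this document rounds to 13%.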
## Runtime reporting
It is not the intention of this document to provide a detailed explanation of all the types of reports that can be output by the ASan runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.
An invalid address detection report for the CPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on address <memory address> at pc <pc> bp <bp> sp <sp> <access> of size <N> at <memory address> thread T0
```
and continues with a stack trace for the access, a stack trace for the allocation and deallocation, if relevant, and a dump of the shadow near the <memory address>.
In contrast, an invalid address detection report for the GPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on amdgpu device <device> at pc <pc> <access> of size <n> in workgroup id (<X>,<Y>,<Z>)
```
Above, `<device>` is the integer device ID, and `(<X>, <Y>, <Z>)` is the ID of the workgroup or block where the invalid address was detected.
While the CPU report includes a call stack for the thread attempting the invalid access, the GPU report is currently limited to a call stack of size one, i.e. the (symbolized) address of the invalid access, e.g.
```bash
#0 <pc> in <function signature> at /path/to/file.hip:<line>:<column>
```
This short call stack is followed by a GPU unique section that looks like
```bash
Thread ids and accessed addresses:
<lid0> <maddr0> : <lid1> <maddr1> : ...
```
where each `<lid j> <maddr j>` indicates the lane ID and the invalid memory address held by lane `j` of the wavefront attempting the invalid access.
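
When triaging many logs, the GPU report header described above can be picked apart mechanically. The following sketch uses a regular expression derived from the template shown in this section; the function name and the returned dictionary layout are hypothetical, and only `READ` and `WRITE` accesses are matched.

```python
import re

# Pattern follows the GPU report header template shown above.
GPU_REPORT = re.compile(
    r"==(?P<pid>\d+)==ERROR: AddressSanitizer: (?P<problem>[\w-]+) "
    r"on amdgpu device (?P<device>\d+) at pc (?P<pc>0x[0-9a-fA-F]+)\s+"
    r"(?P<access>READ|WRITE) of size (?P<size>\d+) "
    r"in workgroup id \((?P<x>\d+),(?P<y>\d+),(?P<z>\d+)\)"
)


def parse_gpu_report(text):
    """Return the header fields of a GPU ASan report, or None if absent."""
    match = GPU_REPORT.search(text)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "problem": match["problem"],
        "device": int(match["device"]),
        "pc": int(match["pc"], 16),
        "access": match["access"],
        "size": int(match["size"]),
        "workgroup": (int(match["x"]), int(match["y"]), int(match["z"])),
    }


sample = (
    "==3141==ERROR: AddressSanitizer: heap-buffer-overflow "
    "on amdgpu device 0 at pc 0x7fb1410d2cc4\n"
    "WRITE of size 4 in workgroup id (10,0,0)"
)
print(parse_gpu_report(sample))
```

A CPU report does not contain `on amdgpu device`, so the function returns `None` for it, which is one simple way to separate host and device findings.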
Additionally, reports for invalid GPU accesses to memory allocated by GPU code via `malloc` or `new`, starting with, for example,
```bash
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
```
or
```bash
==5678==ERROR: AddressSanitizer: heap-use-after-free on amdgpu device 3 at pc 0x7f4c10062d74
```
currently may include one or two surprising CPU-side tracebacks mentioning "`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code, and these call stacks can be ignored.
## Running ASan with `rocgdb`
`rocgdb` can be used to further investigate ASan detected errors, with some preparation.
Currently, the ASan runtime complains when starting `rocgdb` without preparation.
```bash
$ rocgdb my_app
==1122==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
This is solved by setting environment variable `LD_PRELOAD` to the path to the ASan runtime, whose path can be obtained using the command
```bash
amdclang++ -print-file-name=libclang_rt.asan-x86_64.so
```
You should also set the environment variable `HIP_ENABLE_DEFERRED_LOADING=0` before debugging HIP applications.
After starting `rocgdb` breakpoints can be set on the ASan runtime error reporting entry points of interest. For example, if an ASan error report includes
```bash
WRITE of size 4 in workgroup id (10,0,0)
```
the `rocgdb` command needed to stop the program before the report is printed is
```bash
(gdb) break __asan_report_store4
```
Similarly, the appropriate command for a report including
```bash
READ of size <N> in workgroup ID (1,2,3)
```
is
```bash
(gdb) break __asan_report_load<N>
```
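
The mapping from a report line to a breakpoint name can be written down as a small helper. This is a sketch of the naming pattern described above only; the function name is hypothetical, and it does not check whether the ASan runtime actually provides an entry point for a given size.

```python
import re

ACCESS_LINE = re.compile(r"\b(READ|WRITE) of size (\d+)")


def report_breakpoint(report_line):
    """Derive the __asan_report_* symbol named by a READ/WRITE report line."""
    match = ACCESS_LINE.search(report_line)
    if match is None:
        raise ValueError("no 'READ/WRITE of size N' found in report line")
    kind = "load" if match.group(1) == "READ" else "store"
    return f"__asan_report_{kind}{match.group(2)}"


line = "WRITE of size 4 in workgroup id (10,0,0)"
print(f"(gdb) break {report_breakpoint(line)}")
```

When in doubt about the exact symbol, a regular-expression breakpoint such as `rbreak ^__asan_report` covers all of the report functions at once.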
It is possible to set breakpoints on all ASan report functions using these commands:
```bash
$ rocgdb <path to application>
(gdb) start <command line arguments>
(gdb) rbreak ^__asan_report
(gdb) c
```
## Using ASan with a short HIP application
Consider the following short demo of using AddressSanitizer with a HIP application:
```C++
#include <cstdio>
#include <cstdlib>
#include <hip/hip_runtime.h>
__global__ void
set1(int *p)
{
    int i = blockDim.x*blockIdx.x + threadIdx.x;
    p[i] = 1;  // out of bounds when i >= m
}
int
main(int argc, char **argv)
{
    int m = std::atoi(argv[1]);   // number of ints allocated on the device
    int n1 = std::atoi(argv[2]);  // number of blocks
    int n2 = std::atoi(argv[3]);  // threads per block
    int c = std::atoi(argv[4]);   // number of ints allocated on the host
    int *dp;
    hipMalloc(&dp, m*sizeof(int));
    hipLaunchKernelGGL(set1, dim3(n1), dim3(n2), 0, 0, dp);
    int *hp = (int*)malloc(c * sizeof(int));
    hipMemcpy(hp, dp, m*sizeof(int), hipMemcpyDeviceToHost);  // overflows hp when c < m
    hipDeviceSynchronize();
    hipFree(dp);
    free(hp);
    std::puts("Done.");
    return 0;
}
```
This application will attempt to access invalid addresses for certain command line arguments. In particular, if `m < n1 * n2` some device threads will attempt to access
unallocated device memory.
Or, if `c < m`, the `hipMemcpy` function will copy past the end of the `malloc` allocated memory.
**Note**: The `hipcc` compiler is used here for simplicity.
Compiling without XNACK results in a warning.
```bash
$ hipcc -g --offload-arch=gfx90a:xnack- -fsanitize=address -shared-libsan mini.hip -o mini
clang++: warning: ignoring '-fsanitize=address' option for offload arch 'gfx90a:xnack-', as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]
```
The binary compiled above will run, but the GPU code will not be instrumented and the `m < n1 * n2` error will not be detected. Switching to `--offload-arch=gfx90a:xnack+` in the command above results in a warning-free compilation and an instrumented application. After setting `PATH`, `LD_LIBRARY_PATH` and `HSA_XNACK` as described earlier, a check of the binary with `ldd` yields the following,
```bash
$ ldd mini
linux-vdso.so.1 (0x00007ffd1a5ae000)
libclang_rt.asan-x86_64.so => /opt/rocm-6.1.0-99999/llvm/lib/clang/17.0.0/lib/linux/libclang_rt.asan-x86_64.so (0x00007fb9c14b6000)
libamdhip64.so.5 => /opt/rocm-6.1.0-99999/lib/asan/libamdhip64.so.5 (0x00007fb9bedd3000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb9beba8000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb9bea59000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb9bea3e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9be84a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb9be844000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb9be821000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb9be817000)
libamd_comgr.so.2 => /opt/rocm-6.1.0-99999/lib/asan/libamd_comgr.so.2 (0x00007fb9b4382000)
libhsa-runtime64.so.1 => /opt/rocm-6.1.0-99999/lib/asan/libhsa-runtime64.so.1 (0x00007fb9b3b00000)
libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fb9b3af3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb9c2027000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fb9b3ad7000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fb9b3aa7000)
libelf.so.1 => /lib/x86_64-linux-gnu/libelf.so.1 (0x00007fb9b3a89000)
libdrm.so.2 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007fb9b3a70000)
libdrm_amdgpu.so.1 => /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1 (0x00007fb9b3a62000)
```
This confirms that the AddressSanitizer runtime is linked in, and that the ASan instrumented versions of the runtime libraries are used.
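
The same check can be scripted. The sketch below parses `ldd` output and reports whether the ASan runtime is present and which ROCm libraries were resolved from outside a `lib/asan` directory. The function name is hypothetical, and the `/opt/rocm` prefix test is an assumption based on the install paths shown above.

```python
def check_ldd(ldd_output):
    """Return (has_asan_runtime, rocm_libs_not_from_asan_dir)."""
    resolved = {}
    for line in ldd_output.splitlines():
        name, arrow, rest = line.partition("=>")
        if not arrow or not rest.split():
            continue  # e.g. linux-vdso or the dynamic loader line
        resolved[name.strip()] = rest.split()[0]

    has_runtime = any(n.startswith("libclang_rt.asan") for n in resolved)
    uninstrumented = [
        n for n, path in resolved.items()
        if path.startswith("/opt/rocm")          # ASSUMPTION: ROCm install prefix
        and "/lib/asan/" not in path
        and not n.startswith("libclang_rt.asan")
    ]
    return has_runtime, uninstrumented


# A few lines taken from the ldd output shown above.
sample = """\
    linux-vdso.so.1 (0x00007ffd1a5ae000)
    libclang_rt.asan-x86_64.so => /opt/rocm-6.1.0-99999/llvm/lib/clang/17.0.0/lib/linux/libclang_rt.asan-x86_64.so (0x00007fb9c14b6000)
    libamdhip64.so.5 => /opt/rocm-6.1.0-99999/lib/asan/libamdhip64.so.5 (0x00007fb9bedd3000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb9be84a000)
"""
print(check_ldd(sample))
```

An empty second element means every ROCm library in the list came from an instrumented location.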
Checking the `PATH` yields
```bash
$ which llvm-symbolizer
/opt/rocm-6.1.0-99999/llvm/bin/llvm-symbolizer
```
Lastly, a check of the OS kernel version yields
```bash
$ uname -rv
5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023
```
which indicates that the required HMM support (kernel version 5.6 or higher) is available. This completes the necessary setup. Running with `m = 100`, `n1 = 11`, `n2 = 10` and `c = 100` should produce
a report for an invalid access by the last 10 threads.
```bash
=================================================================
==3141==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fb1410d2cc4
WRITE of size 4 in workgroup id (10,0,0)
#0 0x7fb1410d2cc4 in set1(int*) at /home/dave/mini/mini.cpp:0:10
Thread ids and accessed addresses:
00 : 0x7fb14371d190 01 : 0x7fb14371d194 02 : 0x7fb14371d198 03 : 0x7fb14371d19c 04 : 0x7fb14371d1a0 05 : 0x7fb14371d1a4 06 : 0x7fb14371d1a8 07 : 0x7fb14371d1ac
08 : 0x7fb14371d1b0 09 : 0x7fb14371d1b4
0x7fb14371d190 is located 0 bytes after 400-byte region [0x7fb14371d000,0x7fb14371d190)
allocated by thread T0 here:
#0 0x7fb151c76828 in hsa_amd_memory_pool_allocate /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:692:3
#1 ...
#12 0x7fb14fb99ec4 in hipMalloc /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:568:3
#13 0x226630 in hipError_t hipMalloc<int>(int**, unsigned long) /opt/rocm-6.1.0-99999/include/hip/hip_runtime_api.h:8367:12
#14 0x226630 in main /home/dave/mini/mini.cpp:19:5
#15 0x7fb14ef02082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
Shadow bytes around the buggy address:
0x7fb14371cf00: ...
=>0x7fb14371d180: 00 00[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7fb14371d200: ...
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
...
==3141==ABORTING
```
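
The lane addresses in this report follow directly from the arguments. The sketch below recomputes them, assuming `sizeof(int)` is 4 and taking the allocation base address from the report; the function names are hypothetical.

```python
INT_SIZE = 4  # ASSUMPTION: sizeof(int) on this platform


def out_of_bounds_threads(m, n1, n2):
    """Global thread indices that write past an m-element device buffer."""
    return list(range(m, n1 * n2))


def host_copy_overflows(m, c):
    """True when hipMemcpy copies m ints into a c-int host buffer with c < m."""
    return c < m


base = 0x7fb14371d000  # start of the 400-byte region in the report above
bad = out_of_bounds_threads(m=100, n1=11, n2=10)

print(f"{len(bad)} threads out of bounds, all in workgroup {bad[0] // 10}")
for index in bad:
    print(f"{index % 10:02d} : {hex(base + index * INT_SIZE)}")
print("host copy overflows:", host_copy_overflows(m=100, c=100))
```

The ten addresses run from `0x7fb14371d190` to `0x7fb14371d1b4`, matching lanes `00` to `09` in the report, and the host copy is within bounds for `c = 100`.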
Running with `m = 100`, `n1 = 10`, `n2 = 10` and `c = 99` should produce a report for an invalid copy.
```shell
=================================================================
==2817==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x514000150dcc at pc 0x7f5509551aca bp 0x7ffc90a7ae50 sp 0x7ffc90a7a610
WRITE of size 400 at 0x514000150dcc thread T0
#0 0x7f5509551ac9 in __asan_memcpy /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:61:3
#1 ...
#9 0x7f5507462a28 in hipMemcpy_common(void*, void const*, unsigned long, hipMemcpyKind, ihipStream_t*) /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:637:10
#10 0x7f5507464205 in hipMemcpy /work/dave/git/compute/external/clr/hipamd/src/hip_memory.cpp:642:3
#11 0x226844 in main /home/dave/mini/mini.cpp:22:5
#12 0x7f55067c3082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
#13 0x22605d in _start (/home/dave/mini/mini+0x22605d)
0x514000150dcc is located 0 bytes after 396-byte region [0x514000150c40,0x514000150dcc)
allocated by thread T0 here:
#0 0x7f5509553dcf in malloc /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
#1 0x226817 in main /home/dave/mini/mini.cpp:21:21
#2 0x7f55067c3082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: heap-buffer-overflow /work/dave/git/compute/external/llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:61:3 in __asan_memcpy
Shadow bytes around the buggy address:
0x514000150b00: ...
=>0x514000150d80: 00 00 00 00 00 00 00 00 00[04]fa fa fa fa fa fa
0x514000150e00: ...
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
...
==2817==ABORTING
```
## Known issues with using GPU sanitizer
* Red zones must have limited size. It is possible for an invalid access to completely miss a red zone and not be detected.
* Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
* It can also be the case that a memory fault is reported for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
* There is currently a bug which can result in memory faults being reported when running instrumented device code which makes use of `malloc`, `free`, `new`, or `delete`.
* There is currently a bug which can result in undefined symbols being reported at compile time when instrumented device code makes use of `new` and `delete`.

View File

@@ -4,14 +4,11 @@
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import os
import shutil
# Keep capitalization due to similar linking on GitHub's markdown preview.
shutil.copy2("../RELEASE.md", "./about/release-notes.md")
os.system("mkdir -p ../_readthedocs/html/downloads")
os.system("cp data/reference/compatibility-matrix-historical-6.0.csv ../_readthedocs/html/downloads/compatibility-matrix-historical-6.0.csv")
latex_engine = "xelatex"
latex_elements = {
"fontpkg": r"""
@@ -21,25 +18,20 @@ latex_elements = {
"""
}
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm.docs.amd.com")
html_context = {}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = "6.2.0"
release = "6.2.0"
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = "6.1.5"
release = "6.1.5"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux", "windows"], "date": "2024-08-02"},
{"file": "about/changelog", "os": ["linux", "windows"], "date": "2024-08-02"},
{"file": "about/release-notes", "os": ["linux"], "date": "2025-03-04"},
{"file": "compatibility/compatibility-matrix", "os": ["linux"]},
{"file": "how-to/deep-learning-rocm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/install", "os": ["linux"]},
@@ -100,11 +92,6 @@ extensions = ["rocm_docs", "sphinx_reredirects"]
external_projects_current_project = "rocm"
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm-stg.amd.com")
html_context = {}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}

View File

@@ -56,10 +56,6 @@ To make edits to our documentation via PR, follow these steps:
6. Change directory into the `./docs` folder and make any documentation changes locally using your preferred code editor. Follow the guidelines listed on the
[documentation structure](./doc-structure.md) page.
```{note}
Spell checking is performed for pull requests by {doc}`ROCm Docs Core<rocm-docs-core:index>`. To ensure your PR passes spell checking you might need to add new words or acronyms to the `.wordlist.txt` file as described in {doc}`Spell Check<rocm-docs-core:user_guide/spellcheck>`.
```
7. Optionally run a local test build of the documentation to ensure the content builds and looks as expected. In your terminal, run the following commands from within the `./docs` folder of your cloned repository:
```bash

Binary files not shown (seven image files; sizes before the change: 28 KiB, 54 KiB, 28 KiB, 103 KiB, 113 KiB, 61 KiB, 73 KiB).

View File

@@ -1,111 +0,0 @@
ROCm Version,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:doc:`Operating Systems <rocm-install-on-linux:reference/system-requirements>`,Ubuntu 24.04,,,,,
,"Ubuntu 22.04.5 [#Ubuntu220405-past-60]_, 22.04.4","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3"
,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.4, 9.3","RHEL 9.4 [#red-hat94-past-60]_, 9.3, 9.2","RHEL 9.4 [#red-hat94-past-60]_, 9.3, 9.2","RHEL 9.4 [#red-hat94-past-60]_, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,"RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
,Oracle Linux 8.9 [#oracle89-past-60]_,Oracle Linux 8.9 [#oracle89-past-60]_,Oracle Linux 8.9 [#oracle89-past-60]_,,,
,".. _architecture-support-compatibility-matrix-past-60:",,,,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA
,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
,".. _gpu-support-compatibility-matrix-past-60:",,,,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,
FRAMEWORK SUPPORT,".. _framework-support-compatibility-matrix-past-60:",,,,,
:doc:`PyTorch <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`,"2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`,"2.16.1, 2.15.1, 2.14.1","2.15, 2.14, 2.13","2.15, 2.14, 2.13","2.15, 2.14, 2.13","2.14, 2.13, 2.12","2.14, 2.13, 2.12"
:doc:`JAX <rocm-install-on-linux:how-to/3rd-party/jax-install>`,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,
THIRD PARTY COMMS,".. _thirdpartycomms-support-compatibility-matrix-past-60:",,,,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.2.0,>=1.2.0,>=1.2.0,>=1.2.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1
,,,,,,
THIRD PARTY ALGORITHM,".. _thirdpartyalgorithm-support-compatibility-matrix-past-60:",,,,,
Thrust,2.2.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
CUB,2.2.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,
ML & COMPUTER VISION,".. _mllibs-support-compatibility-matrix-past-60:",,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.10.0,2.9.0,2.9.0,2.9.0,2.8.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.2.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.0.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0
:doc:`rocDecode <rocdecode:index>`,0.6.0,0.6.0,0.5.0,0.5.0,N/A,N/A
:doc:`RPP <rpp:index>`,1.8.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0
:doc:`rocPyDecode <rocpydecode:index>`,0.1.0,N/A,N/A,N/A,N/A,N/A
,,,,,,
COMMUNICATION,".. _commlibs-support-compatibility-matrix-past-60:",,,,,
:doc:`RCCL <rccl:index>`,2.20.5,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
,,,,,,
MATH LIBS,".. _mathlibs-support-compatibility-matrix-past-60:",,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.2.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.8.0,0.7.0,0.7.0,0.7.0,0.6.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.14,1.0.14,1.0.14,1.0.14,1.0.13,1.0.13
:doc:`hipFORT <hipfort:index>`,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.11.0,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.2.0,2.1.1,2.1.1,2.1.0,2.0.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.1.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.1,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.2.0,3.1.1,3.1.1,3.1.1,3.0.3,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.2.0,4.1.2,4.1.0,4.1.0,4.0.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.28,1.0.27,1.0.27,1.0.26,1.0.25,1.0.23
:doc:`rocRAND <rocrand:index>`,3.1.0,3.0.1,3.0.1,3.0.1,3.0.0,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.26.0,3.25.0,3.25.0,3.25.0,3.24.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.2.0,3.1.2,3.1.2,3.1.2,3.0.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.5.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0
`Tensile <https://github.com/ROCm/Tensile>`_,4.40.0,4.40.0,4.40.0,4.40.0,4.39.0,4.39.0
,,,,,,
PRIMITIVES,".. _primitivelibs-support-compatibility-matrix-past-60:",,,,,
:doc:`hipCUB <hipcub:index>`,3.2.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.3.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.2.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
,,,,,,
SUPPORT LIBS,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,6.2.41133,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.2.0,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,
SYSTEM MGMT TOOLS,".. _tools-support-compatibility-matrix-past-60:",,,,,
:doc:`AMD SMI <amdsmi:index>`,24.6.2,24.5.1,24.5.1,24.4.1,23.4.2,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,1.0.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.3.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,rocm-6.2.0,rocm-6.1.2,rocm-6.1.1,rocm-6.1.0,rocm-6.0.2,rocm-6.0.0
,,,,,,
PERFORMANCE TOOLS,,,,,,
:doc:`Omniperf <omniperf:index>`,2.0.1,N/A,N/A,N/A,N/A,N/A
:doc:`Omnitrace <omnitrace:index>`,1.11.2,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60200,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.4.0,N/A,N/A,N/A,N/A,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60200,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
,,,,,,
DEVELOPMENT TOOLS,,,,,,
:doc:`HIPIFY <hipify:index>`,18.0.0.24232,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.13.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,14.2.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.3.0,0.3.0,0.3.0,N/A,N/A
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
,,,,,,
COMPILERS,".. _compilers-support-compatibility-matrix-past-60:",,,,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
`Flang <https://github.com/ROCm/flang>`_,18.0.0.24232,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
`llvm-project <https://github.com/ROCm/llvm-project>`_,18.0.0.24232,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,18.0.0.24232,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
,,,,,,
RUNTIMES,".. _runtime-support-compatibility-matrix-past-60:",,,,,
:doc:`HIP <hip:index>`,6.2.41133,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
:doc:`ROCR-Runtime <rocr-runtime:index>`,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0

Binary file not shown (after; size: 250 KiB).

Binary file not shown (before; size: 288 KiB).

Binary file not shown (before; size: 59 KiB).

View File

@@ -1,23 +0,0 @@
.. meta::
:description: Build ROCm from source
:keywords: build ROCm, source, ROCm source, ROCm, repo, make, makefile
.. _building-rocm:
*************************************************************
Build ROCm from source
*************************************************************
ROCm is an open-source stack that you can build from source code. The source code is available from `<https://github.com/ROCm/ROCm>`__.
The general steps to build ROCm are:
#. Clone the ROCm source code
#. Prepare the build environment
#. Run the build command
Because the ROCm stack is constantly evolving, the most current instructions are stored with the source code in GitHub.
For detailed build instructions, see `Build ROCm from source <https://github.com/ROCm/ROCm?tab=readme-ov-file#build-rocm-from-source>`_.

View File

@@ -13,9 +13,9 @@ frameworks to ensure that framework-specific optimizations take advantage of AMD
The following guides cover installation processes for ROCm-aware deep learning frameworks.
* :doc:`PyTorch for ROCm <rocm-install-on-linux:install/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
The following chart steps through typical installation workflows for installing deep learning frameworks for ROCm.

View File

@@ -250,4 +250,4 @@ page describes the options.
:align: center
Learn more about optimizing kernels with TunableOp in
:ref:`Optimizing Triton kernels <mi300x-tunableop>`.
:ref:`Optimizing Triton kernels <fine-tuning-llms-triton-tunableop>`.

View File

@@ -37,14 +37,14 @@ Setting up the base implementation environment
----------------------------------------------
#. Install PyTorch for ROCm. Refer to the
:doc:`PyTorch installation guide <rocm-install-on-linux:install/3rd-party/pytorch-install>`. For consistent
:doc:`PyTorch installation guide <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`. For consistent
installation, it's recommended to use official ROCm prebuilt Docker images with the framework pre-installed.
#. In the Docker container, check the availability of ROCm-capable accelerators using the following command.
.. code-block:: shell
rocm-smi --showproductname
rocm-smi -showproductname
#. Check that your accelerators are available to PyTorch.
@@ -95,7 +95,7 @@ Now, it's important to adjust how you load the model. Add the ``device_map`` par
# Load base model to GPU memory
base_model = AutoModelForCausalLM.from_pretrained(
base_model_name,
device_map = "auto",
device_map = "auto"
trust_remote_code = True)
...
# Run training

View File

@@ -38,7 +38,7 @@ Setting up the base implementation environment
----------------------------------------------
#. Install PyTorch for ROCm. Refer to the
:doc:`PyTorch installation guide <rocm-install-on-linux:install/3rd-party/pytorch-install>`. For a consistent
:doc:`PyTorch installation guide <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`. For a consistent
installation, it's recommended to use official ROCm prebuilt Docker images with the framework pre-installed.
#. In the Docker container, check the availability of ROCm-capable accelerators using the following command.
@@ -103,7 +103,7 @@ Setting up the base implementation environment
pip install peft
# Install the other dependencies.
pip install transformers datasets huggingface-hub scipy
pip install transformers, datasets, huggingface-hub, scipy
#. Check that the required packages can be imported.

View File

@@ -16,10 +16,10 @@ Before getting started, install ROCm and supported machine learning frameworks.
Each release of ROCm supports specific hardware and software configurations. Before installing, consult the
:doc:`System requirements <rocm-install-on-linux:reference/system-requirements>` and
:doc:`Installation prerequisites <rocm-install-on-linux:install/prerequisites>` guides.
:doc:`Installation prerequisites <rocm-install-on-linux:how-to/prerequisites>` guides.
If you're new to ROCm, refer to the :doc:`ROCm quick start install guide for Linux
<rocm-install-on-linux:install/quick-start>`.
<rocm-install-on-linux:tutorial/quick-start>`.
If you're using a Radeon GPU for graphics-accelerated applications, refer to the
:doc:`Radeon installation instructions <radeon:docs/install/install-radeon>`.
@@ -53,10 +53,8 @@ ROCm supports popular machine learning frameworks and libraries including `PyTor
Review the framework installation documentation. For ease-of-use, it's recommended to use official ROCm prebuilt Docker
images with the framework pre-installed.
* :doc:`PyTorch for ROCm <rocm-install-on-linux:install/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
The sections that follow in :doc:`Training a model <train-a-model>` are geared for a ROCm with PyTorch installation.

View File

@@ -65,12 +65,6 @@ their own performance testing for additional tuning.
- `CDNA 3 architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf>`_
* - :doc:`AMD Instinct MI300A <mi300a>`
- `AMD Instinct MI300 instruction set architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf>`_
- `CDNA 3 architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf>`_
* - :doc:`AMD Instinct MI200 <mi200>`
- `AMD Instinct MI200 instruction set architecture <https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf>`_

View File

@@ -1,393 +0,0 @@
.. meta::
:description: AMD Instinct MI300A system settings
:keywords: AMD, Instinct, MI300A, HPC, tuning, BIOS settings, NBIO, ROCm,
environment variable, performance, accelerator, GPU, EPYC, GRUB,
operating system
***************************************************
AMD Instinct MI300A system optimization
***************************************************
This topic discusses the operating system settings and system management commands for
the AMD Instinct MI300A accelerator. This topic can help you optimize performance.
System settings
========================================
This section reviews the system settings required to configure an MI300A SOC system and
optimize its performance.
The MI300A system-on-a-chip (SOC) design requires you to review and potentially adjust your OS configuration as explained in
the :ref:`operating-system-settings-label` section. These settings are critical for
performance because the OS on an accelerated processing unit (APU) is responsible for memory management across the CPU and GPU accelerators.
In the APU memory model, system settings are available to limit GPU memory allocation.
This limit is important because legacy software often determines the
amount of allowable memory at start-up time
by probing discrete memory until it is exhausted. If left unchecked, this practice
can starve the OS of resources.
System BIOS settings
-----------------------------------
System BIOS settings are preconfigured for optimal performance from the
platform vendor. This means that you do not need to adjust these settings
when using MI300A. If you have any questions regarding these settings,
contact your MI300A platform vendor.
GRUB settings
-----------------------------------
The ``/etc/default/grub`` file is used to configure the GRUB bootloader on modern Linux distributions.
Linux uses the string assigned to ``GRUB_CMDLINE_LINUX`` in this file as
its command line parameters during boot.
Appending strings using the Linux command line
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is recommended that you append the following string to ``GRUB_CMDLINE_LINUX``.
``pci=realloc=off``
This setting disables the automatic reallocation
of PCI resources, so Linux is able to unambiguously detect all GPUs on the
MI300A-based system. It's used when Single Root I/O Virtualization (SR-IOV) Base
Address Registers (BARs) have not been allocated by the BIOS. This can help
avoid potential issues with certain hardware configurations.
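As a minimal sketch of the edit described above, the following shell function appends ``pci=realloc=off`` to the ``GRUB_CMDLINE_LINUX`` line only when the string is not already present. The function name ``append_pci_realloc`` is hypothetical, and the sketch assumes the value is double-quoted on a single line. It filters standard input to standard output, so the real file is not modified until you have reviewed the result.

```shell
# Hypothetical helper: append pci=realloc=off to GRUB_CMDLINE_LINUX.
# Assumes the value is double-quoted on one line. Reads stdin, writes stdout.
append_pci_realloc() {
    sed '/^GRUB_CMDLINE_LINUX=/{/pci=realloc=off/!s/"$/ pci=realloc=off"/;}'
}

# Example (review the output before replacing /etc/default/grub):
#   append_pci_realloc < /etc/default/grub > /tmp/grub.new
```

Lines other than ``GRUB_CMDLINE_LINUX`` pass through unchanged, and running the function twice does not add the string a second time.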
Validating the IOMMU setting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IOMMU is a system-specific IO mapping mechanism for DMA mapping
and isolation. IOMMU is turned off by default in the operating system settings
for optimal performance.
To verify IOMMU is turned off, first install the ``acpica-tools`` package using your
package manager.
.. code-block:: shell
sudo apt install acpica-tools
Then confirm that the following commands do not return any results.
.. code-block:: shell
sudo acpidump | grep IVRS
sudo acpidump | grep DMAR
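The two checks above can be folded into one step. The following is a sketch only: ``check_iommu_off`` is a hypothetical helper that reads ``acpidump`` output on standard input and reports whether an ``IVRS`` or ``DMAR`` table name appears anywhere in it, which is the same test the ``grep`` commands perform.

```shell
# Hypothetical helper: reads acpidump output on stdin.
# Returns 0 when neither IVRS nor DMAR appears, 1 otherwise.
check_iommu_off() {
    if grep -Eq 'IVRS|DMAR'; then
        echo "IVRS or DMAR table found: IOMMU appears to be enabled"
        return 1
    fi
    echo "No IVRS or DMAR table found: IOMMU appears to be off"
}

# Example:
#   sudo acpidump | check_iommu_off
```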
Update GRUB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use this command to update GRUB to use the modified configuration:
.. code-block:: shell
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
On some Debian-based systems, the ``grub2-mkconfig`` command might not be available. In this case,
use ``grub-mkconfig`` instead. Verify that you have the
correct version by using the following command:
.. code-block:: shell
grub-mkconfig --version
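Because the command name differs between distributions, a script can probe for whichever one is installed. The sketch below assumes a POSIX shell; ``first_available`` is a hypothetical helper name, and it prints the first of its arguments that resolves to a command.

```shell
# Hypothetical helper: print the first argument that names an available command.
# Returns 1 when none of the candidates is found.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Example:
#   mkconfig=$(first_available grub2-mkconfig grub-mkconfig) && echo "Using $mkconfig"
```

The output path passed to the selected command (``/boot/grub2/grub.cfg`` in the example above) can also differ between distributions, so confirm it for your system before regenerating the configuration.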
.. _operating-system-settings-label:
Operating system settings
-----------------------------------
The operating system provides several options to customize and tune performance. For more information
about supported operating systems, see the :doc:`Compatibility matrix <../../compatibility/compatibility-matrix>`.
If you are using a distribution other than RHEL or SLES, the latest Linux kernel is recommended.
Performance considerations for Zen 4, the CPU core architecture in the MI300A,
require Linux kernel version 5.18 or later.
This section describes performance-based settings.
* **Enable transparent huge pages**
To enable transparent huge pages, use one of the following methods:
* From the command line, run the following command:
.. code-block:: shell
echo always > /sys/kernel/mm/transparent_hugepage/enabled
* Set the Linux kernel parameter ``transparent_hugepage`` as follows in the
relevant ``.cfg`` file for your system.
.. code-block:: cfg
transparent_hugepage=always
* **Limit the maximum and single memory allocations on the GPU**
Many AI-related applications were originally developed on discrete GPUs. Some of these applications
have fixed problem sizes associated with the targeted GPU size, and some attempt to determine the
system memory limits by allocating chunks until failure. These techniques can cause issues in an
APU with a shared space.
To allow these applications to run on the APU without further changes,
ROCm supports a default memory policy that restricts the percentage of the GPU that can be allocated.
The following environment variables control this feature:
* ``GPU_MAX_ALLOC_PERCENT``
* ``GPU_SINGLE_ALLOC_PERCENT``
These settings can be added to the default shell environment or the user environment. The effect of the memory allocation
settings varies depending on the system, configuration, and task. They might require adjustment, especially when performing GPU benchmarks. Setting these values to ``100``
lets the GPU allocate any amount of free memory. However, the risk of encountering
an operating system out-of-memory (OOM) condition increases when almost
all the available memory is used.
Before setting either of these items to 100 percent,
carefully consider the expected CPU workload allocation and the anticipated OS usage.
For instance, if the OS requires 8 GB on a 128 GB system, setting these
variables to ``100`` authorizes a single
workload to allocate up to 120 GB of memory. Unless the system has swap space configured,
any over-allocation attempts will be handled by the OOM policies.
* **Disable NUMA (Non-uniform memory access) balancing**
ROCm uses information from the compiled application to ensure an affinity exists
between the GPU agent processes and their CPU hosts or co-processing agents.
Because the APU has OS threads,
including threads with memory management, the default kernel NUMA policies can
adversely impact workload performance without additional tuning.
.. note::
At the kernel level, ``pci=realloc`` can also be set to ``off`` as an additional tuning measure.
To disable NUMA balancing, use one of the following methods:
* From the command line, run the following command:
.. code-block:: shell
echo 0 > /proc/sys/kernel/numa_balancing
* Set the following Linux kernel parameters in the
relevant ``.cfg`` file for your system.
.. code-block:: cfg
pci=realloc=off numa_balancing=disable
* **Enable compaction**
Compaction is necessary for proper MI300A operation because the APU dynamically shares memory
between the CPU and GPU. Compaction can be done proactively, which reduces
allocation costs, or performed during allocation, in which case it is part of the background activities.
Without compaction, the MI300A application performance eventually degrades as fragmentation increases.
In RHEL distributions, compaction is disabled by default. In Ubuntu, it's enabled by default.
To enable compaction, enter the following commands using the command line:
.. code-block:: shell
echo 20 > /proc/sys/vm/compaction_proactiveness
echo 1 > /proc/sys/vm/compact_unevictable_allowed
.. _mi300a-processor-affinity:
* **Change affinity of ROCm helper threads**
This change prevents internal ROCm threads from having their CPU core affinity mask
set to all CPU cores available. With this setting, the threads inherit their parent's
CPU core affinity mask. If you have any questions regarding this setting,
contact your MI300A platform vendor. To enable this setting, enter the following command:
.. code-block:: shell
export HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0
* **CPU core states and C-states**
The system BIOS handles these settings for the MI300A.
They don't need to be configured on the operating system.
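To review the current values of the settings described in this section without changing them, a read-only report can help. The following is a sketch under stated assumptions: ``thp_active_mode`` and ``report_settings`` are hypothetical helper names, and the optional ``root`` argument is an assumption added so the sketch can be pointed at a copy of the files instead of the live system.

```shell
# Hypothetical helper: extract the active mode from text such as
# "always [madvise] never" (the bracketed word is the active mode).
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Hypothetical helper: print the current value of each setting.
# $1 is an optional path prefix (an assumption, for testing); default is "/".
report_settings() {
    root="${1:-}"
    echo "thp=$(thp_active_mode < "$root/sys/kernel/mm/transparent_hugepage/enabled")"
    echo "numa_balancing=$(cat "$root/proc/sys/kernel/numa_balancing")"
    echo "compaction_proactiveness=$(cat "$root/proc/sys/vm/compaction_proactiveness")"
    echo "compact_unevictable_allowed=$(cat "$root/proc/sys/vm/compact_unevictable_allowed")"
}

# Values suggested in this section, for comparison:
#   thp=always, numa_balancing=0,
#   compaction_proactiveness=20, compact_unevictable_allowed=1
```

The report only reads files, so it does not need root privileges. The ``echo ... >`` commands shown earlier in this section do.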
System management
========================================
For a complete guide on installing, managing, and uninstalling ROCm on Linux, see
:doc:`Quick-start (Linux)<rocm-install-on-linux:tutorial/quick-start>`. To verify that the
installation was successful, see the
:doc:`Post-installation instructions<rocm-install-on-linux:install/native-install/post-install>` and
:doc:`ROCm tools <../../reference/rocm-tools>` guides. If verification
fails, consult the :doc:`System debugging guide <../system-debugging>`.
.. _hw-verification-rocm-label:
Hardware verification with ROCm
-----------------------------------
ROCm includes tools to query the system structure. To query
the GPU hardware, use the ``rocm-smi`` command.
``rocm-smi`` reports statistics per socket, so the power results combine CPU and GPU utilization.
In an idle state on a multi-socket system, some power imbalances are expected because
the distribution of OS threads can keep some APU devices at higher power states.
.. note::
The MI300A VRAM settings show as ``N/A``.
.. image:: ../../data/how-to/tuning-guides/mi300a-rocm-smi-output.png
:alt: Output from the rocm-smi command
The ``rocm-smi --showhw`` command shows the available system
GPUs and their device ID and firmware details.
In the MI300A hardware settings, the system BIOS handles the UMC RAS. The
ROCm-supplied GPU driver does not manage this setting.
This results in a value of ``DISABLED`` for the ``UMC RAS`` setting.
.. image:: ../../data/how-to/tuning-guides/mi300a-rocm-smi-showhw-output.png
:alt: Output from the ``rocm-smi --showhw`` command
To see the system structure, the localization of the GPUs in the system, and the
fabric connections between the system components, use the ``rocm-smi --showtopo`` command.
* The first block of the output shows the distance between the GPUs. The weight is a qualitative
measure of the “distance” data must travel to reach one GPU from another.
While the values do not have a precise physical meaning, the higher the value the
more hops are required to reach the destination from the source GPU.
* The second block contains a matrix named “Hops between two GPUs”, where ``1`` means
the two GPUs are directly connected with XGMI, ``2`` means both GPUs are linked to the
same CPU socket and GPU communications go through the CPU, and ``3`` means
both GPUs are linked to different CPU sockets so communications go
through both CPU sockets.
* The third block indicates the link types between the GPUs. This can either be
``XGMI`` for AMD Infinity Fabric links or ``PCIE`` for PCIe Gen4 links.
* The fourth block reveals the localization of a GPU with respect to the NUMA organization
of the shared memory of the AMD EPYC processors.
.. image:: ../../data/how-to/tuning-guides/mi300a-rocm-smi-showtopo-output.png
:alt: Output from the ``rocm-smi --showtopo`` command
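The hop values in the second block map to the three cases listed above. As a small illustration, the lookup below restates that mapping; ``describe_hops`` is a hypothetical helper name and is not part of ``rocm-smi``.

```shell
# Hypothetical helper: translate a value from the "Hops between two GPUs"
# matrix into the meaning given in this section.
describe_hops() {
    case "$1" in
        1) echo "directly connected with XGMI" ;;
        2) echo "same CPU socket, communication goes through the CPU" ;;
        3) echo "different CPU sockets, communication goes through both CPU sockets" ;;
        *) echo "unknown hop count: $1" ;;
    esac
}

# Example:
#   describe_hops 2
```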
Testing inter-device bandwidth
-----------------------------------
The ``rocm-smi --showtopo`` command from the :ref:`hw-verification-rocm-label` section
displays the system structure and shows how the GPUs are located and connected within this
structure. For more information, use the :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`, which can run benchmarks to
show the effective link bandwidth between the system components.
For information on how to install the ROCm Bandwidth Test, see :doc:`Building the environment <rocm_bandwidth_test:install/install>`.
The output lists the available compute devices (CPUs and GPUs), including
their device ID and PCIe ID:
.. image:: ../../data/how-to/tuning-guides/mi300a-rocm-bandwidth-test-output.png
:alt: Output from the rocm-bandwidth-test utility
It also displays the measured bandwidth for unidirectional and
bidirectional transfers between the devices on the CPU and GPU:
.. image:: ../../data/how-to/tuning-guides/mi300a-rocm-peak-bandwidth-output.png
:alt: Bandwidth information from the rocm-bandwidth-test utility
Abbreviations
=============
APBDIS
Algorithmic Performance Boost Disable
APU
Accelerated processing unit
BAR
Base Address Register
BIOS
Basic Input/Output System
CBS
Common BIOS Settings
CCD
Compute Core Die
CDNA
Compute DNA
CLI
Command Line Interface
CPU
Central Processing Unit
cTDP
Configurable Thermal Design Power
DF
Data Fabric
DMA
Direct Memory Access
GPU
Graphics Processing Unit
GRUB
Grand Unified Bootloader
HBM
High Bandwidth Memory
HPC
High Performance Computing
IOMMU
Input-Output Memory Management Unit
ISA
Instruction Set Architecture
NBIO
North Bridge Input/Output
NUMA
Non-Uniform Memory Access
OOM
Out of Memory
PCI
Peripheral Component Interconnect
PCIe
PCI Express
POR
Power-On Reset
RAS
Reliability, availability and serviceability
SMI
System Management Interface
SMT
Simultaneous Multi-threading
SOC
System On Chip
SR-IOV
Single Root I/O Virtualization
TSME
Transparent Secure Memory Encryption
UMC
Unified Memory Controller
VRAM
Video RAM
xGMI
Inter-chip Global Memory Interconnect

View File

@@ -473,18 +473,6 @@ It is recommended to set the following environment variable:
This is the default option as of ROCm 6.2.
Change affinity of ROCm helper threads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This change prevents internal ROCm threads from having their CPU core affinity mask
set to all CPU cores available. With this setting, the threads inherit their parent's
CPU core affinity mask. If you have any questions regarding this setting,
contact your MI300A platform vendor. To enable this setting, enter the following command:
.. code-block:: shell
export HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0
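As a rough sketch of how the setting above might be combined with explicit pinning, the following Python helper launches a command with a restricted CPU affinity mask and with ``HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0`` in its environment. The helper name ``run_pinned`` is hypothetical, the code is Linux-only (``os.sched_setaffinity``), and it demonstrates only the launch pattern; it does not itself start any ROCm threads.

```python
import os
import subprocess
import sys


def run_pinned(cmd, cpus):
    """Run cmd with its CPU affinity limited to cpus and the setting enabled."""
    # Copy the current environment and add the variable from the text above.
    env = dict(os.environ, HSA_OVERRIDE_CPU_AFFINITY_DEBUG="0")

    def pin_child():
        # Runs in the child just before exec; pid 0 means "this process".
        os.sched_setaffinity(0, cpus)

    return subprocess.run(
        cmd, env=env, preexec_fn=pin_child,
        capture_output=True, text=True, check=True,
    )


if __name__ == "__main__":
    # Pin to one CPU that this process is already allowed to use.
    cpu = min(os.sched_getaffinity(0))
    probe = "import os; print(sorted(os.sched_getaffinity(0)))"
    result = run_pinned([sys.executable, "-c", probe], {cpu})
    print(result.stdout.strip())
```

With the variable set, any helper threads created inside the launched process would be expected to inherit this mask rather than being spread across all cores, as the paragraph above describes.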
IOMMU configuration -- systems with 256 CPU threads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -532,9 +520,9 @@ can provide system-level information, offering valuable insights for
optimizing user applications.
For a complete guide on how to install, manage, or uninstall ROCm on Linux, refer to
:doc:`rocm-install-on-linux:install/quick-start`. For verifying that the
:doc:`rocm-install-on-linux:tutorial/quick-start`. For verifying that the
installation was successful, refer to the
:doc:`rocm-install-on-linux:install/native-install/post-install`.
:doc:`rocm-install-on-linux:how-to/native-install/post-install`.
Should verification fail, consult :doc:`/how-to/system-debugging`.
Hardware verification with ROCm
@@ -707,8 +695,8 @@ Bidirectional bandwidth
Bidirectional bandwidth
Abbreviations
=============
Acronyms
========
AMI
American Megatrends International


@@ -12,7 +12,7 @@
This chapter reviews system settings that are required to configure the system
for ROCm virtualization on RDNA2-based AMD Radeon™ PRO GPUs. Installing ROCm on
Bare Metal follows the routine ROCm
{doc}`installation procedure<rocm-install-on-linux:install/native-install/index>`.
{doc}`installation procedure<rocm-install-on-linux:how-to/native-install/index>`.
To enable ROCm virtualization on V620, one has to set up Single Root I/O
Virtualization (SR-IOV) in the BIOS via the setting found in the following
@@ -166,4 +166,4 @@ First, assign GPU virtual function (VF) to VM using the following steps.
Then start the VM.
Finally install ROCm on the virtual machine (VM). For detailed instructions,
refer to the {doc}`Linux install guide<rocm-install-on-linux:install/native-install/index>`.
refer to the {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`.
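The cross-reference changes in this diff follow a pattern: targets under `install/` in the `rocm-install-on-linux` project move to `how-to/` or `tutorial/`. A minimal sketch of applying the same renames to other text is shown below. The mapping contains only the two pairs visible in this diff, and the names `RENAMES` and `rewrite_targets` are hypothetical.

```python
# Old -> new target prefixes, copied from the paired lines in this diff.
RENAMES = {
    "rocm-install-on-linux:install/native-install/":
        "rocm-install-on-linux:how-to/native-install/",
    "rocm-install-on-linux:install/quick-start":
        "rocm-install-on-linux:tutorial/quick-start",
}


def rewrite_targets(text):
    """Return text with every known old target prefix replaced by the new one."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    line = "{doc}`Linux install guide<rocm-install-on-linux:install/native-install/index>`"
    print(rewrite_targets(line))
```

Plain string replacement is used instead of a regular expression because the targets are fixed literals; other renamed pages would need their own entries.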


@@ -162,12 +162,12 @@ tools available depending on their specific profiling needs.
* ROCProfiler tool collects kernel execution performance
metrics. For more information, see the
:doc:`ROCProfiler <rocprofiler:index>`
`ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprofv1.html>`_
documentation.
* Omniperf builds upon ROCProfiler but provides more guided analysis.
For more information, see
:doc:`Omniperf documentation <omniperf:index>`.
`Omniperf documentation <https://rocm.github.io/omniperf/>`_.
Refer to :doc:`/how-to/llm-fine-tuning-optimization/profiling-and-debugging`
to explore commonly used profiling tools and their usage patterns.


@@ -5,45 +5,81 @@
reference, ROCm, AMD">
</head>
# AMD ROCm documentation
# AMD ROCm documentation
ROCm is an open-source software platform optimized to extract HPC and AI workload
performance from AMD Instinct accelerators and AMD Radeon GPUs while maintaining
compatibility with industry software frameworks. For more information, see [What is ROCm?](./what-is-rocm.rst)
Welcome to the ROCm docs home page! If you're new to ROCm, you can review the following
resources to learn more about our products and what we support:
If you're using Radeon GPUs, consider reviewing {doc}`Radeon-specific ROCm documentation<radeon:index>`.
* [What is ROCm?](./what-is-rocm.rst)
* [Release notes](./about/release-notes.md)
Installation instructions are available from:
You can install ROCm on our Radeon™, Radeon™ PRO, and Instinct™ GPUs. If you're using Radeon
GPUs, we recommend reading the
{doc}`Radeon-specific ROCm documentation<radeon:index>`.
* {doc}`ROCm installation for Linux<rocm-install-on-linux:index>`
* {doc}`HIP SDK installation for Windows<rocm-install-on-windows:index>`
* [Deep learning frameworks installation](./how-to/deep-learning-rocm.rst)
* [Build ROCm from source](./how-to/build-rocm.rst)
For hands-on applications, refer to our [ROCm blogs](https://rocm.blogs.amd.com/) site.
ROCm documentation is organized into the following categories:
Our documentation is organized into the following categories:
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-installation.jpg
:img-alt: Install documentation
:padding: 2
* Linux
* {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`
* {doc}`Package manager integration<rocm-install-on-linux:how-to/native-install/package-manager-integration>`
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
* Windows
* {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
* {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
* [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
:::
:::{grid-item-card}
:img-top: ./data/banner-compatibility.jpg
:img-alt: Compatibility information
:padding: 2
* [Compatibility matrix](./compatibility/compatibility-matrix.rst)
* {doc}`Linux system requirements<rocm-install-on-linux:reference/system-requirements>`
* {doc}`Windows system requirements<rocm-install-on-windows:reference/system-requirements>`
* {doc}`System requirements (Linux)<rocm-install-on-linux:reference/system-requirements>`
* {doc}`System requirements (Windows)<rocm-install-on-windows:reference/system-requirements>`
* {doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`
* {doc}`User/kernel space<rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`
* {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
* [OpenMP](./about/compatibility/openmp.md)
* [Precision support](./compatibility/precision-support.rst)
* {doc}`ROCm on Radeon GPUs<radeon:index>`
:::
<!-- markdownlint-disable MD051 -->
:::{grid-item-card}
:img-top: ./data/banner-reference.jpg
:img-alt: Reference documentation
:padding: 2
* [API libraries](./reference/api-libraries.md)
* [Artificial intelligence](#artificial-intelligence-apis)
* [C++ primitives](#cpp-primitives)
* [Communication](#communication-libraries)
* [Math](#math-apis)
* [Random number generators](#random-number-apis)
* [HIP runtime](#hip-runtime)
* [Tools](./reference/rocm-tools.md)
* [Development](#development-tools)
* [Performance analysis](#performance-tools)
* [System](#system-tools)
* [Hardware specifications](./reference/gpu-arch-specs.rst)
:::
<!-- markdownlint-enable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-howto.jpg
:img-alt: How-to documentation
:padding: 2
@@ -53,7 +89,6 @@ ROCm documentation is organized into the following categories:
* [Fine-tuning LLMs and inference optimization](./how-to/llm-fine-tuning-optimization/index.rst)
* [System optimization](./how-to/system-optimization/index.rst)
* [AMD Instinct MI300X](./how-to/system-optimization/mi300x.rst)
* [AMD Instinct MI300A](./how-to/system-optimization/mi300a.rst)
* [AMD Instinct MI200](./how-to/system-optimization/mi200.md)
* [AMD Instinct MI100](./how-to/system-optimization/mi100.md)
* [AMD Instinct RDNA2](./how-to/system-optimization/w6000-v620.md)
@@ -62,18 +97,23 @@ ROCm documentation is organized into the following categories:
* [Workload tuning](./how-to/tuning-guides/mi300x/workload.rst)
* [System debugging](./how-to/system-debugging.md)
* [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
* [Using advanced compiler features](./conceptual/compiler-topics.md)
* [Using compiler features](./conceptual/compiler-topics.md)
* [Using AddressSanitizer](./conceptual/using-gpu-sanitizer.md)
* [Compiler disambiguation](./conceptual/compiler-disambiguation.md)
* [OpenMP support in ROCm](./about/compatibility/openmp.md)
* [Setting the number of CUs](./how-to/setting-cus)
* [GitHub examples](https://github.com/amd/rocm-examples)
:::
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-conceptual.jpg
:img-alt: Conceptual documentation
:padding: 2
* [GPU architecture](./conceptual/gpu-arch.md)
* [MI100](./conceptual/gpu-arch/mi100.md)
* [MI250](./conceptual/gpu-arch/mi250.md)
* [MI300](./conceptual/gpu-arch/mi300.md)
* [GPU memory](./conceptual/gpu-memory.md)
* [File structure (Linux FHS)](./conceptual/file-reorg.md)
* [GPU isolation techniques](./conceptual/gpu-isolation.md)
@@ -83,23 +123,4 @@ ROCm documentation is organized into the following categories:
* [Inference optimization with MIGraphX](./conceptual/ai-migraphx-optimization.md)
:::
<!-- markdownlint-disable MD051 -->
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ./data/banner-reference.jpg
:img-alt: Reference documentation
:padding: 2
* [Libraries](./reference/api-libraries.md)
* [Artificial intelligence](#artificial-intelligence-apis)
* [C++ primitives](#cpp-primitives)
* [Communication](#communication-libraries)
* [Math](#math-apis)
* [Random number generators](#random-number-apis)
* [HIP runtime](#hip-runtime)
* [ROCm tools and compilers](./reference/rocm-tools.md)
* [GPU hardware specifications](./reference/gpu-arch-specs.rst)
:::
<!-- markdownlint-enable MD051 -->
::::


@@ -6,7 +6,7 @@
algebra, AMD">
</head>
# ROCm libraries
# ROCm API libraries
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid


@@ -6,24 +6,24 @@
algebra, AMD">
</head>
# ROCm tools, compilers, and runtimes
# ROCm tools
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid
(system-tools)=
(development-tools)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-system.jpg
:img-alt: System tools
:img-top: ../data/reference/banner-development.jpg
:img-alt: Development tools
:padding: 2
* {doc}`AMD SMI <amdsmi:index>`
* {doc}`ROCm Data Center Tool <rdc:index>`
* {doc}`rocminfo <rocminfo:index>`
* {doc}`ROCm SMI <rocm_smi_lib:index>`
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
* {doc}`HIPIFY <hipify:index>`
* {doc}`ROCdbgapi <rocdbgapi:index>`
* [ROCmCC](./rocmcc.md)
* {doc}`ROCm Debugger (ROCgdb) <rocgdb:index>`
* {doc}`ROCr Debug Agent <rocr_debug_agent:index>`
:::
(performance-tools)=
@@ -34,53 +34,25 @@
:img-alt: Performance tools
:padding: 2
* {doc}`Omniperf <omniperf:index>`
* {doc}`Omnitrace <omnitrace:index>`
* {doc}`ROCm Bandwidth Test <rocm_bandwidth_test:index>`
* {doc}`ROCProfiler <rocprofiler:index>`
* {doc}`ROCprofiler-SDK <rocprofiler-sdk:index>`
* {doc}`ROCProfiler <rocprofiler:profiler_home_page>`
* [rocprofiler-register](https://github.com/ROCm/rocprofiler-register)
* {doc}`ROCTracer <roctracer:index>`
:::
(development-tools)=
(system-tools)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-development.jpg
:img-alt: Development tools
:img-top: ../data/reference/banner-system.jpg
:img-alt: System tools
:padding: 2
* {doc}`ROCm CMake <rocmcmakebuildtools:index>`
* {doc}`HIPIFY <hipify:index>`
* {doc}`ROCdbgapi <rocdbgapi:index>`
* {doc}`ROCm Debugger (ROCgdb) <rocgdb:index>`
* {doc}`ROCr Debug Agent <rocr_debug_agent:index>`
:::
(compilers)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-compilers.jpg
:img-alt: Compilers
:padding: 2
* {doc}`ROCm Compilers <llvm-project:index>`
* {doc}`HIPCC <hipcc:index>`
* [FLANG](https://github.com/ROCm/flang/)
:::
(runtimes)=
:::{grid-item-card}
:class-card: sd-text-black
:img-top: ../data/reference/banner-runtimes.jpg
:img-alt: Runtimes
:padding: 2
* {doc}`AMD Common Language Runtime (CLR) <hip:understand/amd_clr>`
* {doc}`HIP <hip:index>`
* {doc}`ROCR-Runtime <rocr-runtime:index>`
* {doc}`AMD SMI <amdsmi:index>`
* {doc}`rocminfo <rocminfo:index>`
* {doc}`ROCm Data Center Tool <rdc:index>`
* {doc}`ROCm SMI <rocm_smi_lib:index>`
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
:::
::::

1450
docs/reference/rocmcc.md Normal file

File diff suppressed because it is too large


@@ -8,7 +8,15 @@
| Version | Release date |
| ------- | ------------ |
| [6.3.3](https://rocm.docs.amd.com/en/docs-6.3.3/) | February 19, 2025 |
| [6.3.2](https://rocm.docs.amd.com/en/docs-6.3.2/) | January 28, 2025 |
| [6.3.1](https://rocm.docs.amd.com/en/docs-6.3.1/) | December 20, 2024 |
| [6.3.0](https://rocm.docs.amd.com/en/docs-6.3.0/) | December 3, 2024 |
| [6.2.4](https://rocm.docs.amd.com/en/docs-6.2.4/) | November 6, 2024 |
| [6.2.2](https://rocm.docs.amd.com/en/docs-6.2.2/) | September 27, 2024 |
| [6.2.1](https://rocm.docs.amd.com/en/docs-6.2.1/) | September 20, 2024 |
| [6.2.0](https://rocm.docs.amd.com/en/docs-6.2.0/) | August 2, 2024 |
| [6.1.5](https://rocm.docs.amd.com/en/docs-6.1.2/) | March 13, 2025 |
| [6.1.2](https://rocm.docs.amd.com/en/docs-6.1.2/) | June 4, 2024 |
| [6.1.1](https://rocm.docs.amd.com/en/docs-6.1.1/) | May 8, 2024 |
| [6.1.0](https://rocm.docs.amd.com/en/docs-6.1.0/) | April 16, 2024 |


@@ -9,6 +9,8 @@ subtrees:
- file: what-is-rocm.rst
- file: about/release-notes.md
title: Release notes
- url: https://github.com/ROCm/ROCm/labels/Verified%20Issue
title: Known issues
- caption: Install
entries:
@@ -18,8 +20,28 @@ subtrees:
title: HIP SDK on Windows
- file: how-to/deep-learning-rocm.md
title: Deep learning frameworks
- file: how-to/build-rocm.rst
title: Build ROCm from source
- caption: Compatibility
entries:
- file: compatibility/compatibility-matrix.rst
title: Compatibility matrix
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html
title: Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html
title: Windows
- file: compatibility/precision-support.rst
title: Precision support
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/3rd-party-support-matrix.html
title: Third-party
- caption: Reference
entries:
- file: reference/api-libraries.md
title: API libraries
- file: reference/rocm-tools.md
title: Tools
- file: reference/gpu-arch-specs.rst
title: Hardware specifications
- caption: How to
entries:
@@ -61,8 +83,6 @@ subtrees:
- entries:
- file: how-to/system-optimization/mi300x.rst
title: AMD Instinct MI300X
- file: how-to/system-optimization/mi300a.rst
title: AMD Instinct MI300A
- file: how-to/system-optimization/mi200.md
title: AMD Instinct MI200
- file: how-to/system-optimization/mi100.md
@@ -81,32 +101,19 @@ subtrees:
- file: how-to/gpu-enabled-mpi.rst
title: Using MPI
- file: conceptual/compiler-topics.md
title: Using advanced compiler features
title: Using compiler features
subtrees:
- entries:
- url: https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html
title: ROCm compiler infrastructure
- url: https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/using-gpu-sanitizer.html
- file: conceptual/using-gpu-sanitizer.md
title: Using AddressSanitizer
- url: https://rocm.docs.amd.com/projects/llvm-project/en/latest/conceptual/openmp.html
- file: conceptual/compiler-disambiguation.md
title: Compiler disambiguation
- file: about/compatibility/openmp.md
title: OpenMP support
- file: how-to/setting-cus
title: Setting the number of CUs
- url: https://github.com/amd/rocm-examples
title: ROCm examples
- caption: Compatibility
entries:
- file: compatibility/compatibility-matrix.rst
title: Compatibility matrix
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html
title: Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html
title: Windows
- file: compatibility/precision-support.rst
title: Precision support
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/3rd-party-support-matrix.html
title: Third-party
title: GitHub examples
- caption: Conceptual
entries:
@@ -155,15 +162,6 @@ subtrees:
- file: conceptual/ai-migraphx-optimization.md
title: Inference optimization with MIGraphX
- caption: Reference
entries:
- file: reference/api-libraries.md
title: ROCm libraries
- file: reference/rocm-tools.md
title: ROCm tools, compilers, and runtimes
- file: reference/gpu-arch-specs.rst
title: Hardware specifications
- caption: Contribute
entries:
- file: contribute/contributing.md
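Several `url` entries in this table of contents carry a `${branch}` placeholder. The diff does not show how it is expanded. The sketch below assumes plain textual substitution at build time, and the branch value `docs-6.1.5` is hypothetical, chosen to follow the `docs-6.1.2` naming seen in the versions table. The function name `expand_branch` is also an assumption.

```python
from string import Template

# URL copied from the Compatibility entries above.
LINUX_REQS = (
    "https://rocm.docs.amd.com/projects/install-on-linux/en/"
    "${branch}/reference/system-requirements.html"
)


def expand_branch(url_template, branch):
    """Replace the ${branch} placeholder in url_template with branch."""
    # string.Template uses the same ${name} syntax as the TOC entries.
    return Template(url_template).substitute(branch=branch)


if __name__ == "__main__":
    print(expand_branch(LINUX_REQS, "docs-6.1.5"))
```

`substitute` raises `KeyError` if a template contains a placeholder that is not supplied, which would surface a misspelled variable instead of silently leaving it in the link.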


@@ -1,2 +1,2 @@
rocm-docs-core==1.6.2
rocm-docs-core==1.18.1
sphinx-reredirects
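To see which pins a change like this one touches, the old and new requirements text can be compared programmatically. The following is a minimal sketch, assuming only simple `name==version` lines matter. The helper names `parse_pins` and `changed_pins` are hypothetical, and the sample data is the `requirements.in` content from this hunk.

```python
import re

# Matches "name==version"; brackets allow extras such as pyjwt[crypto].
PIN = re.compile(r"^([A-Za-z0-9_.\-\[\]]+)==(\S+)$")


def parse_pins(text):
    """Map package name to pinned version, skipping comments and unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def changed_pins(old_text, new_text):
    """Return {name: (old_version, new_version)} for pins that differ or are new."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        name: (old.get(name), version)
        for name, version in new.items()
        if old.get(name) != version
    }


if __name__ == "__main__":
    OLD = "rocm-docs-core==1.6.2\nsphinx-reredirects\n"
    NEW = "rocm-docs-core==1.18.1\nsphinx-reredirects\n"
    print(changed_pins(OLD, NEW))
```

Packages that appear only in the new text are reported with `None` as the old version, which is how newly introduced transitive dependencies would show up when comparing compiled `requirements.txt` files.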


@@ -6,103 +6,205 @@
#
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==0.7.16
alabaster==1.0.0
# via sphinx
babel==2.15.0
asttokens==3.0.0
# via stack-data
attrs==25.3.0
# via
# jsonschema
# jupyter-cache
# referencing
babel==2.17.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.12.3
beautifulsoup4==4.13.3
# via pydata-sphinx-theme
breathe==4.35.0
breathe==4.36.0
# via rocm-docs-core
certifi==2024.7.4
certifi==2025.1.31
# via requests
cffi==1.16.0
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==3.3.2
charset-normalizer==3.4.1
# via requests
click==8.1.7
# via sphinx-external-toc
cryptography==42.0.8
click==8.1.8
# via
# jupyter-cache
# sphinx-external-toc
comm==0.2.2
# via ipykernel
cryptography==44.0.2
# via pyjwt
deprecated==1.2.14
debugpy==1.8.13
# via ipykernel
decorator==5.2.1
# via ipython
deprecated==1.2.18
# via pygithub
docutils==0.21.2
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.20.0
# via rocm-docs-core
gitdb==4.0.11
exceptiongroup==1.2.2
# via ipython
executing==2.2.0
# via stack-data
fastjsonschema==2.21.1
# via
# nbformat
# rocm-docs-core
gitdb==4.0.12
# via gitpython
gitpython==3.1.43
gitpython==3.1.44
# via rocm-docs-core
idna==3.7
greenlet==3.1.1
# via sqlalchemy
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
jinja2==3.1.4
importlib-metadata==8.6.1
# via
# jupyter-cache
# myst-nb
ipykernel==6.29.5
# via myst-nb
ipython==8.34.0
# via
# ipykernel
# myst-nb
jedi==0.19.2
# via ipython
jinja2==3.1.6
# via
# myst-parser
# sphinx
jsonschema==4.23.0
# via nbformat
jsonschema-specifications==2024.10.1
# via jsonschema
jupyter-cache==1.0.1
# via myst-nb
jupyter-client==8.6.3
# via
# ipykernel
# nbclient
jupyter-core==5.7.2
# via
# ipykernel
# jupyter-client
# nbclient
# nbformat
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.5
markupsafe==3.0.2
# via jinja2
mdit-py-plugins==0.4.1
matplotlib-inline==0.1.7
# via
# ipykernel
# ipython
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser==3.0.1
myst-nb==1.2.0
# via rocm-docs-core
packaging==24.1
myst-parser==4.0.1
# via myst-nb
nbclient==0.10.2
# via
# jupyter-cache
# myst-nb
nbformat==5.10.4
# via
# jupyter-cache
# myst-nb
# nbclient
nest-asyncio==1.6.0
# via ipykernel
packaging==24.2
# via
# ipykernel
# pydata-sphinx-theme
# sphinx
parso==0.8.4
# via jedi
pexpect==4.9.0
# via ipython
platformdirs==4.3.6
# via jupyter-core
prompt-toolkit==3.0.50
# via ipython
psutil==7.0.0
# via ipykernel
ptyprocess==0.7.0
# via pexpect
pure-eval==0.2.3
# via stack-data
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==2.3.0
pygithub==2.6.1
# via rocm-docs-core
pygments==2.18.0
pygments==2.19.1
# via
# accessible-pygments
# ipython
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.8.0
pyjwt[crypto]==2.10.1
# via pygithub
pynacl==1.5.0
# via pygithub
pyyaml==6.0.1
python-dateutil==2.9.0.post0
# via jupyter-client
pyyaml==6.0.2
# via
# jupyter-cache
# myst-nb
# myst-parser
# rocm-docs-core
# sphinx-external-toc
pyzmq==26.3.0
# via
# ipykernel
# jupyter-client
referencing==0.36.2
# via
# jsonschema
# jsonschema-specifications
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core==1.6.2
rocm-docs-core==1.18.1
# via -r requirements.in
smmap==5.0.1
rpds-py==0.23.1
# via
# jsonschema
# referencing
six==1.17.0
# via python-dateutil
smmap==5.0.2
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.5
soupsieve==2.6
# via beautifulsoup4
sphinx==7.3.7
sphinx==8.1.3
# via
# breathe
# myst-nb
# myst-parser
# pydata-sphinx-theme
# rocm-docs-core
@@ -112,39 +214,68 @@ sphinx==7.3.7
# sphinx-external-toc
# sphinx-notfound-page
# sphinx-reredirects
sphinx-book-theme==1.1.3
sphinx-book-theme==1.1.4
# via rocm-docs-core
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.6.0
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==1.0.2
sphinx-notfound-page==1.1.0
# via rocm-docs-core
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==1.0.8
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-devhelp==1.0.6
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.0.5
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.7
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==1.1.10
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
tomli==2.0.1
sqlalchemy==2.0.39
# via jupyter-cache
stack-data==0.6.3
# via ipython
tabulate==0.9.0
# via jupyter-cache
tomli==2.2.1
# via sphinx
tornado==6.4.2
# via
# ipykernel
# jupyter-client
traitlets==5.14.3
# via
# comm
# ipykernel
# ipython
# jupyter-client
# jupyter-core
# matplotlib-inline
# nbclient
# nbformat
typing-extensions==4.12.2
# via
# beautifulsoup4
# ipython
# myst-nb
# pydata-sphinx-theme
# pygithub
urllib3==2.2.2
# referencing
# sqlalchemy
urllib3==2.3.0
# via
# pygithub
# requests
wrapt==1.16.0
wcwidth==0.2.13
# via prompt-toolkit
wrapt==1.17.2
# via deprecated
zipp==3.21.0
# via importlib-metadata


@@ -1,4 +1,4 @@
.. meta::
.. meta::
:description: What is ROCm
:keywords: ROCm components, ROCm projects, introduction, ROCm, AMD, runtimes, compilers, tools, libraries, API
@@ -10,7 +10,7 @@ ROCm is an open-source stack, composed primarily of open-source software, design
graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development
tools, and APIs that enable GPU programming from low-level kernel to end-user applications.
.. image:: data/rocm-software-stack-6_2_0.jpg
.. image:: data/rocm-software-stack-6_1_0.jpg
:width: 800
:alt: AMD's ROCm software stack and neighboring technologies.
:align: center
@@ -44,10 +44,9 @@ Machine Learning & Computer Vision
":doc:`MIGraphX <amdmigraphx:index>`", "Graph inference engine that accelerates machine learning model inference"
":doc:`MIOpen <miopen:index>`", "An open source deep-learning library"
":doc:`MIVisionX <mivisionx:index>`", "Set of comprehensive computer vision and machine learning libraries, utilities, and applications"
":doc:`ROCm Performance Primitives (RPP) <rpp:index>`", "Comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends"
":doc:`rocAL <rocal:index>`", "An augmentation library designed to decode and process images and videos"
":doc:`rocDecode <rocdecode:index>`", "High-performance SDK for access to video decoding features on AMD GPUs"
":doc:`rocPyDecode <rocpydecode:index>`", "Provides access to rocDecode APIs in both Python and C/C++ languages"
":doc:`ROCm Performance Primitives (RPP) <rpp:index>`", "Comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends"
Communication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -95,41 +94,22 @@ Primitives
Tools
-----------------------------------------------
System Management
^^^^^^^^^^^^^^^^^
.. csv-table::
:header: "Component", "Description"
":doc:`AMD SMI <amdsmi:index>`", "C library for Linux that provides a user space interface for applications to monitor and control AMD devices"
":doc:`ROCm Data Center Tool <rdc:index>`", "Simplifies administration and addresses key infrastructure challenges in AMD GPUs in cluster and data-center environments"
":doc:`HIPIFY <hipify:index>`", "Translates CUDA source code into portable HIP C++"
":doc:`ROCdbgapi <rocdbgapi:index>`", "ROCm debugger API library"
":doc:`ROCm compilers <./reference/rocmcc>`", "Clang/LLVM-based compiler"
":doc:`rocminfo <rocminfo:index>`", "Reports system information"
":doc:`ROCProfiler <rocprofiler:index>`", "Profiling tool for HIP applications"
":doc:`ROCTracer <roctracer:index>`", "Intercepts runtime API calls and traces asynchronous activity"
":doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`", "Captures the performance characteristics of buffer copying and kernel read/write operations"
":doc:`ROCm CMake <rocmcmakebuildtools:index>`", "Collection of CMake modules for common build and development tasks"
":doc:`ROCm Data Center Tool <rdc:index>`", "Simplifies administration and addresses key infrastructure challenges in AMD GPUs in cluster and data-center environments"
":doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`", "Source-level debugger for Linux, based on the GNU Debugger (GDB)"
":doc:`ROCm SMI <rocm_smi_lib:index>`", "C library for Linux that provides a user space interface for applications to monitor and control GPU applications"
":doc:`ROCm Validation Suite <rocmvalidationsuite:index>`", "Detects and troubleshoots common problems affecting AMD GPUs running in a high-performance computing environment"
Performance
^^^^^^^^^^^
.. csv-table::
:header: "Component", "Description"
":doc:`Omniperf <omniperf:index>`", "System performance profiling tool for machine learning and HPC workloads"
":doc:`Omnitrace <omnitrace:index>`", "Comprehensive profiling and tracing tool for HIP applications"
":doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`", "Captures the performance characteristics of buffer copying and kernel read/write operations"
":doc:`ROCProfiler <rocprofiler:index>`", "Profiling tool for HIP applications"
":doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`", "Toolkit for developing analysis tools for profiling and tracing GPU compute applications. This toolkit is in beta and subject to change"
":doc:`ROCTracer <roctracer:index>`", "Intercepts runtime API calls and traces asynchronous activity"
Development
^^^^^^^^^^^
.. csv-table::
:header: "Component", "Description"
":doc:`HIPIFY <hipify:index>`", "Translates CUDA source code into portable HIP C++"
":doc:`ROCm CMake <rocmcmakebuildtools:index>`", "Collection of CMake modules for common build and development tasks"
":doc:`ROCdbgapi <rocdbgapi:index>`", "ROCm debugger API library"
":doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`", "Source-level debugger for Linux, based on the GNU Debugger (GDB)"
":doc:`ROCr Debug Agent <rocr_debug_agent:index>`", "Prints the state of all AMD GPU wavefronts that caused a queue error by sending a SIGQUIT signal to the process while the program is running"
Compilers
@@ -138,9 +118,9 @@ Compilers
.. csv-table::
:header: "Component", "Description"
":doc:`HIPCC <hipcc:index>`", "Compiler driver utility that calls Clang or NVCC and passes the appropriate include and library options for the target compiler and HIP infrastructure"
":doc:`ROCm compilers <llvm-project:index>`", "ROCm LLVM compiler infrastructure"
"`FLANG <https://github.com/ROCm/flang/>`_", "An out-of-tree Fortran compiler targeting LLVM"
":doc:`hipCC <hipcc:index>`", "Compiler driver utility that calls Clang or NVCC and passes the appropriate include and library options for the target compiler and HIP infrastructure"
"`LLVM (amdclang) <https://github.com/ROCm/llvm-project>`_ ", "Toolkit for the construction of highly optimized compilers, optimizers, and runtime environments"
Runtimes
-----------------------------------------------

Submodule libs/AMDMIGraphX deleted from b1c8c8e8d8

Submodule libs/HIP deleted from 50ec90c65c

Submodule libs/HIPIFY deleted from c4696fc7f2

Submodule libs/MIOpen deleted from 36bb7fd4a2

Submodule libs/MIVisionX deleted from 5236504693

Submodule libs/ROCdbgapi deleted from 4566f62a0d

Submodule libs/ROCgdb deleted from 7ba4b4f96d

Submodule libs/Tensile deleted from dbc2062dce

Submodule libs/amdsmi deleted from 2b02a07970

Submodule libs/clr deleted from dd7f957662

Submodule libs/half deleted from 1ddada2251

Submodule libs/hipBLAS deleted from e734acbca0

Submodule libs/hipBLASLt deleted from 500158158f

Submodule libs/hipCUB deleted from 1875530e7d

Submodule libs/hipFFT deleted from baee8f8a70

Submodule libs/hipRAND deleted from 080186daed

Submodule libs/hipSOLVER deleted from 4e095b95b9

Submodule libs/hipSPARSE deleted from 3800cfe6e0

Submodule libs/hipSPARSELt deleted from 004dacd5bc

Submodule libs/hipTensor deleted from b767a2f935

Submodule libs/hipfort deleted from a971a24e08

Submodule libs/hipother deleted from c18a23c1de

Submodule libs/omniperf deleted from 17eb9e3a44

Submodule libs/omnitrace deleted from f0bd9126a5

Submodule libs/rccl deleted from 45b618a315

Submodule libs/rdc deleted from 8c56855a05

Submodule libs/rocAL deleted from 6c331ce047

Submodule libs/rocALUTION deleted from 3bf8de2f04

Submodule libs/rocBLAS deleted from 54f305c18f

Submodule libs/rocDecode deleted from 9d44600e95

Submodule libs/rocFFT deleted from 7a8c4759ac

Submodule libs/rocPRIM deleted from eab1eed11e

Some files were not shown because too many files have changed in this diff