* Known issue updated
* Reworded for clarity
* Minor update
* Minor change
* Known issue updated
* Reference link added
* Apply suggestions from code review
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* PLDM updated
* SME feedback added
* Minor change
* ROCm Optiq added
---------
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Use intersphinx links for deep learning
* Update deep-learning-rocm.rst
remove Taichi
* Update deep-learning-rocm.rst
Change Install link to "link"
* Apply suggestion from @randyh62
OK
* New GPUs listed
* GPU highlights updated
* OS table removed
* JAX 0.8.0 support added
* Apply suggestions from code review
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Azure Linux 3.0 removed
* Review feedback added
* Release and changelog synced
* Minor corrections and date change
---------
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* OS table removed from compatibility table
* Feedback added
* Azure Linux 3.0 and compatibility version update
* Version fix
* Review feedback added
* Minor change
* Adding ROCm-Optiq note to What is ROCm page
Adding a note for a link to the Optiq docs
* Apply suggestion from @mattwill-amd
* Apply suggestion from @mattwill-amd
* Apply suggestion from @mattwill-amd
* Update what-is-rocm.rst
* Update what-is-rocm.rst
* Apply suggestion from @mattwill-amd
* Apply suggestion from @mattwill-amd
* Apply suggestion from @mattwill-amd
* Apply suggestion from @mattwill-amd
- Update rccl component pipeline to include new additions made to projects already in super repos.
- Also update rccl to trigger rocprofiler-sdk job upon completion.
- rocprofiler-sdk pipeline updated to include os parameter to enable future almalinux 8 job.
* add previous versions
* Fix heading levels in pages using embedded templates (#5468)
* update primus-megatron doc
update megatron-lm doc
update templates
fix tab
update primus-megatron model configs
Update primus-pytorch model configs
fix css class
add posttrain to pytorch-training template
update data sheets
update
update
update
update docker tags
* Add known issue and update Primus/Turbo versions
* add primus ver to histories
* update primus ver to 0.1.1
* fix leftovers from merge conflict
* archive previous doc version
* update model/docker data and doc templates
* Update "Reproducing the Docker image"
* fix: truncated commit hash doesn't work for some reason
* bump rocm-docs-core to 1.26.0
* fix numbering
fix
* update docker tag
* update .wordlist.txt
* Update CHANGELOG.md
Removed duplicate num_threads entry, and added a new Resolved issue from Julia.
* Update RELEASE.md
Removed duplicate num_threads entry and added a resolved issue from Julia.
* Add origami yaml pipeline.
* Unindent lines.
* Add cmake dependency step to origami yml.
* Add pybind dep
* Fix pipeline failures.
* Quick fix
* Fix pybind11 dep for almalinux
* Fix pybind11 dep for almalinux again
* Test
* [Ex CI] don't create symlink if more than one sparse checkout dir
* hipBLASLt multi sparse
* Replace pybind with nanobind.
* Quick fix
* Testing nanobind install in pipelines
* Run origami binding tests
* Change build path for tests
* Change build path for tests again
* Add missing dep for CI
* Add archs to buildJobs
* Fix CI error.
* Test
* Test job target
* Adding job target to hipblaslt dependent builds
* Check devices on machine
* Add gpu to pipeline
* Add more gpu targets
* test
* Add test job to origami
* Update test jobs
* Finding test dir
* Fix sparse checkout
* Find build dir
* Try to find build dir
* Clean up
* Test
* Change test dir
* Build origami in test job
* Try removing job.target from params
* Package bindings in build artifacts
* Download build as artifact.
* Comment out block
* Fix checkout in test job
* Test1
* Echo to list dir
* Sparse checkout origami/python
* Download python bindings as artifact
* Try ctest instead of running test files directly
* Only download artifacts for ubuntu
* Add missing cd
* Run individual tests not ctest.
* Fix hipblaslt build failures
* Resolve more ci failures in hipblaslt
* Add old changes back in
* Fix hipblaslt ci errors
* Clean up
* Add nanobind to array
* Add nanobind to array correctly
* Remove nanobind install script
* Quick fix
* Add pip module installs to test job
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
- Trigger downstream build of rocpydecode within rocdecode pipelines.
- Copying similar variables as other pipelines even though these projects are not in the super-repos.
* Indentation and formatting updated
* Known issues added
* Known issues updated
* Minor change
* Known issues updated
* KMD UMD update
* Updated known issues
* Additional text removed from known issues
* Oracle Linux 10 removed
* Indentation and formatting updated
* Resolved issue for kokkos option added
* Known issue for ROCr added
* 2nd known issue added
* Known issues updated
* adding 2 known issues
* Apply suggestions from code review
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* Update RELEASE.md
* Known issues added
* Approved known issue added
* Component removed based on Leo's feedback
* Issue link added
---------
Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com>
Co-authored-by: Matt Williams <matt.williams@amd.com>
- Subset of the hipblaslt component yaml, deleting extra gpu targets and the testing component.
- Sparse checkout details removed.
- Basic build flags from top-level invocation added.
* init
* fix source dir
* miopen specify test build dir
* fix test build dir
* revert change
* fix test build again
* move to ultra temporarily
* miopen-get-ck, working dir
* exclude flaky test
* move back to high
* Add MIVisionX and AMDMIGraphX downstream jobs to MIOpen
* comment sparsecheckoutdir
* quote component names
* fix artifact name
* miopen ck script exit on fail
* add downstream checkout repos
* mivisionx, add aomp
* Update ROCR-Runtime.yml
Migrate from rocmsmi to amdsmi
* Update ROCR-Runtime.yml
Removed libhwloc.so.5 install
* Update ROCR-Runtime.yml
Link to hwloc.so.5
* Update ROCR-Runtime.yml
Added link in the rocrtst step
* Update ROCR-Runtime.yml
* Sphinx warning for DGL fixed
* Update dgl-compatibility.rst
removed benchmark line and updated link
---------
Co-authored-by: Pratik Basyal <prbasyal@amd.com>
* HIP 7.0 upcoming changes blog link updated
* Documentation highlight for deep learning framework added
* Note loading fixed
* Note removed
* Link fixed
* verl compatibility
* add Supported features
Signed-off-by: Vicky Tsang <vtsang@amd.com>
* updated and edited verl compat doc
* added links to verl
* add future release for sglang and megatron inference eng.
Signed-off-by: Vicky Tsang <vtsang@amd.com>
* fix lint
Signed-off-by: Vicky Tsang <vtsang@amd.com>
* fixed a typo and a table
* Spolifroni amd/add to compat matrix (#430)
* added verl to compatibility matrix
* small change
* fixed an error in csv
* edited the verl compat based on leo's recommendations
* updated compat matrix (#435)
* Added a hardcoded link to the verl install
This is a link to an RTD build and MUST be removed before publishing.
* Update verl-compatibility.rst
* Added a hardcoded link to the verl install
This link is to an RTD build and it WILL break at publishing. It MUST be changed before publishing.
* Added version support note (#448)
* small fixes
* Update verl-compatibility.rst
* Update verl-compatibility.rst
---------
Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
* add wan2.1 to pyt inference models
* update group name
* fix container tag
* fix group name
* change documented data type to bfloat16
* fix col width
Added AlmaLinux 8 Pipeline Support
- aomp
- HIPIFY
- rocDecode
- ROCgdb
- rocJPEG
- rocprofiler
- aqlprofile dependency template
- build autotools template
- download latest cmake template
Pipeline Changes
- More gfx build targets.
- Copying llvm-lit to the llvm-project published artifacts.
- HIPIFY now uses our built version of llvm-project for its pipeline.
- Disable testing in HIPIFY pipeline due to low value provided. Revisit in the future.
- aomp's ROCm dependency list reduced.
- aomp's openmp build had issues with ninja on AlmaLinux 8.
- Add hipSPARSELt dependency.
- Add hipBLASLt test dependency for rocroller shared library.
- Update pip dependency versions.
- Install an additional typing_extensions into a specific folder so that one of the builds we do not control works.
- Wheel renaming no longer works, so we need to find another mechanism if we start doing builds for different branches and gfx architectures.
- Fixed rocprim pipeline to not rebuild during install step.
- Updates to hipblas-common, hipcub, hiprand, and rocthrust pipelines to build on AlmaLinux8 and more gfx architectures.
- Include rocm-cmake dependency when CMake setup mentions it.
* [External CI] Ubuntu 24.04 job for llvm-project
* temporarily switch to using 'high' build pool while 'ultra' is down
* switch almalinux8 to build on manylinux container
* add pool for alma8 container
* switch alma8 package manager to apt
* Update llvm-project.yml
* switch back to dnf after resolved container init
---------
Co-authored-by: Joseph Macaranas <Joseph.Macaranas@amd.com>
- Increase compilation coverage for rocrand to more gfx architectures.
- Follow similar path as recent rocprim pipeline changes.
- Add and fix conditionals in cmake template to consolidate the cmake build and install steps to deal with the re-build being done. This is not required in the ubuntu 22.04 job.
- The build time is a little bit too long on the free agents and we will end up capped on free runners soon, so changing the build pool.
GCC Toolset 14 Environment
- source /opt/rh/gcc-toolset-14/enable only lasts for the shell session, so run at the beginning of relevant build and test tasks when the OS is AlmaLinux 8.
- CMake tasks set env to behave as if source /opt/rh/gcc-toolset-14/enable command was run.
- Observed that the built ROCm libraries can either be installed on lib or lib64 directories in this OS profile, so ldconfig step is adjusted to look at additional directories. This won't impact usage in ubuntu22 if the lib64 directories don't exist in the custom ROCm build.
- For the llvm linking step we cannot assume the ROCm lib directory exists, as only ROCm lib64 might be present on the build environment.
- libatomic package was added to the gcc toolset setup.
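The session-local behavior called out above can be illustrated with a self-contained sketch (no GCC toolset needed; the variable name is a stand-in):

```shell
# Environment changes made inside a child shell vanish when it exits,
# which is why `source /opt/rh/gcc-toolset-14/enable` must run inside
# every build/test task rather than once per pipeline:
sh -c 'TOOLSET=gcc14; export TOOLSET'   # export happens in a child session
echo "${TOOLSET:-unset}"                # the parent shell still prints: unset
```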
yaml-based Changes
- base set of dnf packages now defined in an array for dependencies that already come pre-installed on the ubuntu22 VMs.
- Changed format of the job matrix for readability.
New Features
- AlmaLinux 8 pipelines for roctracer and ROCdbgapi.
- roctracer pipeline expanded to support compilation for gfx1030 and gfx1100.
- AlmaLinux 8 llvm-project pipeline now builds flang and flang-rt, so re-enabled for ubuntu 22.04 pipeline as well.
TODO
- Revisit why ninja-build is not used for comgr, device-libs, and hipcc.
- Removed building flang in this pipeline. Will build flang in the aomp pipeline to unblock progress on runtimes and first set of math libraries. Flang debug can also be moved to a cheaper VM.
- ninja-build from dnf is too old for llvm-project. Using a release from GitHub instead.
- Added more dnf package mappings.
- scl enable command is not needed.
- Modified job matrices and templates to support a second OS.
- Included creation of Virtual Machine Scale Sets running AlmaLinux OS 8.10 with GCC toolset 14 to match manylinux 2_28.
- Dependency download algorithm modified so that only a single array of package manager (apt) packages need to be provided as input and then the other package managers have a mapping of equivalent packages.
- Cleaned up python3-pip in the arrays as those should already be on the VMs.
- This will be an iterative process of getting components to build on this OS profile, and starting with the components that don't have interdependencies.
- Highest priority is to get the rocm-libraries working.
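The single-input dependency mapping described above can be sketched as follows; the function name and the example package mappings are illustrative assumptions, not the pipeline's real table:

```python
# Pipelines declare dependencies once as apt package names; other
# package managers translate through a lookup, falling back to the
# apt name when no mapping exists (hypothetical example entries).
APT_TO_DNF = {
    "ninja-build": "ninja-build",
    "libnuma-dev": "numactl-devel",
    "pkg-config": "pkgconf-pkg-config",
}

def resolve_packages(apt_packages, manager="apt"):
    """Return the package list for the requested package manager."""
    if manager == "apt":
        return list(apt_packages)
    if manager == "dnf":
        return [APT_TO_DNF.get(pkg, pkg) for pkg in apt_packages]
    raise ValueError(f"unsupported package manager: {manager}")
```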
* Remove sparseCheckout param
* Add support for downloading same-pipeline-builds for monorepo chain builds
* Make local-artifact step names more informative
* Use componentName param for artifact filenames
* Enable chain downstream triggers for PRIMs & RANDs
* Set preTargetFilter for tests' local-artifact-download call
* Set checkout: none for test jobs
* Exclude failing rocThrust scan.hip test
* Matrixize downstream jobs
* fix vllm link in release.md
* add RDNA4 note in compat matrix
* update hipcc github url to specific path in llvm-project repo
* remove non-existent HIP upcoming changes reference
* remove non-existent resolved issues internal link
* fix hip upcoming changes url
* duplicate amd smi known issue
* Remove JAIS 13B and 30B
* update Docker details - vLLM 0.8.3
* add previous version
* Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
* fix link to previous version
* Known issue for installation failure added
* Github issue No. added
* Typo fixed
* Feedback from Anush updated
* Minor change
* Feedback from Fai added
* Public Issue No. updated
* Minor change
* add files
* Allow command line args for download script
* Move script into separate folder
* Add newline to end of script
---------
Co-authored-by: David Dixon <david.dixon@amd.com>
- Add knobs to toggle aggregate build options.
- Aggregate build pipeline will pull ROCm dependencies from earlier in the same pipeline.
- Changing build pool of some components for more compute power.
- Deleting deprecated component.
- Add Ninja to dependency compilation in MIOpen.
- Add retries to wget for MIOpen CK build case.
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
- Pipeline now uses separate CMake calls to build extras, openmp, and offload.
- Legacy and other components no longer included. Revisit building them without including them in the build artifacts.
Tiny fix that removes the "export" directive.
`export HIP_FORCE_DEV_KERNARG=1 hipblaslt-bench ...`
leads to
`bash: export: `hipblaslt-bench': not a valid identifier`
whereas starting the command with just `HIP_FORCE_DEV_KERNARG=1` passes this env var to the `hipblaslt-bench` process, which I think is the intention here.
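The per-command assignment behavior can be sketched with a stand-in `sh -c` command so the snippet is self-contained; `hipblaslt-bench` itself is not invoked here:

```shell
# A leading NAME=value assignment exports the variable only to the
# single command that follows, e.g.
#   HIP_FORCE_DEV_KERNARG=1 hipblaslt-bench <args>
# Demonstration with a stand-in child process:
HIP_FORCE_DEV_KERNARG=1 sh -c 'echo "$HIP_FORCE_DEV_KERNARG"'   # prints 1
echo "${HIP_FORCE_DEV_KERNARG:-unset}"                          # prints unset
```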
* Update RELEASE.md
added two new Resolved Issues and made two other changes
* Update RELEASE.md
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
---------
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* ROCProfiler deprecation notice updated
* RHEL 9.6 support removed and 9.5 EOS rejected
* Feedback to KV cache highlight added
* Wrong entry of ROCprofiler-SDK removed
* --kokkos-trace issue drafted
* Known issues for compute partition and JAX limited support added
* Known issues for ROCm Systems profiler and MIOpen added
* Feedback from Leo added
* AMD Radeon PRO W7800 48GB support added to RN
* rocSPARSE fixed issue added
* AMD RDNA 2 removed from TOC
* Revert "AMD RDNA 2 removed from TOC"
This reverts commit a8511fb7826891f27d42f1d749fd5356dbaacfbe.
* Unvalidated known issues removed
* Leo's feedback incorporated
* Changelog.md sync with release.md
* fix vllm engine args link
* remove RDNA subtree under system optimization in toc
* fix RDNA 2 architecture PDF link
* fix CLR LICENSE.txt link
* fix rocPyDecode license link
* ROCProfiler deprecation notice updated
* RHEL 9.6 support removed and 9.5 EOS rejected
* KV cache highlight updated
* Feedback from Peter Incorporated
Co-authored-by: Peter Park <peter.park@amd.com>
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* ROCProfiler deprecation notice updated
* RHEL 9.6 support removed and 9.5 EOS rejected
* OS support updated
* Documentation highlight updated
* Update on hardware atomics update
* rocPyDecode version updated
* Quick update in Changes to changes
* Command translation fixed
* gfx950 removed from CK changelog
* glibc version updated
* gfx950 removed
* Changelog list updated
* System optimization migration changes in ROCm
* Linting issue fixed
* Linking corrected
* Minor change
* Link updated to Instinct.docs.amd.com
* ROCm docs grid updated by removing IOMMU.rst, pcie-atomics, and oversubscription pages
* Files removed and reference fixed
* Reference text updated
* ROCProfiler deprecation notice updated
* RHEL 9.6 support removed and 9.5 EOS rejected
* Updated KMD/UMD content
* Minor correction
* Quick feedback from Ram incorporated
* KMD/UMD separation highlight updated
* Feedback from leo, Ram, and David updated
* Minor change
* Minor change
* Suggestion from Leo added
* Feedback from Ram incorporated
* Minor fix
* Minor change
* Quick change from Ram
* ROCProfiler deprecation notice updated
* Link error
* Compatibility updated
* New changelog and OS support updated
* Upcoming changes removed from rocWMMA, added to hipTensor
* Glibc added to wordlist
* Instinct docs content added
* RHEL 9.5 to OS
* Compatibility OS update
* Leo's feedback incorporated and TOC updated for linux requirement
* ROCProfiler deprecation notice updated
* Updated forward backward compatibility content
* Minor fixes on KMD user space support note
* SLES 15.7 removed
* SLES version formatting update
* Known issue for generic target added
* Known issue update
* Oracle version major release only
* Only major version for oracle linux
* AMDGPU driver known issue updated
* Leo's feedback incorporated
* Leo's feedback incorporated
* Historical change added
* Quick fix
* Fixed issues added
* Jeff's feedback on rocWMMA and hipTensor changelog added
* 6.4.0 changelog added
* DLPack and VP9 added
* update RELEASE based on internal discussion
* remove link to cl
---------
Co-authored-by: Peter Park <peter.park@amd.com>
- Add flang to built projects.
- Upgrade build VM to account for additional project.
- Temporarily ignore a test case for debug info, which is not a high priority in External CI.
* Corrected typo
Corrected typo in line 119 prerequisities -> prerequisites
* Corrected typo in README.md
Corrected typo in line 119 prerequisities -> prerequisites
* Update Megatron-LM and PyTorch Training Docker docs
Also restructure TOC
* Apply suggestions from code review
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
update "start training" text
Apply suggestions from code review
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
update conf.py
fix spacing
fix branding issue
add disable numa
reorg
remove extra text
- Removing the creation of expected folders and symbolic links as workaround to get the test components compiling.
- Set the only OpenCL build flag affecting the build.
- Also, fixes to rocprofiler-sdk when incorporating recent features.
- URL encoding algorithm converts trailing '=' in the base64 string to an integer representing the number of those trailing '=' characters.
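A minimal sketch of the padding scheme described; the function names and the choice of the urlsafe alphabet are assumptions for illustration, not the pipeline's actual code:

```python
import base64

def encode_for_url(data: bytes) -> str:
    """Base64-encode, then replace the trailing '=' padding with a
    single digit giving the count of stripped '=' characters."""
    s = base64.urlsafe_b64encode(data).decode("ascii")
    pad = len(s) - len(s.rstrip("="))
    return s.rstrip("=") + str(pad)

def decode_from_url(s: str) -> bytes:
    """Inverse: the last character is the padding count."""
    pad = int(s[-1])
    return base64.urlsafe_b64decode(s[:-1] + "=" * pad)
```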
* Initial draft for How-to POC
* Zone.identifier file removed
* Broken links in index.md fixed
* Zone.identifier file removed
* Review feedback incorporated
* Title updated
* New format for ROCm for AI TOC created
* Folder structure changed
* ROCm for AI index updated
* Link to Llama recipe updated
* Review feedback added
* Feedback from Cindy added
* Intro text from Cindy added
* New flow suggested by Hongxia incorporated
* Overview content from Cindy added, TOC updated, Meta data updated
* Reference to HPC removed
* Listing alignment updated
* Overview page updated
* Folder structure and link change resulted from TOC change updated
* Content sequence updated
* Meta data updated
* Review feedback incorporated
* Index file renamed
* Conf file updated for OS compatibility info
* update metadata (#4)
update metadata
fix spelling
* Wordlist updated
---------
Co-authored-by: Peter Park <peter.park@amd.com>
- pip: update the click module to fix test failures.
- Test results are at 99.8% with these fixes.
- Missing cmake dependency from last PR for ROCR-Runtime
- Missing pkg-config dependency for amdsmi
- Modify PATH to find pip's cmake for rocprofiler-sdk
- Dynamically write a Dockerfile based on the environment for the failing job.
- Account for additional dependencies that need to be installed and setup.
- Build and push a custom container based on that dynamic Dockerfile to capture that failing environment.
- Documenting additional setup to install Docker on VMSS during provisioning.
* Change AMDMIGraphX to use local-artifact-download for half 5.6
* Refactor dependencies-rocm & artifact-download, consolidate component variable lists
* Add mainline option to nightly
* Change all components to new dependencies-rocm usage
* rm aqlprofile checkoutRef
* simplify dependencies-rocm, add gpuTarget back to rocMLIR
* rm tag-builds from aqlprofile
* Make review changes
These were encountered while debugging
https://github.com/ROCm/ROCm/issues/4190
- There is no manifest (-m) for ROCm 6.3.1 in the tools/rocm-build folder
-- Changed the rocm version to 6.3.0 to avoid immediate build failure
- The manifest is not specified in the first instance of "Downloading the ROCm source code", but it is in "Build ROCm from source".
-- Without the correct manifest, subsequent build instructions will fail as the ROCm/ROCm directory doesn't get pulled. It's unclear why these two otherwise identical commands are duplicated and have this discrepancy
* remove 'Using MPI' and 'gpu-cluster-networking' sections due to migration to dcgpu
* remove gpu-cluster-networking from index page
---------
Co-authored-by: Alex Xu <alex.xu@amd.com>
- Recent vision compilation has been failing, and debugging hasn't been fruitful in finding the cause.
- Should unblock nightly job to at least build and test pytorch while debug effort continues after the holidays.
- pytorch build and test is unblocked by temporarily patching the composable_kernel submodule on upstream pytorch to latest develop, until that submodule is updated to have explicit cast for hneg.
* Updated for 6.3.1
* Compatible version updated from RC1 build
Co-authored-by: Peter Park <peter.park@amd.com>
* Compatibility table and rst updated
* Compatible version updated from RC1 build
Co-authored-by: Peter Park <peter.park@amd.com>
* Peter's review feedback incorporated
Co-authored-by: Peter Park <peter.park@amd.com>
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
* 6.3.1 Release notes (#224)
* New Release highlight on offline installer added. OS change, Known Issues, Resolved issues, and upcoming changes copied from 6.3.0 and updated version
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
* Updates to release notes (#229)
* Updates to release notes
* & -> and
* Updated the component changes, table, release highlights, and fixed i… (#232)
* Updated the component changes, table, release highlights, and fixed issues
* Version number and heading title fixed
* Update RELEASE.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* Update RELEASE.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* Version transition updated
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* Rn 631 custombranch (#234)
* Updated the component changes, table, release highlights, and fixed issues
* Version number and heading title fixed
* Update RELEASE.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* Update RELEASE.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* Version transition updated
* OS and Hardware compatibility updated
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
* [6.3.1 release notes] Add rocprof-sys changes to RN (#235)
* remove extra sections
* add rocprof-sys changelog
* add omni fixed issues and amd smi cl
* Update RELEASE.md
Version transition added in the table
* add documentation update note
* Broken link fixed
---------
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* added edited version of the migraphx changelog and removed CK entry (#238)
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* Updating ROCm-internal with 6.3.1 release notes changes (#241)
* Updated date and version
* Typos and wording fixed
* Minor fix
* Documentation update added
* MIGraphX change dropped
* Debian support removed
* New release highlight added
* HIPRand version changed
* Cross-reference to Per queue added
* Leo's review feedback incorporated
* HIP optimized section updated
* Instinct and Peter's feedback added
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
* Fix changelog and new documentation note (#246)
* fix amd smi and add training a model using megatron note
* update workload tuning doc note
* fmt
---------
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
- Exclude lone, consistently failing MIOpen test.
- test_rnn_seq_api is the only ctest failure, so let's filter it out for now to easily identify new failures.
minor fixes to formatting
fix spelling errors
more spelling
fixes
quantization update
fix format
simplify wording in tunableops and format fix
Apply suggestions from code review
review feedback by Peter
Co-authored-by: Peter Park <peter.park@amd.com>
Apply suggestions from code review
addressing feedback
Co-authored-by: Peter Park <peter.park@amd.com>
Apply suggestions from code review
feedback again
Co-authored-by: Peter Park <peter.park@amd.com>
add hipblaslt yaml file figure
feedback and minor formatting
formatting
update wordlist.txt
remove outdated sentence regarding fsdp and rccl
(cherry picked from commit 87fa9fd83a2e623f6cab4e69d65f49e3db0a45f6)
update wordlist
Co-authored-by: hongxyan <hongxyan@amd.com>
- aomp: Account for path changes due to LLVM_INSTALL_LOC from aomp PR #1012
- aomp: Add llvm-legacy build script step for aomp PR #1062
- rocWMMA: Fix rpath issue when using ninja.
* removed the building doc; edited toolchain to remove myst; made the fact that rst is the preferred format evident
* edited the readme so that it points to the contributing to the rocm docs page
* Update docs/contribute/contributing.md
Co-authored-by: Peter Park <peter.park@amd.com>
* Update docs/contribute/contributing.md
Co-authored-by: Peter Park <peter.park@amd.com>
* added two images showing where the checks and doc build is
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* Update version list with 6.2.0 (#3505) (#3506)
* Fix link to meta-llama finetuning recipes
* Spellcheck fixes in release notes templates (#3526) (#3548)
* fix spelling in 5.4.x templates
* add to wordlist
* update templates
update wordlist
* remove extra_components
rm extra_components
* fix spelling
Co-authored-by: Peter Park <peter.park@amd.com>
* Fix link to rocr debug agent (#3533)
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Fix intersphinx links (#3546)
* update fw install links
* fix more intersphinx links
* fix more links
* add rocPyDecode repo to ROCm6.2 manifest file (#3541) (#3553)
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
* Fix typo for TFLOPs metric in MI250 architecture page
* Add rocm-examples to default.xml (#3583)
* Add rocm 6.2.0 manifest file for rocm-build scripts (#3538)
* Add rocm 6.2.0 manifest file for rocm-build scripts
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Add "rocm-examples"
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Add a section on increasing memory allocation to the MI300A system op… (#3587)
* Add a section on increasing memory allocation to the MI300A system optimization guide
* Addition to wordlist
* Change GB to GiB for consistency
* Standardize GiB/KiB spacing
* Minor wording changes
* Update build scripts for ROCm6.2 release
* fix README.md for Ubuntu24 docker
* Correct ttm to amdttm (#3648)
* Expand the section on changing thread affinity (#3653)
* Expand the section on changing thread affinity
* Clarify the methods for configuring allocatable memory settings
* Small correction
* Update model-quantization.rst to import `BitsAndBytesConfig` from transformers library (#3638)
* remove unneeded file (#3663)
* Fix intersphinx links (#3668)
* fix links in install.rst
* fix links in sys opt guides
* Add introduction and links to the new guide to the vLLM optimized Doc… (#3637)
* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub
* Update target link for the Docker vLLM guide
* Change target URL
* Change link target URL again
* Fixed broken link to RISC-V documentation
* Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page (#3659)
* Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page
* Add words to wordlist and fix a typo
* Add new sections for Docker and testing
* Incorporate comments from the external review
* Some minor edits and clarifications
* Incorporate further review comments and fix test section
* Add comment to test section
* Change git clone command for FBGEMM repo
* Change Docker command
* Changes from internal review
* Fix linting issue
* Fixed broken links for tensile, rocprofiler, roctracer, hipify, rocm-cmake
* add missing make command to bitsandbytes install commands (#3722)
* Update link to rocRAND data type support (#3736)
* Fix Radeon link and point at R6.1.3 as absolute link (#3757)
* Include rocal version change in the highlights (#177)
* Include rocal version change in the highlights
* Reworded rocal known issues and added link to rocal in highlights
* Update ROCm manifest to 6.2.1
* Update ROCm branch name
* Add 6.2.1 to version list (#3770)
* Add links to GH issues in 6.2.1 release notes (#3769)
* add MAD page
* link to GitHub issues in release notes known issues
* update templates for 6.2.1
* Revert "add MAD page"
This reverts commit 9cce72bba3.
* update wordlist for spellcheck linter
* add rccl note
* update rocal version change heading to be more obvious
* make rocal note more specific
* fix missing space
* fix capitalization
* Update RCCL known issue wording (#3775)
* add MAD page
* fix wording in RCCL known issue
* Revert "add MAD page"
This reverts commit c81d0f3b0a.
* update llvm version for 6.2.1 (#3779)
* Fix broken links in 6.2.1 release notes (#3782)
* External CI: Replace libomp dependencies with aomp (#3781)
Add roctracer dependency for hipBLAS and rocWMMA testing
* External CI: Add rocprofiler v1 and v2 smoke tests (#3784)
* External CI: ROCgdb smoke tests (#3785)
- Since this is an autotools project and not cmake, build and test on gfx942 system instead of separating into two jobs. Pipeline time is short anyway.
- Follow build instructions to update build flags and to incorporate the ROCdbgapi.
- Results are not parsed and graphed, but the log contents are printed at the end. This was helpful for debugging and will be kept in the pipeline, as the make check-gdb command's output was not helpful on its own.
* External CI: rocPyDecode Smoke Test (#3786)
* External CI: omniperf pipeline (#3788)
- Referred to public documentation, source, and iterative attempts to create and improve build and test pipeline.
- ctest failures are due to the test node not having expected marketing name string and override not working.
- The fix should be on the omniperf repo side of things, so this pull request should be fine as is.
* External CI: create omniperf pipeline IDs, update nightly build (#3790)
* Fixed greater than to be less than in rocFFT changes
* fix footnote for 6.1.0 (#3791)
* fix footnote for 6.1.0
* fix empty columns in historical KFD title
* External CI: Publish wheel as artifact for rocPyDecode (#3796)
* fix build rocal for ROCm6.2.1
* Add ROCm6.2.1 manifest file
* External CI: fix hip-tests symlink creation (#3799)
* Docs: Add Ubuntu 24.04.1 (#3801)
* add ubuntu 24.04.1
* add 24.04.1 to bottom os section
* fix heading and template
* Update compatibility-matrix.rst for OpenMP version
* Update compatibility-matrix-historical-6.0.csv for OpenMP version
* rm ubuntu 24.04.1 from 6.2.0
* Update docs/compatibility/compatibility-matrix.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* rm duplicate ubuntu in historical
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* External CI: fixes for rocMLIR and nightly build (#3800)
* External CI: fix symlinks for rocMLIR and nightly build
* add pipeline IDs for hip-tests
* fix hip-test ID typo
* remove llvm-alt license (#3727)
* remove llvm-alt license
* fix linting error
* External CI: enable ROCR-Runtime tests (#3809)
* External CI: default branches for hip-tests, omniperf (#3811)
* External CI: torch and torchvision smoke tests (#3810)
* External CI: torch and torchvision smoke tests
- Fixed issues with package name and version for the vision wheel that prevented it from installing. A patch is used until my pull request in vision repo is merged.
- Referred to rocAutomation scripts to pick which test scripts to run out of the many in the torch and vision repo, and iteratively tested suggested scripts to see which ones completed in a timely manner.
- Leveraging pytest-azurepipelines module to automatically parse and graph results from these tests.
* External CI: omnitrace build pipeline (#3812)
* External CI: omnitrace build pipeline starter
- Adding initial set of dependencies and build flags.
* External CI: omnitrace build pipeline
- Add bison, rccl, texinfo dependencies based on build failures.
- Add AMDGPU_TARGETS flag
- Add ROCm binaries to PATH for clang-format and other tools used.
* Fix indentation
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: AMDMIGraphX Build Fix (#3814)
- Swap to default gcc on OS to resolve build errors from recent commits.
- Added libdnnl-dev dependency from iterative attempts with compiler change.
- Referred to the passing GitHub checks to observe the compilers that were used.
- Build CK jit lib and include in AMDMIGraphX build.
* External CI: test fixes w/ roctracer, list omniperf as partially succeeding (#3815)
* External CI: rpp tests (#3816)
* External CI: Build pipeline for rocprofiler-sdk (#3819)
* External CI: Pipeline for rocprofiler-sdk
* Add rocprofiler dependency
* External CI: rocprofiler-sdk build pipeline
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: Fix/add missing pipeline IDs (#3818)
* Update default.xml - Change 6.2.1 to 6.2.2
* Add ROCm6.2.1 manifest file
* External CI: omnitrace tests (#3822)
* Update tags to 6.2.2 (#3827)
* External CI: add roctracer to roc/hipSOLVER test deps (#3825)
* External CI: add rocprofiler-sdk pipeline IDs (#3824)
* External CI: AMDMIGraphX Smoke Tests (#3830)
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: MIOpen tests (#3837)
* Point to release history instead of deprecated changelog (#3836)
* External CI: filter out hipTensor extended tests (#3838)
* added revised note re. radeon gpus (#3839)
* Restructured the contributions section. (#3715)
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchain to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
* Add links to gpu cluster network guides (#3763)
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* updated the radeon note (#3850)
* External CI: Fix rocPyDecode wheel creation (#3852)
- Set values for expected environment variables.
- Accompanying changes required in rocPyDecode repo. Pull request will be made.
* External CI: pytorch vision patch removal (#3855)
My pull request applying this patch was merged upstream, so this is no longer needed and will break the pipeline since it can no longer be applied.
* Build(deps): Bump rocm-docs-core from 1.8.1 to 1.8.2 in /docs/sphinx (#3807)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.1...v1.8.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated the radeon note, as it were (#3857)
* updated the radeon note, as it were
* updated the note again
* Set devops team as codeowners for rocm-build (#3860)
* Set ext CI as codeowners for rocm-build
* Update CODEOWNERS to rocm-devops
* External CI: Add option to pull mainline branch for dependencies (#3689)
* External CI: Add option to pull mainline branch for dependencies
* Missing parameter for mainline branch dependencies.
* External CI: mainline branch definitions
* Removed MIGraphX optimization page (#3848)
* External CI: add a global variable to control gfx942 tests (#3864)
* External CI: update component default/mainline branches (#3871)
* External CI: Stop building gfx90a (#3872)
Save on VM resources until infrastructure has test targets.
* External CI: add libstdc++-12 to rocMLIR (#3874)
* Add building doc section (#3873)
* External CI: programmatically get latest aqlprofile (#3876)
* External CI: use ctest for rocm-examples (#3877)
* External CI: Tensile pipeline (#3884)
* add oversubscription conceptual doc (#3885)
add mitigation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
* add oversubscription conceptual doc (#3885)
(cherry picked from commit d0ecf51b0c)
* Add building doc section (#3873)
(cherry picked from commit abc0e6a087)
* External CI: Add pipeline to build upstream boost (#3896)
* Update bitsandbytes branch in docs (#3898)
* Update bitsandbytes branch in docs (#3898)
(cherry picked from commit b541be7bcb)
* Documentation: Add reference to precision-support floating-point types (#3899)
* External CI: use Boost template for MIOpen (#3903)
* External CI: create rocprofiler-systems pipeline (#3906)
* External CI: omnitrace/rocprof-sys pipeline IDs (#3908)
* External CI: MIOpen parse test results (#3913)
* External CI: Use pip to install latest cmake on test system (#3915)
* added a link to the compatibility matrix (#3904)
* added a link to the compatibility matrix
* removed quotes
* docs: Remove invalid amd_iommu=on parameter
Per kernel-parameters.txt, there is no "on" option for amd_iommu. While
intel_iommu has it, amd_iommu is automatically on unless specified
otherwise. For more info, see these 2 links:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
75aa74d52f/drivers/iommu/amd/init.c (L3481)
Signed-off-by: Kent Russell <kent.russell@amd.com>
* docs: Remove invalid amd_iommu=on parameter
Per kernel-parameters.txt, there is no "on" option for amd_iommu. While
intel_iommu has it, amd_iommu is automatically on unless specified
otherwise. For more info, see these 2 links:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
75aa74d52f/drivers/iommu/amd/init.c (L3481)
Signed-off-by: Kent Russell <kent.russell@amd.com>
(cherry picked from commit 74333b667d)
* External CI: hipBLASLt build now requires python packaging module (#3926)
https://github.com/ROCm/hipBLASLt/pull/1250/files#diff-fee2e6f068b33fca3a1dc49392de8848dbf05c3f4632b680abb1052523e5a30fR35
* External CI: Moved location of upstream pytorch build scripts (#3930)
https://github.com/pytorch/pytorch/pull/138103
* External CI: disable rocMLIR tests (#3931)
* External CI: disable rocMLIR tests
* roctracer AMDGPU_TARGETS flag
* External CI: create a GPU diagnostics template (#3932)
* External CI: Add CK into pytorch build environment (#3934)
* Update rocm-6.2.2.xml (#3927)
vim typo removed
* External CI: add support to disable individual component tests (#3938)
* External CI: AMDMIGraphX greater-equal pip dependencies (#3939)
* Build(deps): Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#3933)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* External CI: rocDecode add libva-amdgpu-dev dependency (#3940)
* External CI: enumerate GPUs in gpu-diagnostics (#3942)
* External CI: move gpu-diag directly before tests (#3943)
* External CI: fix HIP_PIPELINE_ID (#3944)
* External CI: pytorch pipeline updates (#3948)
To support recent upstream changes and issues observed.
* External CI: rocpydecode dependency installation change (#3954)
- Install pybind11 through pip instead of apt
- Add pip-installed pybind11 path to CMAKE_PREFIX_PATH
- Tested against source of PR 122
* External CI: do not assume python is python3 for rocpydecode (#3955)
* Improve consistency of the gpu-arch-specs table. (#3936)
* Improve consistency of the gpu-arch-specs table.
* Add XCD to the glossary.
* External CI: Always force rocPyDecode cleanup step
* External CI: Add aqlprofile to Tensile test dependencies (#3961)
* add vllm performance validation doc (#3964)
* External CI: various fixes (#3963)
* add suggestions to vllm perf validation doc (#3968)
* External CI: move allowPartiallySucceededBuilds to library variable (#3970)
* External CI: suppress GPU diag warnings (#3972)
* External CI: rocprofiler-compute pipeline files (#3973)
* External CI: disable reload AMDGPU (#3974)
* Update links to vllm perf validation doc (#3971)
* update links to vllm perf validation doc
* add PagedAttention to wordlist
* External CI: Change test setup for rocPyDecode (#3978)
- Use multiple potential locations for pybind11 to be found by cmake.
* External CI: add roctracer to rocBLAS deps (#3982)
* External CI: decode test changes (#3983)
- Only target container with access to first device
- Ensure pybind11-dev is uninstalled before the package manager install steps
* Changed the introductory text linked to Radeon (#3988)
Co-authored-by: prbasyal <prbasyal@amd.com>
* External CI: finish rocprofiler-compute enablement (#3995)
* External CI: add aomp as rocprofiler-systems dependency (#3996)
* External CI: remove omniperf from nightly (#4000)
* Sync from internal develop 6.2.4 (#4002)
* add radeon pro v710 to gpu arch specs (#192)
* Add V710 specs
gpg: using RSA key
22223038B47B3ED4B3355AB11B54779B4780494E
gpg: Good signature from "Peter Park (MKMPETEPARK01)
<peter.park@amd.com>" [ultimate]
add some specs
add cols
clean up extra line
* fix graphics l1 cache description
* update SGPR for RDNA2 and RDNA3 archs
* update VGPR
* Apply suggestions from code review
* change l2 cache to 4
* Update docs/reference/gpu-arch-specs.rst
* ROCm 6.2.4 compatibility matrix (#186)
* prep compat column (historical) and mi300x column
* update historical compat matrix for 6.2.4
* update compat matrix for 6.2.4
* fix compat
* fix thunk version
* fix hipify ver
* ROCm 6.2.4 release notes (#184)
* prep 6.2.4 release notes
* add mathlibs
* add detail component changes
* rm non-updated links
* fix sentence
* fix rocthrust v
* rm offline installer
* condense
* add leo/ram fdback
words
* update documentation section
* add rocm on radeon note
* update os support note wording
* update release
* update version and GA date to 10-17
* update 6.2.4 rn
* update wording
* add link to v710
* update wording
* update templ
* simplify note
* words
os note
words
* change URLs to latest
* update link to supported GPUs
* Update versions.md 6.2.4 date to Oct 18
* Update conf.py release note date to Oct 18
---------
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Sync change from ROCm to ROCm-internal (#194)
* Fix Radeon link and point at R6.1.3 as absolute link (#3757)
* Update ROCm manifest to 6.2.1
* Update ROCm branch name
* Add 6.2.1 to version list (#3770)
* Add links to GH issues in 6.2.1 release notes (#3769)
* add MAD page
* link to GitHub issues in release notes known issues
* update templates for 6.2.1
* Revert "add MAD page"
This reverts commit 9cce72bba3.
* update wordlist for spellcheck linter
* add rccl note
* update rocal version change heading to be more obvious
* make rocal note more specific
* fix missing space
* fix capitalization
* Update RCCL known issue wording (#3775)
* add MAD page
* fix wording in RCCL known issue
* Revert "add MAD page"
This reverts commit c81d0f3b0a.
* update llvm version for 6.2.1 (#3779)
* Fix broken links in 6.2.1 release notes (#3782)
* External CI: Replace libomp dependencies with aomp (#3781)
Add roctracer dependency for hipBLAS and rocWMMA testing
* External CI: Add rocprofiler v1 and v2 smoke tests (#3784)
* External CI: ROCgdb smoke tests (#3785)
- Since this is an autotools project and not cmake, build and test on gfx942 system instead of separating into two jobs. Pipeline time is short anyway.
- Follow build instructions to update build flags and to incorporate the ROCdbgapi.
- Results are not parsed and graphed, but the log contents are printed at the end. This was helpful for debugging and will be kept in the pipeline, as the make check-gdb command's output was not helpful on its own.
* External CI: rocPyDecode Smoke Test (#3786)
* External CI: omniperf pipeline (#3788)
- Referred to public documentation, source, and iterative attempts to create and improve build and test pipeline.
- ctest failures are due to the test node not having the expected marketing name string and the override not working.
- The fix should be on the omniperf repo side of things, so this pull request should be fine as is.
* External CI: create omniperf pipeline IDs, update nightly build (#3790)
* Fixed greater than to be less than in rocFFT changes
* fix footnote for 6.1.0 (#3791)
* fix footnote for 6.1.0
* fix empty columns in historical KFD title
* External CI: Publish wheel as artifact for rocPyDecode (#3796)
* External CI: fix hip-tests symlink creation (#3799)
* Docs: Add Ubuntu 24.04.1 (#3801)
* add ubuntu 24.04.1
* add 24.04.1 to bottom os section
* fix heading and template
* Update compatibility-matrix.rst for OpenMP version
* Update compatibility-matrix-historical-6.0.csv for OpenMP version
* rm ubuntu 24.04.1 from 6.2.0
* Update docs/compatibility/compatibility-matrix.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* rm duplicate ubuntu in historical
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* External CI: fixes for rocMLIR and nightly build (#3800)
* External CI: fix symlinks for rocMLIR and nightly build
* add pipeline IDs for hip-tests
* fix hip-test ID typo
* remove llvm-alt license (#3727)
* remove llvm-alt license
* fix linting error
* External CI: enable ROCR-Runtime tests (#3809)
* External CI: default branches for hip-tests, omniperf (#3811)
* External CI: torch and torchvision smoke tests (#3810)
* External CI: torch and torchvision smoke tests
- Fixed issues with package name and version for the vision wheel that prevented it from installing. A patch is used until my pull request in vision repo is merged.
- Referred to rocAutomation scripts to pick which test scripts to run out of the many in the torch and vision repo, and iteratively tested suggested scripts to see which ones completed in a timely manner.
- Leveraging pytest-azurepipelines module to automatically parse and graph results from these tests.
* External CI: omnitrace build pipeline (#3812)
* External CI: omnitrace build pipeline starter
- Adding initial set of dependencies and build flags.
* External CI: omnitrace build pipeline
- Add bison, rccl, texinfo dependencies based on build failures.
- Add AMDGPU_TARGETS flag
- Add ROCm binaries to PATH for clang-format and other tools used.
* Fix indentation
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: AMDMIGraphX Build Fix (#3814)
- Swap to default gcc on OS to resolve build errors from recent commits.
- Added libdnnl-dev dependency from iterative attempts with compiler change.
- Referred to the passing GitHub checks to observe the compilers that were used.
- Build CK jit lib and include in AMDMIGraphX build.
* External CI: test fixes w/ roctracer, list omniperf as partially succeeding (#3815)
* External CI: rpp tests (#3816)
* External CI: Build pipeline for rocprofiler-sdk (#3819)
* External CI: Pipeline for rocprofiler-sdk
* Add rocprofiler dependency
* External CI: rocprofiler-sdk build pipeline
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: Fix/add missing pipeline IDs (#3818)
* External CI: omnitrace tests (#3822)
* Update tags to 6.2.2 (#3827)
* External CI: add roctracer to roc/hipSOLVER test deps (#3825)
* External CI: add rocprofiler-sdk pipeline IDs (#3824)
* External CI: AMDMIGraphX Smoke Tests (#3830)
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: MIOpen tests (#3837)
* Point to release history instead of deprecated changelog (#3836)
* External CI: filter out hipTensor extended tests (#3838)
* added revised note re. radeon gpus (#3839)
* Restructured the contributions section. (#3715)
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchain to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
* Add links to gpu cluster network guides (#3763)
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* updated the radeon note (#3850)
* External CI: Fix rocPyDecode wheel creation (#3852)
- Set values for expected environment variables.
- Accompanying changes required in rocPyDecode repo. Pull request will be made.
* External CI: pytorch vision patch removal (#3855)
My pull request applying this patch was merged upstream, so this is no longer needed and will break the pipeline since it can no longer be applied.
* Build(deps): Bump rocm-docs-core from 1.8.1 to 1.8.2 in /docs/sphinx (#3807)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.1...v1.8.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated the radeon note, as it were (#3857)
* updated the radeon note, as it were
* updated the note again
* Set devops team as codeowners for rocm-build (#3860)
* Set ext CI as codeowners for rocm-build
* Update CODEOWNERS to rocm-devops
* External CI: Add option to pull mainline branch for dependencies (#3689)
* External CI: Add option to pull mainline branch for dependencies
* Missing parameter for mainline branch dependencies.
* External CI: mainline branch definitions
* Removed MIGraphX optimization page (#3848)
* External CI: add a global variable to control gfx942 tests (#3864)
* External CI: update component default/mainline branches (#3871)
* External CI: Stop building gfx90a (#3872)
Save on VM resources until infrastructure has test targets.
* External CI: add libstdc++-12 to rocMLIR (#3874)
* Add building doc section (#3873)
* External CI: programmatically get latest aqlprofile (#3876)
* External CI: use ctest for rocm-examples (#3877)
* External CI: Tensile pipeline (#3884)
* add oversubscription conceptual doc (#3885)
add mitigation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
* add oversubscription conceptual doc (#3885)
(cherry picked from commit d0ecf51b0c)
* External CI: Add pipeline to build upstream boost (#3896)
* Update bitsandbytes branch in docs (#3898)
* Documentation: Add reference to precision-support floating-point types (#3899)
* External CI: use Boost template for MIOpen (#3903)
* External CI: create rocprofiler-systems pipeline (#3906)
* External CI: omnitrace/rocprof-sys pipeline IDs (#3908)
* External CI: MIOpen parse test results (#3913)
* External CI: Use pip to install latest cmake on test system (#3915)
* added a link to the compatibility matrix (#3904)
* added a link to the compatibility matrix
* removed quotes
* docs: Remove invalid amd_iommu=on parameter
Per kernel-parameters.txt, there is no "on" option for amd_iommu. While
intel_iommu has it, amd_iommu is automatically on unless specified
otherwise. For more info, see these 2 links:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
75aa74d52f/drivers/iommu/amd/init.c (L3481)
Signed-off-by: Kent Russell <kent.russell@amd.com>
* External CI: hipBLASLt build now requires python packaging module (#3926)
https://github.com/ROCm/hipBLASLt/pull/1250/files#diff-fee2e6f068b33fca3a1dc49392de8848dbf05c3f4632b680abb1052523e5a30fR35
* External CI: Moved location of upstream pytorch build scripts (#3930)
https://github.com/pytorch/pytorch/pull/138103
* External CI: disable rocMLIR tests (#3931)
* External CI: disable rocMLIR tests
* roctracer AMDGPU_TARGETS flag
* External CI: create a GPU diagnostics template (#3932)
* External CI: Add CK into pytorch build environment (#3934)
* External CI: add support to disable individual component tests (#3938)
* External CI: AMDMIGraphX greater-equal pip dependencies (#3939)
* Build(deps): Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#3933)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* External CI: rocDecode add libva-amdgpu-dev dependency (#3940)
* External CI: enumerate GPUs in gpu-diagnostics (#3942)
* External CI: move gpu-diag directly before tests (#3943)
* External CI: fix HIP_PIPELINE_ID (#3944)
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
* 6.2.4 release notes: add known/fixed issues (#193)
* add "for compute workloads" wording for clarity
* add AMDSMI resolved issue
* add dlm known issue
intro text
wording
* update wording
rm bullet point
update wording
* fix spellcheck due to spacing
* rm s
* rm gfx1151
* remove dlm known issue
* update list of updated docs; note for Radeon users
fmt
* update GA date for 6.2.4
* fix rdc version
* fix RDC version strings (#196)
* revert outdated change for .azuredevops
* Fix 6.2.4 date in versions.md
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
* fix links in release notes 6.2.4 (#4008)
* Remove extra line
* Update xml files for 6.2.4 (#4012)
* Update xml files for 6.2.4
* Update README with 6.2.4
* Increase visibility of programming guide
* Docs: Update what is rocm description
* Apply suggestions from code review
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
* Update docs/how-to/hip_programming_guide.rst
Co-authored-by: MKKnorr <MKKnorr@web.de>
* WIP
* Update docs/index.md
* Update docs/how-to/hip_programming_guide.rst
Co-authored-by: MKKnorr <MKKnorr@web.de>
* Update docs/how-to/programming_guide.rst
* Update docs/what-is-rocm.rst
* Apply suggestions from code review
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Update docs/how-to/programming_guide.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Remove tip
* External CI: allow test failures to present as failures on Github (#3993)
* External CI: disable rdmatest and rocrtstFunc.Memory_Max_Mem (#4016)
* Added 6.2.4 manifest.xml
* External CI: fix comgr build (#4025)
* External CI: increase Tensile test timeout to 90 mins (#4027)
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Co-authored-by: Chris Kime <Christopher.Kime@amd.com>
Co-authored-by: ozziemoreno <109979778+ozziemoreno@users.noreply.github.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
Co-authored-by: Johannes Maria Frank <jmfrank63@gmail.com>
Co-authored-by: Brian Cornille <bcornill@amd.com>
Co-authored-by: Joseph Macaranas <Joseph.Macaranas@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Ameya Keshava Mallya <ameyakeshava.mallya@amd.com>
* add radeon pro v710 to gpu arch specs (#192)
* Add V710 specs
add some specs
add cols
clean up extra line
* fix graphics l1 cache description
* update SGPR for RDNA2 and RDNA3 archs
* update VGPR
* Apply suggestions from code review
* change l2 cache to 4
* Update docs/reference/gpu-arch-specs.rst
* ROCm 6.2.4 compatibility matrix (#186)
* prep compat column (historical) and mi300x column
* update historical compat matrix for 6.2.4
* update compat matrix for 6.2.4
* fix compat
* fix thunk version
* fix hipify ver
* ROCm 6.2.4 release notes (#184)
* prep 6.2.4 release notes
* add mathlibs
* add detail component changes
* rm non-updated links
* fix sentence
* fix rocthrust v
* rm offline installer
* condense
* add leo/ram fdback
words
* update documentation section
* add rocm on radeon note
* update os support note wording
* update release
* update version and GA date to 10-17
* update 6.2.4 rn
* update wording
* add link to v710
* update wording
* update templ
* simplify note
* words
os note
words
* change URLs to latest
* update link to supported GPUs
* Update versions.md 6.2.4 date to Oct 18
* Update conf.py release note date to Oct 18
---------
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Sync change from ROCm to ROCm-internal (#194)
* Fix Radeon link and point at R6.1.3 as absolute link (#3757)
* Update ROCm manifest to 6.2.1
* Update ROCm branch name
* Add 6.2.1 to version list (#3770)
* Add links to GH issues in 6.2.1 release notes (#3769)
* add MAD page
* link to GitHub issues in release notes known issues
* update templates for 6.2.1
* Revert "add MAD page"
This reverts commit 9cce72bba3.
* update wordlist for spellcheck linter
* add rccl note
* update rocal version change heading to be more obvious
* make rocal note more specific
* fix missing space
* fix capitalization
* Update RCCL known issue wording (#3775)
* add MAD page
* fix wording in RCCL known issue
* Revert "add MAD page"
This reverts commit c81d0f3b0a.
* update llvm version for 6.2.1 (#3779)
* Fix broken links in 6.2.1 release notes (#3782)
* External CI: Replace libomp dependencies with aomp (#3781)
Add roctracer dependency for hipBLAS and rocWMMA testing
* External CI: Add rocprofiler v1 and v2 smoke tests (#3784)
* External CI: ROCgdb smoke tests (#3785)
- Since this is an autotools project and not cmake, build and test on gfx942 system instead of separating into two jobs. Pipeline time is short anyway.
- Follow build instructions to update build flags and to incorporate the ROCdbgapi.
- Results are not parsed and graphed, but the log contents are printed at the end. This was helpful for debugging and will be kept in the pipeline, as the make check-gdb command's output was not helpful on its own.
* External CI: rocPyDecode Smoke Test (#3786)
* External CI: omniperf pipeline (#3788)
- Referred to public documentation, source, and iterative attempts to create and improve build and test pipeline.
- ctest failures are due to the test node not having the expected marketing name string and the override not working.
- The fix should be on the omniperf repo side, so this pull request should be fine as is.
* External CI: create omniperf pipeline IDs, update nightly build (#3790)
* Fixed greater than to be less than in rocFFT changes
* fix footnote for 6.1.0 (#3791)
* fix footnote for 6.1.0
* fix empty columns in historical KFD title
* External CI: Publish wheel as artifact for rocPyDecode (#3796)
* External CI: fix hip-tests symlink creation (#3799)
* Docs: Add Ubuntu 24.04.1 (#3801)
* add ubuntu 24.04.1
* add 24.04.1 to bottom os section
* fix heading and template
* Update compatibility-matrix.rst for OpenMP version
* Update compatibility-matrix-historical-6.0.csv for OpenMP version
* rm ubuntu 24.04.1 from 6.2.0
* Update docs/compatibility/compatibility-matrix.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* rm duplicate ubuntu in historical
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* External CI: fixes for rocMLIR and nightly build (#3800)
* External CI: fix symlinks for rocMLIR and nightly build
* add pipeline IDs for hip-tests
* fix hip-test ID typo
* remove llvm-alt license (#3727)
* remove llvm-alt license
* fix linting error
* External CI: enable ROCR-Runtime tests (#3809)
* External CI: default branches for hip-tests, omniperf (#3811)
* External CI: torch and torchvision smoke tests (#3810)
* External CI: torch and torchvision smoke tests
- Fixed issues with the package name and version for the vision wheel that prevented it from installing. A patch is used until my pull request in the vision repo is merged.
- Referred to rocAutomation scripts to pick which test scripts to run out of the many in the torch and vision repos, and iteratively tested the suggested scripts to see which ones completed in a timely manner.
- Leveraging the pytest-azurepipelines module to automatically parse and graph results from these tests.
* External CI: omnitrace build pipeline (#3812)
* External CI: omnitrace build pipeline starter
- Adding initial set of dependencies and build flags.
* External CI: omnitrace build pipeline
- Add bison, rccl, texinfo dependencies based on build failures.
- Add AMDGPU_TARGETS flag
- Add ROCm binaries to PATH for clang-format and other tools used.
* Fix indentation
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: AMDMIGraphX Build Fix (#3814)
- Swap to default gcc on OS to resolve build errors from recent commits.
- Added libdnnl-dev dependency from iterative attempts with compiler change.
- Referred to the passing GitHub checks to observe the compilers that were used.
- Build CK jit lib and include in AMDMIGraphX build.
* External CI: test fixes w/ roctracer, list omniperf as partially succeeding (#3815)
* External CI: rpp tests (#3816)
* External CI: Build pipeline for rocprofiler-sdk (#3819)
* External CI: Pipeline for rocprofiler-sdk
* Add rocprofiler dependency
* External CI: rocprofiler-sdk build pipeline
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: Fix/add missing pipeline IDs (#3818)
* External CI: omnitrace tests (#3822)
* Update tags to 6.2.2 (#3827)
* External CI: add roctracer to roc/hipSOLVER test deps (#3825)
* External CI: add rocprofiler-sdk pipeline IDs (#3824)
* External CI: AMDMIGraphX Smoke Tests (#3830)
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: MIOpen tests (#3837)
* Point to release history instead of deprecated changelog (#3836)
* External CI: filter out hipTensor extended tests (#3838)
* added revised note re. radeon gpus (#3839)
* Restructured the contributions section. (#3715)
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchain to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
* Add links to gpu cluster network guides (#3763)
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* updated the radeon note (#3850)
* External CI: Fix rocPyDecode wheel creation (#3852)
- Set values for expected environment variables.
- Accompanying changes are required in the rocPyDecode repo. A pull request will be made.
* External CI: pytorch vision patch removal (#3855)
My pull request applying this patch was merged upstream, so this is no longer needed and will break the pipeline since it can no longer be applied.
* Build(deps): Bump rocm-docs-core from 1.8.1 to 1.8.2 in /docs/sphinx (#3807)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.1...v1.8.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated the radeon note, as it were (#3857)
* updated the radeon note, as it were
* updated the note again
* Set devops team as codeowners for rocm-build (#3860)
* Set ext CI as codeowners for rocm-build
* Update CODEOWNERS to rocm-devops
* External CI: Add option to pull mainline branch for dependencies (#3689)
* External CI: Add option to pull mainline branch for dependencies
* Missing parameter for mainline branch dependencies.
* External CI: mainline branch definitions
* Removed MIGraphX optimization page (#3848)
* External CI: add a global variable to control gfx942 tests (#3864)
* External CI: update component default/mainline branches (#3871)
* External CI: Stop building gfx90a (#3872)
Save on VM resources until infrastructure has test targets.
* External CI: add libstdc++-12 to rocMLIR (#3874)
* Add building doc section (#3873)
* External CI: programmatically get latest aqlprofile (#3876)
* External CI: use ctest for rocm-examples (#3877)
* External CI: Tensile pipeline (#3884)
* add oversubscription conceptual doc (#3885)
add mitigation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
* add oversubscription conceptual doc (#3885)
(cherry picked from commit d0ecf51b0c)
* External CI: Add pipeline to build upstream boost (#3896)
* Update bitsandbytes branch in docs (#3898)
* Documentation: Add reference to precision-support floating-point types (#3899)
* External CI: use Boost template for MIOpen (#3903)
* External CI: create rocprofiler-systems pipeline (#3906)
* External CI: omnitrace/rocprof-sys pipeline IDs (#3908)
* External CI: MIOpen parse test results (#3913)
* External CI: Use pip to install latest cmake on test system (#3915)
* added a link to the compatibility matrix (#3904)
* added a link to the compatibility matrix
* removed quotes
* docs: Remove invalid amd_iommu=on parameter
Per kernel-parameters.txt, there is no "on" option for amd_iommu. While
intel_iommu has it, amd_iommu is automatically on unless specified
otherwise. For more info, see these 2 links:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
75aa74d52f/drivers/iommu/amd/init.c (L3481)
Signed-off-by: Kent Russell <kent.russell@amd.com>
* External CI: hipBLASLt build now requires python packaging module (#3926)
https://github.com/ROCm/hipBLASLt/pull/1250/files#diff-fee2e6f068b33fca3a1dc49392de8848dbf05c3f4632b680abb1052523e5a30fR35
* External CI: Moved location of upstream pytorch build scripts (#3930)
https://github.com/pytorch/pytorch/pull/138103
* External CI: disable rocMLIR tests (#3931)
* External CI: disable rocMLIR tests
* roctracer AMDGPU_TARGETS flag
* External CI: create a GPU diagnostics template (#3932)
* External CI: Add CK into pytorch build environment (#3934)
* External CI: add support to disable individual component tests (#3938)
* External CI: AMDMIGraphX greater-equal pip dependencies (#3939)
* Build(deps): Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#3933)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* External CI: rocDecode add libva-amdgpu-dev dependency (#3940)
* External CI: enumerate GPUs in gpu-diagnostics (#3942)
* External CI: move gpu-diag directly before tests (#3943)
* External CI: fix HIP_PIPELINE_ID (#3944)
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
* 6.2.4 release notes: add known/fixed issues (#193)
* add "for compute workloads" wording for clarity
* add AMDSMI resolved issue
* add dlm known issue
intro text
wording
* update wording
rm bullet point
update wording
* fix spellcheck due to spacing
* rm s
* rm gfx1151
* remove dlm known issue
* update list of updated docs; note for Radeon users
fmt
* update GA date for 6.2.4
* fix rdc version
* fix RDC version strings (#196)
* revert outdated change for .azuredevops
* Fix 6.2.4 date in versions.md
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
* add "for compute workloads" wording for clarity
* add AMDSMI resolved issue
* add dlm known issue
intro text
wording
* update wording
rm bullet point
update wording
* fix spellcheck due to spacing
* rm s
* rm gfx1151
* remove dlm known issue
* update list of updated docs; note for Radeon users
fmt
* update GA date for 6.2.4
* fix rdc version
* Fix Radeon link and point at R6.1.3 as absolute link (#3757)
* Update ROCm manifest to 6.2.1
* Update ROCm branch name
* Add 6.2.1 to version list (#3770)
* Add links to GH issues in 6.2.1 release notes (#3769)
* add MAD page
* link to GitHub issues in release notes known issues
* update templates for 6.2.1
* Revert "add MAD page"
This reverts commit 9cce72bba3.
* update wordlist for spellcheck linter
* add rccl note
* update rocal version change heading to be more obvious
* make rocal note more specific
* fix missing space
* fix capitalization
* Update RCCL known issue wording (#3775)
* add MAD page
* fix wording in RCCL known issue
* Revert "add MAD page"
This reverts commit c81d0f3b0a.
* update llvm version for 6.2.1 (#3779)
* Fix broken links in 6.2.1 release notes (#3782)
* External CI: Replace libomp dependencies with aomp (#3781)
Add roctracer dependency for hipBLAS and rocWMMA testing
* External CI: Add rocprofiler v1 and v2 smoke tests (#3784)
* External CI: ROCgdb smoke tests (#3785)
- Since this is an autotools project and not cmake, build and test on gfx942 system instead of separating into two jobs. Pipeline time is short anyway.
- Follow build instructions to update build flags and to incorporate the ROCdbgapi.
- Results are not parsed and graphed, but the log contents are printed at the end. This was helpful for debugging and will be kept in the pipeline, as the make check-gdb command's output was not helpful on its own.
* External CI: rocPyDecode Smoke Test (#3786)
* External CI: omniperf pipeline (#3788)
- Referred to public documentation, source, and iterative attempts to create and improve build and test pipeline.
- ctest failures are due to the test node not having expected marketing name string and override not working.
- The fix should be on the omniperf repo side of things, so this pull request should be fine as is.
* External CI: create omniperf pipeline IDs, update nightly build (#3790)
* Fixed greater than to be less than in rocFFT changes
* fix footnote for 6.1.0 (#3791)
* fix footnote for 6.1.0
* fix empty columns in historical KFD title
* External CI: Publish wheel as artifact for rocPyDecode (#3796)
* External CI: fix hip-tests symlink creation (#3799)
* Docs: Add Ubuntu 24.04.1 (#3801)
* add ubuntu 24.04.1
* add 24.04.1 to bottom os section
* fix heading and template
* Update compatibility-matrix.rst for OpenMP version
* Update compatibility-matrix-historical-6.0.csv for OpenMP version
* rm ubuntu 24.04.1 from 6.2.0
* Update docs/compatibility/compatibility-matrix.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* rm duplicate ubuntu in historical
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* External CI: fixes for rocMLIR and nightly build (#3800)
* External CI: fix symlinks for rocMLIR and nightly build
* add pipeline IDs for hip-tests
* fix hip-test ID typo
* remove llvm-alt license (#3727)
* remove llvm-alt license
* fix linting error
* External CI: enable ROCR-Runtime tests (#3809)
* External CI: default branches for hip-tests, omniperf (#3811)
* External CI: torch and torchvision smoke tests (#3810)
* External CI: torch and torchvision smoke tests
- Fixed issues with package name and version for the vision wheel that prevented it from installing. A patch is used until my pull request in vision repo is merged.
- Referred to rocAutomation scripts to pick which test scripts to run out of the many in the torch and vision repo, and iteratively tested suggested scripts to see which ones completed in a timely manner.
- Leveraging pytest-azurepipelines module to automatically parse and graph results from these tests.
* External CI: omnitrace build pipeline (#3812)
* External CI: omnitrace build pipeline starter
- Adding initial set of dependencies and build flags.
* External CI: omnitrace build pipeline
- Add bison, rccl, texinfo dependencies based on build failures.
- Add AMDGPU_TARGETS flag
- Add ROCm binaries to PATH for clang-format and other tools used.
* Fix indentation
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: AMDMIGraphX Build Fix (#3814)
- Swap to default gcc on OS to resolve build errors from recent commits.
- Added libdnnl-dev dependency from iterative attempts with compiler change.
- Referred to the passing GitHub checks to observe the compilers that was used.
- Build CK jit lib and include in AMDMIGraphX build.
* External CI: test fixes w/ roctracer, list omniperf as partially succeeding (#3815)
* External CI: rpp tests (#3816)
* External CI: Build pipeline for rocprofiler-sdk (#3819)
* External CI: Pipeline for rocprofiler-sdk
* Add rocprofiler dependency
* External CI: rocprofiler-sdk build pipeline
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: Fix/add missing pipeline IDs (#3818)
* External CI: omnitrace tests (#3822)
* Update tags to 6.2.2 (#3827)
* External CI: add roctracer to roc/hipSOLVER test deps (#3825)
* External CI: add rocprofiler-sdk pipeline IDs (#3824)
* External CI: AMDMIGraphX Smoke Tests (#3830)
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: MIOpen tests (#3837)
* Point to release history instead of deprecated changelog (#3836)
* External CI: filter out hipTensor extended tests (#3838)
* added revised note re. radeon gpus (#3839)
* Restructured the contributions section. (#3715)
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchai to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
* Add links to gpu cluster network guides (#3763)
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* updated the radeon note (#3850)
* External CI: Fix rocPyDecode wheel creation (#3852)
- Set values for expected environment variables.
- Accompanying changes required in rocPyDecode repo. Pull request will be made.
* External CI: pytorch vision patch removal (#3855)
My pull request applying this patch was merged upstream, so this is no longer needed and will break the pipeline since it can no longer be applied.
* Build(deps): Bump rocm-docs-core from 1.8.1 to 1.8.2 in /docs/sphinx (#3807)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.1...v1.8.2)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* updated the radeon note, as it were (#3857)
* updated the radeon note, as it were
* updated the note again
* Set devops team as codeowners for rocm-build (#3860)
* Set ext CI as codeowners for rocm-build
* Update CODEOWNERS to rocm-devops
* External CI: Add option to pull mainline branch for dependencies (#3689)
* External CI: Add option to pull mainline branch for dependencies
* Missing parameter for mainline branch dependencies.
* External CI: mainline branch definitions
* Removed MIGraphX optimization page (#3848)
* External CI: add a global variable to control gfx942 tests (#3864)
* External CI: update component default/mainline branches (#3871)
* External CI: Stop building gfx90a (#3872)
Save on VM resources until infrastructure has test targets.
* External CI: add libstdc++-12 to rocMLIR (#3874)
* Add building doc section (#3873)
* External CI: programmatically get latest aqlprofile (#3876)
* External CI: use ctest for rocm-examples (#3877)
* External CI: Tensile pipeline (#3884)
* add oversubscription conceptual doc (#3885)
add mitigiation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
* add oversubscription conceptual doc (#3885)
(cherry picked from commit d0ecf51b0c)
* External CI: Add pipeline to build upstream boost (#3896)
* Update bitsandbytes branch in docs (#3898)
* Documentation: Add reference to precision-support floating-point types (#3899)
* External CI: use Boost template for MIOpen (#3903)
* External CI: create rocprofiler-systems pipeline (#3906)
* External CI: omnitrace/rocprof-sys pipeline IDs (#3908)
* External CI: MIOpen parse test results (#3913)
* External CI: Use pip to install latest cmake on test system (#3915)
* added a link to the compatibility matrix (#3904)
* added a link to the compatibility matrix
* removed quotes
* docs: Remove invalid amd_iommu=on parameter
Per kernel-parameters.txt, there is no "on" option for amd_iommu. While
intel_iommu has it, amd_iommu is automatically on unless specified
otherwise. For more info, see these 2 links:
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt75aa74d52f/drivers/iommu/amd/init.c (L3481)
Signed-off-by: Kent Russell <kent.russell@amd.com>
* External CI: hipBLASLt build now requires python packaging module (#3926)
https://github.com/ROCm/hipBLASLt/pull/1250/files#diff-fee2e6f068b33fca3a1dc49392de8848dbf05c3f4632b680abb1052523e5a30fR35
* External CI: Moved location of upstream pytorch build scripts (#3930)
https://github.com/pytorch/pytorch/pull/138103
* External CI: disable rocMLIR tests (#3931)
* External CI: disable rocMLIR tests
* roctracer AMDGPU_TARGETS flag
* External CI: create a GPU diagnostics template (#3932)
* External CI: Add CK into pytorch build environment (#3934)
* External CI: add support to disable individual component tests (#3938)
* External CI: AMDMIGraphX greater-equal pip dependencies (#3939)
* Build(deps): Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#3933)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* External CI: rocDecode add libva-amdgpu-dev dependency (#3940)
* External CI: enumerate GPUs in gpu-diagnostics (#3942)
* External CI: move gpu-diag directly before tests (#3943)
* External CI: fix HIP_PIPELINE_ID (#3944)
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Sandra Polifroni <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Michael Benavidez <michael.benavidez@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: MKKnorr <MKKnorr@web.de>
Co-authored-by: Kent Russell <kent.russell@amd.com>
Co-authored-by: Joseph Greathouse <jlgreathouse@users.noreply.github.com>
* prep 6.2.4 release notes
* add mathlibs
* add detail component changes
* rm non-updated linnks
* fix sentence
* fix rocthrust v
* rm offline installer
* condense
* add leo/ram fdback
words
* update documentation section
* add rocm on radeon note
* update os support note wording
* update release
* update version and GA date to 10-17
* update 6.2.4 rn
* update wording
* add link to v710
* update wording
* update templ
* simplify note
* words
os note
words
* change URLs to latest
* update link to supported GPUs
* Update versions.md 6.2.4 date to Oct 18
* Update conf.py release note date to Oct 18
---------
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Add V710 specs
gpg: using RSA key
22223038B47B3ED4B3355AB11B54779B4780494E
gpg: Good signature from "Peter Park (MKMPETEPARK01)
<peter.park@amd.com>" [ultimate]
add some specs
add cols
clean up extra line
* fix graphics l1 cache description
* update SGPR for RDNA2 and RDNA3 archs
* update VGPR
* Apply suggestions from code review
* change l2 cache to 4
* Update docs/reference/gpu-arch-specs.rst
add mitigiation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
(cherry picked from commit d0ecf51b0c)
add mitigiation steps
add to toc
move page for build
move doc
fix spelling
update doc
update oversubscription
update order
fix spelling
add oversubscription to wordlist
move oversubscription topic to bottom of toc and index
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchai to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
* Add links to gpu cluster network guides
* Add newline character to eof
* Make link absolute
* add dynamic branch in toc
* remove unnecessary page
clean up
* clean up index/toc
* make multi-node topics adjacent
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* testing if this file is editable
* changed 'kebob-case' to 'dash-case'
* Restructured the page to be more straightforward and provide additional repo information
* forgot to save
* Moved the topic sentence
* Wrong accent on the a in diataxis
* Removed the feedback info from contributing and moved it to Feedback
* fixed spelling errors
* fixed some wording and removed second person text
* consolidated Build and Structure into Contribute; edited toolchai to (hopefully) conform to style guide; updated toc
* updated the titles in the toc
* made changes based on feedback
* it's better when you save
* removed structure and build; fixed something for the linter
* added rst to wordlist
* added customizations to wordlist
- Swap to default gcc on OS to resolve build errors from recent commits.
- Added libdnnl-dev dependency from iterative attempts with compiler change.
- Referred to the passing GitHub checks to observe the compilers that was used.
- Build CK jit lib and include in AMDMIGraphX build.
* External CI: omnitrace build pipeline starter
- Adding initial set of dependencies and build flags.
* External CI: omnitrace build pipeline
- Add bison, rccl, texinfo dependencies based on build failures.
- Add AMDGPU_TARGETS flag
- Add ROCm binaries to PATH for clang-format and other tools used.
* Fix indentation
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* External CI: torch and torchvision smoke tests
- Fixed issues with package name and version for the vision wheel that prevented it from installing. A patch is used until my pull request in the vision repo is merged.
- Referred to rocAutomation scripts to pick which test scripts to run out of the many in the torch and vision repos, and iteratively tested suggested scripts to see which ones completed in a timely manner.
- Leveraging pytest-azurepipelines module to automatically parse and graph results from these tests.
* update current matrix for 6.2.2
* update history compat
* fix typo
* fixed missed 60201s
* fix missed rocm-6.2.1
* Add additional column to compatibility-matrix-historical-6.0, so it includes it correctly
Also, fixing a few 6.2.2 footnote references
* add oracle linux 8.9 under 6.2.2 in historical
* rm widths in historical table
* lowercase a letter
* Fix version numbers for 6.2.2
* Minor updates to historical matrix
* add ubuntu 24.04.1
* Docs: Add Ubuntu 24.04.1 (#3801)
* add 24.04.1 to bottom os section
* fix heading and template
* Update compatibility-matrix.rst for OpenMP version
* Update compatibility-matrix-historical-6.0.csv for OpenMP version
* rm ubuntu 24.04.1 from 6.2.0
* Update docs/compatibility/compatibility-matrix.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* rm duplicate ubuntu in historical
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* add overwritten ubuntu 24.04.1
* fix wrong versions and extra comma
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
- Referred to public documentation, source, and iterative attempts to create and improve build and test pipeline.
- ctest failures are due to the test node not having the expected marketing name string and the override not working.
- The fix should be on the omniperf repo side of things, so this pull request should be fine as is.
- Since this is an autotools project and not cmake, build and test on a gfx942 system instead of separating into two jobs. Pipeline time is short anyway.
- Follow build instructions to update build flags and to incorporate the ROCdbgapi.
- Results are not parsed and graphed, but the log contents are printed at the end. This was helpful for debugging and will be kept in the pipeline, as the make check-gdb command's output was not helpful on its own.
* add MAD page
* link to GitHub issues in release notes known issues
* update templates for 6.2.1
* Revert "add MAD page"
This reverts commit 9cce72bba3.
* update wordlist for spellcheck linter
* add rccl note
* update rocal version change heading to be more obvious
* make rocal note more specific
* fix missing space
* fix capitalization
* first pass of the release notes for 6.2.1 (#131)
* first pass of the release notes for 6.2.1
* something went wrong building the relnotes the first time; this should be OKer
* Partially complete release notes for 6.2.1
* Spolifroni amd/release notes 621 (#135)
* added a line about there being no OS changes in 6.2.1 relative to 6.2.0
* Updated version and date
* added documentation highlights (#136)
* made wording changes and added documentation highlights
* Added information for rocdbgapi (#138)
* added information about rocdbgapi
* Updates to documentation section; changed "key" to "notable" (#139)
* Changed 'key' to 'notable'; clarified that changes are from 6.2.0 to 6.2.1; clarified the open-source nature of the documentation; brought a note back.
* Updated the release date and made changes to component details (#140)
* updated the release date in conf.py; removed the added API calls for HIP; added fixed issues to rocdbgapi
* Updated the known issues intro (#141)
* changed the opening intro to Known Issues
* test (#142)
* fixed the major copy-pasta error with upcoming changes
* removed a word just to see what happens
* Spolifroni amd/release notes 621 (#143)
* putting the "are" back
* removed the HIP changes; they were in 6.2.0
* Reworded some things (#146)
* corrected some formatting errors
* changed some wording
* changed a word
* reworded the known issues
* Added info for rocal 2.0.0 (#147)
* added info for rocAL 2.0.0
* Some small changes to the release notes (#148)
* Updated the wording on the rocAL changes
* made some small changes.
* minor wording change
* Updated with more components for RC3 (#149)
* added more component changes
* Small changes to wording, punctuation; fixed a list (#150)
* fixed a bad table; made some minor changes to punctuation and spelling.
* Updated versions and removed previous release notes. (#151)
* The HIPIFY version needed to be updated to reflect that it tracks the ROCm version, so it went from 6.2.0 to 6.2.1
* undid the hipify version change, but updated the version of AMD SMI
* removed the previous release notes.
* Update to highlights, SMI, small fixes (#152)
* updated release date to Sept 12
* modified the ROCm SMI entry; workaround reworded and put into known issues; one line added to resolved issues
* Added the FBGEMM support highlight
* Updated the known issues wording for rocAL (#153)
* updated wording on rocAL known issues
* small fixes (#155)
* made some small edits
* removed a stray "notable" (#156)
* removed a stray 'notable'
* Added offline installer highlight (#157)
* added offline installer highlight
* added link to offline installer; aligned rn with other FBGEMM doc (#158)
* added a link to the offline installer doc; removed the second uppercase E in the FBGEMM long form to align with the other documentation
* fixed a link that had to go to latest rather than to 6.2.1
* trying to trigger a pr
* undoing the last change
* changed a link; fixed wording; added a 'removals' section for one component (#159)
* changed a link; fixed wording; added a 'removals' section for one component
* fixed broken links (#160)
* fixed up the list for rocAL to make it more compact
* fixed broken links to component documentation
* updated the links again and removed rocAL optimization and known issues (#161)
* first pass of the release otnotes for 6.2.1
* something went wrong building the relnotes the first time; this should be OKer
* Partially complete release notees for 6.2.1
* added a line about there being no OS changes in 6.2.1 relative 6.2.0
* Updated version and date
* made wording changes and added documentation highlights
* added information about rocdbgapi
* Changed 'key' to 'notable'; clarified that changes are from 6.2.0 to 6.2.1; clarified the open-source nature of the documentation; brought a note back.
* updated the release date in conf.py; removed added api calls for HIP; added fixed issues to rcodbgapi
* changed the opening intro to Known Issues
* fixed the major copy-pasta error with upcoming changes
* removed a word just to see what happens
* putting the "are" back
* removed the HIP changes; they were in 6.2.0
* corrected some formatting errors
* changed some wording
* changed a word
* reworded the known issues
* added info for rocAL 2.0.0
* Updated the wording on the rocAL changes
* made some small changes.
* minor wording change
* added more component changes
* fixed a bad table; made some minor changes to punctuation and spelling.
* The hipify version needs to be updated to reflect that its version reflects the rocm version. So it went from 6.2.0 to 6.2.1
* undid the hipify version change, but updated the version of amd smi
* removed the previous release notes.
* updated release date to Sept 12
* modified the ROCm SMI entry; workaround reworded and put into known issues; one line added to resolved issues
* Added the FBGEEM support highlight
* updated wording on rocAL known issues
* made some small edits
* removed a stray 'notable'
* added offline installer highlight
* added a link to the offline installer doc; removed the second uppercase E in FBGEEM long-form to align with the other documentation
* fixed a link that had to go to latest rather than to 6.2.1
* trying to trigger a pr
* undoing the last change
* changed a link; fixed wording; added a 'removals' section for one component
* fixed up the list for rocAL to make it more compact
* fixed broken links to component documentation
* Removed optimizations and known issues from rocal
* updated doc links for components that were returning 404s to point to their ReadTheDocs documentation. Tensile will not be released until later, so its link goes to GitHub. Links will need to be double-checked after release to make sure they still work.
* updated release date (#163)
* small changes (#165)
* Moved known issue to omnitrace (#166)
* tweaked omnitrace wording (#167)
* tweaked the omnitrace workaround language to be more precise
* fixed rocdbgapi (#168)
* Changed wording in offline installer changes (#169)
* Updated to show no new Known Issues. (#170)
* updated the upcoming changes (#171)
* added rccl plugin removal
* added lack of MI300X support to hardware (#172)
* removed a contraction (#173)
* Changed the link in known issues (#174)
* fixed the label in the known issues GitHub link, and changed it to point to issues rather than known issues, since there are no verified known issues at this point
* removed the link to GitHub and the reference to the list of known issues
* removed "6.2.1 does not support MI300X" and added the MI300X GPU recovery failure known issue
* update words
* removed info re. rocdbgapi known issues (#176)
* Added a point about the version change to rocAL
* Added a link to the prerequisites in rocAL
---------
Co-authored-by: Peter Park <peter.park@amd.com>
* adding preliminary compatibility matrix data for 6.2.1
* bump up some version numbers from 6.2.0 to 6.2.1
* adding kernel versions to the compatibility matrix
* add kernel version lookup table, in dropdown list
* add KFD and user-space support; also adjust some metadata keywords
* update 6.2.1 RC2 versions
* make spelling linter happy
* remove kernel versions from table, just reference LUT below
* Leave the kernel lookup table expanded
* update kernel version table
* remove kernels from historical matrix, update footnotes
* move historical matrix into compatibility folder
* update historical matrix paths
* version bumps for RC3
* RC4 has no other version bumps. Reorder RPP alphabetically
* change How-To card hue to purple
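The expanded kernel-version lookup table described above could be sketched as a collapsible block; this is a hypothetical fragment assuming the docs use the sphinx-design ``dropdown`` directive (the ``:open:`` option keeps it expanded by default), and the row contents are placeholders, not actual supported versions:

```rst
.. dropdown:: Kernel version lookup table
   :open:

   .. list-table::
      :header-rows: 1

      * - Distribution
        - Kernel versions
      * - Ubuntu 22.04
        - (placeholder)
      * - RHEL 9.4
        - (placeholder)
```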
- Add roctracer dependency to hipBLASLt build to address recent failures.
- Change build pool to ultra due to increased build times.
- Enable ccache to help with build times.
- Referred to public documentation, build instructions, source code in tests directory, and iterative runs to modify build flags.
- rdci test failures are expected due to the singleton nature of rocprofiler, since gtest attempts to spawn multiple instances. There is an internal ticket tracking the issue.
Referred to public documentation, build instructions, and iterative debug runs to update build flags, publish new artifacts, and run tests. Test results are not parsed and graphed in Azure.
40% pass rate for this initial pass. Would like to push this through to at least change the build process and then defer fixing the remaining test failures.
- Test results are not parsed to be graphed in Azure reports.
- Added ccache to potentially improve build times, keyed against the date and hash based on amdclang++ binary.
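The ccache keying scheme described above (date plus a hash of the amdclang++ binary) could be sketched as below. This is a hedged illustration, not the actual pipeline step; the compiler path is a stand-in for wherever amdclang++ lives on the build agent:

```shell
# Derive a ccache cache key from the current date and a hash of the
# compiler binary, so the cache is invalidated when either changes.
# COMPILER is a stand-in; the real job keys against the amdclang++ binary.
COMPILER="${COMPILER:-/bin/sh}"
DATE_PART="$(date +%Y%m%d)"
HASH_PART="$(sha256sum "$COMPILER" | cut -c1-12)"   # first 12 hex chars
CACHE_KEY="ccache-${DATE_PART}-${HASH_PART}"
echo "ccache key: $CACHE_KEY"
```

Keying on the compiler hash rather than a version string means a rebuilt compiler with the same version still busts the cache, which is the safer default for nightly toolchains.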
* Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page
* Add words to wordlist and fix a typo
* Add new sections for Docker and testing
* Incorporate comments from the external review
* Some minor edits and clarifications
* Incorporate further review comments and fix test section
* Add comment to test section
* Change git clone command for FBGEMM repo
* Change Docker command
* Changes from internal review
* Fix linting issue
Replace cmake calls with bash script calls to compile the components comprising openmp-extras.
Added inline comments to describe the bash scripts from aomp repo being executed.
- Added steps for creating wheel file for torchvision.
- Tried to add torchaudio as well, but it was not reading in AMDGPU_TARGETS value in the nested cmake calls from the python setup.py execution.
- Upstream pytorch builder scripts were updated, so it broke the patching step in the job. Removed the need to patch by using a flag to skip the tests.
- Will work on adding smoke tests of pytorch and torchvision later, just getting this out to fix the nightly build.
* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub
* Update target link for the Docker vLLM guide
* Change target URL
* Change link target URL again
* Added all variables found in the library page on Azure
* removed extra space
* copied the example of referencing variables from variables-global.yml and add HALF560_PIPELINE_ID to the file
* introduced variables-global.yml to this file and pointed the path to variables.CCACHE_DIR
* introduced variables-global.yml and changed all variables in stagingPipelineIdentifiers and taggedPipelineIdentifiers to match the identifier names in variables-global.yml
* adjusted how the variables are introduced into the file
* tried adding ./ to variables-global.yml path
* copied the formatting from develop branch but changed identifiers to match them in variables-global.yml
* changed build pool to high to test if variable works
* recopied variables from library page to account for any changes
* changed build pool back to medium
* removed extra whitespace
* remove whitespace
* added all the variables from the page on azure
* fix merge
fix merge
---------
Co-authored-by: Daniel Su <danielsu@amd.com>
* move precision_support to reference
* add rocPyDecode to AI
* Use CSS style sheets for Card titles
* remove temp folder and files
* add card hues
* shuffle hues
* update requirements
* add hues test
* add hues test2
* select hues
* remove hues test
* use hues and add gutters
* sync TOC and index titles
* once more through the TOC
- Updating pipelines to account for combined repo changes of ROCR-Runtime and ROCT-Thunk-Interface.
- Removed dependencies referring ROCT-Thunk-Interface since it is now in the ROCR-Runtime repo.
- Changed ROCR-Runtime build command to account for directory changes.
* add rocAL, hipCC, CLR. Rearrange order of some items to align with stack diagram. Update UCC versions
* update llvm-project to point to docs page instead of GitHub
* Add a section on increasing memory allocation to the MI300A system optimization guide
* Addition to wordlist
* Change GB to GiB for consistency
* Standardize GiB/KiB spacing
* Minor wording changes
* Rewrote the section to be minimalist and not specify the number of ways to provide feedback. Also removed the PR info since that's covered in Contributing.
* Update feedback.md
Got feedback from Leo about how to improve on this and make it conform to the style guide. Updated with changes based on that feedback.
Extension of PR #3544: additional logic for ROCm dependency downloads to account for the GPU target, for components that can specify a GPU target when building or that have direct dependencies on such components. Also refactors if statements to reduce lines of code.
Adding support for parallel build jobs where the only difference is the singular GPU target. This allows nightly packaging jobs to pick and choose based on GPU target to reduce download size.
To accommodate this new feature producing multiple artifacts for a component, added support for a file filter when downloading a ROCm component using the format "componentName:fileFilter".
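As a rough sketch of the "componentName:fileFilter" convention mentioned above (the variable names and the no-colon fallback behavior are assumptions; the real logic lives in the pipeline templates):

```shell
# Split a download spec of the form "componentName:fileFilter".
spec="rocBLAS:gfx942"        # hypothetical example input
component="${spec%%:*}"      # text before the first ':' -> component name
filter="${spec#*:}"          # text after the first ':'  -> file filter
if [ "$filter" = "$spec" ]; then
  filter=""                  # no ':' present -> no filter, download everything
fi
echo "component=$component filter=$filter"
```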
* initial commit for placeholder 6.2 data
* fix TensorFlow versions, and LLVM/OpenMP version strings
* add third column with 6.1.0 as last column. Update some versions from Peter's review comments
* reduce RPP name
* remove trailing comma
* reduce length of 3rd party communications libs title
* change footnote for 6.2 to remove mention of MI300A
* remove TransferBench
* change from 6.1.0 to 6.0.0 data in last column
* fixing a few version numbers
* add rocprofiler-sdk version
* fix omnitrace version
* adding full matrix, 2 different views
* add copying csv in conf.py
* 6.2 content edits; change subheadings to remove ':' and rename a few as Leo suggested
* add Framework anchor within compat matrix, and fix linting error
* categorized tools
* update Cub/Thrust versions, abbreviate Management
* remove the dedicated historical page
* WIP commit: added anchors in compat matrix, along with anchor test code
* check 6.1.1 and 6.0.2 versions, add anchors thru table
* audit 6.2 RC4 versions against table, remove clang-ocl, and update hip-other version
* avoid linting
* MI300A system optimization guide internal draft
* Small changes to System BIOS paragraph
* Some minor edits
* Changes after external review feedback
* Add CPU Affinity debug setting
* Edit CPU Affinity debug setting
* Changes from external discussion
* Add glossary and other small fixes
* Additional changes from the review
* Update the IOMMU guidance
* Change description of CPU affinity setting
* Slight rewording
* Change Debian to Red Hat-based
* A few changes from the second internal review
* Add MI300X tuning guides
Add mi300x doc (pandoc conversion)
fix headings
add metadata
move images to shared/
move images to shared/
convert tuning-guides.md to rst using pandoc
add mi300x to tuning-guides.rst landing page
update h1s, toc, and landing page
fix spelling
fix fmt
format code blocks
add tensilelite imgs
fix formatting
fix formatting some more
fix formatting
more formatting
spelling
remove --enforce-eager note
satisfy spellcheck linter
more spelling
add fixes from hongxia
fix env var in D5
add fixes to PyTorch inductor section
fix
fix
Update docs/how-to/tuning-guides/mi300x.rst
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Update 'torch_compile_debug' suggestion based on Hongxia's feedback
fix PyTorch inductor env vars
minor formatting fixes
Apply suggestions from code review
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Update vllm path
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
disable numfig in Sphinx configuration
fix formatting and capitalization
add words to wordlist
update index
update wordlist
update optimizing-triton-kernel
convert cards to table
fix link in index.md
add @lpaoletti's feedback
Add system tuning guide
add images
add system section
add os settings and sys management
remove pcie=noats recommendation
reorg
add blurb to developer section
impr formatting
remove windows os from tuning guides pages in conf.py
add suggestions from review
fix typo and link
remove os windows from relevant pages in conf
mi300x
add suggestions from review
fix toc
fix index links
reorg
update vLLM vars
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
reorganize
add warnings
add text to system tuning
add filler text on index pages
reorg tuning pages
fix links
fix vars
* rm old pages
fix toc
* add suggestions from review
small change
add more suggestions
rewrite intro
* add 'workload tuning philosophy'
* refactor
* fix broken links
* black format conf.py
* simplify cmd and update doc structure
* add higher-level heading for consistency (mi300x.rst)
* add fixes from review
fix url
add fixes
fix formatting
fix fmt
fix hipBLASLt section
change words
fix tensilelite section
fix
fix
fix fmt
* style guide
* fix some formatting
* satisfy spellcheck linter
* update wordlist
* fix bad conflict resolution
* Switch all pipeline gpu targets to gfx942
* Change more pipelines target to gfx942
* set variables for manual testing
* Switch all pipeline gpu targets to gfx942
* Change more pipelines target to gfx942
* set variables for manual testing
* add test pipeline id
* revert test changes
* correct gpu target name
* remove unused flags; change hipSPARSELt target to be gfx942
* added professional graphic
to replace hand modified
* Update deep-learning-rocm.rst
update image reference
* Delete docs/data/how-to/framework_install_2024_05_23-update.png
replace with renamed file with correct date
* Add files via upload
updated date in file name
* Update deep-learning-rocm.rst
corrected image name to reflect new date
* Update deep-learning-rocm.rst
corrected file name
* Add files via upload
correct name
* Delete docs/data/how-to/framework_install_2024_07-04.png
name format incorrect
* Update deep-learning-rocm.rst
correct image name
* add CXX flag
* add CXX flag
* Update ROCmValidationSuite.yml
* Change googletest to libgtest-dev
* Update ROCmValidationSuite.yml
* Update ROCmValidationSuite.yml
* add ROCM_PATH as env var
* add HIP_INC_DIR
* remove manual test variables
* set variables for manual test
* remove CMAKE_CXX_COMPILER flag
* Set link to redirect llvm folder
* correct indentation
* remove manual test variables
* rename task
* update CLR docs reference
* Apply suggestions from code review
Co-authored-by: Peter Park <peter.park@amd.com>
---------
Co-authored-by: amitkumar-amd <Amit.Kumar6@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
* Use components.xml instead of default.xml
* Rm unused var
* Use category instead of group
* Add group and category
* Change changelog template
* Conditional display
* Remove sort
* Add mappings
* Jinja does not track state
* Handle dupe logic in python
* Construct doc page and repo url
* Add repo url
* Add doc page
* Avoid using bare URL
* Add None key
* Test release notes
[Why]
To maintain the "pitchfork layout" convention used by the repository.
[How]
- Update README.md
- Update INFRA_REPO in ROCm.mk
- Updated to new path: ROCm/tools/rocm-build
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
With MIOpen now building with latest source on External CI, this unblocked AMDMIGraphX from building with latest source.
Determined rocMLIR also needed to be built with latest source as a dependency.
* Fix first link in compatibility matrix table
* Revert "Fix first link in compatibility matrix table"
This reverts commit 069c5c116a.
* Remove sticky header and unused css
* Remove container from hardware specs matrix
---------
Co-authored-by: Peter Jun Park <peter.park@amd.com>
* Regenerate changelog
* Add component changelogs and known issue
Fix RELEASE.md headings
Update pub datestamp for 6.1.2
Add AMDSMI and ROCm SMI to 6.1.2 template
Add rccl and rocBLAS
Update intro blurb and headings
Add ROCm SMI fix
Add missed heading to AMDSMI
Update datestamp and release version number
Update version and release number
Add known issue re: MI300X error detection
Words
Add issue link
Rm GitHub issue link
Move known issue down
Update ki wording
Remove "this issue has been investigated ... " from known issue
Fix changelog h1
* Reorg known issue, upcoming changes, remove rocDecode tested configurations
* Add fixes from review
* Add fixed issue link
* Fix heading
* Remove known issue
* Update the links for rocminfo and rocm-bandwidth-test
* Update the links for rocminfo and rocm-bandwidth-test
* Update the links for rocminfo and rocm-bandwidth-test
* Update links to intersphinx links
---------
Co-authored-by: Peter Jun Park <peter.park@amd.com>
* Add Fine Tuning LLMs how to guide
* Reorg and refactor Fine-tuning LLMs with ROCm
Update index and headings
Fix formatting and update toc
Split out content from index to overview.rst
Add metadata
Clean up overview
Add inference sections, fix rst errors, clean up single-gpu-fine-tuning
Combine fine-tuning and inference guides
Fix some links and formatting
Update toc and add formatting fixes
Add ck kernel fusion content
Update toc
Clean up model quantization and acceleration
Add CK images
Clean up profiling
Update triton kernel performance optimization
Update llm inference frameworks guide
Disable automatic number of figures and tables in Sphinx conf
Change tabs to spaces
Change heading to end with -ing
Add link fixes and heading updates
Add rocprof/Omniperf/Omnitrace section
Update profiling and debugging guide
Add formatting fixes
Satisfy spellcheck
Fix words
Delete unused file
Finish overview
Clean up first 4 sections
Multi-gpu fine-tuning guide: slight fixes
Update toc
Remove tabs
Formatting fixes
* Minor wording updates
* Add some clean-up
* Update profiling and debugging guide
* Fix Omnitrace link
* Update ck kernel fusion with latest
* Update CK formatting
* Fix perfetto link syntax
* Fix typos and add blurbs
* Add fixes to Triton optimization doc
* Tabify saving adapters / models section
* Fix linting errors - spellcheck
Fix spelling and grammar
Satisfy linter
Update wording in profiling guide
Add fixes to satisfy linter
More fixes for linting in Triton guide
More linting fixes
Spellcheck in CK guide
* Improve triton guide
Fix linting errors and optics
* Add occupancy / vgpr table
Change some wording
* Re-add tunableop
* Add missing indent in _toc.yml
* Remove ckProfiler references
* Add links to resources
* Add refs in CK optimization guide
* Rename files and fix internal links
* Organize tuning guides
Reorg triton
* Add compute unit diagram
* Remove AutoAWQ
* Add higher res image for Perfetto trace example
* Update link text
* Update fig nums
* Update some formatting
* Update "Inductor"
* Change "Inductor" to TorchInductor
* Add link to official TorchInductor docs
Template with bash commands to update cmake with snap.
Use template for two components that want updated cmake with latest source on their default branches.
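Such a template typically reduces to a few provisioning commands like the following; this is a hedged sketch (package handling and ordering are assumptions), not the template's actual contents:

```shell
# Replace the distro cmake with the newer one published on snap.
sudo apt-get remove -y --purge cmake   # drop the (older) distro package
sudo snap install cmake --classic      # snap tracks current cmake releases
export PATH="/snap/bin:$PATH"          # ensure the snap binary is found first
cmake --version                        # sanity-check which cmake is now active
```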
* Add Using ROCm for AI
Add PyTorch Docker installation images
Split doc into subtopics
Add metadata
Clean up index
Clean up hugging face guide
Clean up installation guide
Fix rST formatting
Clean up install and train-a-model
Clean up MAD
Delete unused file
Add ref anchors and clean up MAD doc
Add formatting fixes
Update toc and section index
Format some code blocks
Remove install guide and update toc
Chop installation guide
Clean up deployment and hugging face sections
Change headings to end in -ing
Fix spelling in Training a model
Delete MAD and split out install content
Fix formatting
Change words to satisfy spellcheck linter
* Add review suggestions and add helpful links
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Add helpful links and add review suggestions
Remove fine-tuning link and links to D5 and MAGMA
Update docs/how-to/rocm-for-ai/deploy-your-model.rst
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Update DeepSpeed link
Add subheading to ML framework installation and closing blurb to hugging face models guide
* Reorder topics
* removed docker and pointed ROCm deps to our existing builds
* removed vmImage tag for pool
* added pip to apt list and renamed from rocFFT to hipFFT
* fixed spelling mistakes in rocmDependencies
* added correct apt dep for pip
* removed leading slash in the cmake flags
* changed cxx_compiler to /rocm/bin/hipcc
* added llvm-project, ROCR-Runtime, clr, and rocminfo to rocm deps
* added rocFFT as a rocm dependency
* removed docker and added our builds for components
* removed rocFFT from rocm deps
* Fixed typo in rocFFT value
* added rocprofiler-register to rocFFT and fixed typo in the dependencies-rocm file
* changed cxx compiler to amdclang++
* fixed amdclang++ paths
* moving to faster machine
* added cmake module paths
* switched back to medium build
* added libopm-dev to apt deps
* added libomp-14-dev to apt deps
* added aomp as a rocm dep
* added aomp as a rocm dep
* added hipcc as the cxx_compiler
* reverted back to clang++ as the cxx_compiler
* removed unmentioned rocm deps from the readme
* removed docker
* added python3-pip as an apt dep
* fixed compiler paths
* added hipRAND as a rocm dep
* added print statements to see directory structure
* adding a print statement into /agent/_work/1/s/build/library
* added -Tensile_rocm_assembler as a build flag
* removed a broken script line
* added D to tensile rocm assembler
* added DROCM_PATH to build flags
* fixed typo
* changed build pool from medium to base
* changed build pool from base to low
* added env variables using Joseph's PR
* removed docker from hipBLASLt and added rocm dependencies that point to our builds
* added pip to the apt packages array
* changed cmake_cxx_compiler env var to amdclang++
* changed cmake_cxx_compiler env var to amdclang++
* changed cmake_cxx_compiler env var to hipcc
* changed cmake_cxx_compiler env var to hipcc
* changed clang to amdclang
* changed all refs mentioning hipcc to amdclang
* changed cmake_cxx_compiler back to hipcc
* added a HIP_PATH env var based off Tensile/Source/FindHIP.cmake
* added hipcc to HIP_PATH
* added rocm-cmake to rocm deps
* added rocRAND as a rocm dep
* removed dcmake_module flag
* added libomp-dev as an apt dep
* added aomp as a rocm dep
* added clang as an apt dep
* reverted changes back to how they appear in develop since this branch will be submitted for review
* removed unnecessary flags
* adding -DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang back to see if these are vital to a successful build
* removed newline character
* Disable aomp offload build for initial ci-build work
* Missing dependency for medium pool use of rocPRIM
* Latest rocBLAS source needs added ROCm dependencies
* Rename 'Tuning guides' to 'Hardware optimization'
* Move deep learning to Install section
* Change 'Hardware' to 'System' to align with index.md
* Satisfy spellcheck linter
* adding new framework install graphic with JAX
* Fix link to ROCm libraries list
* crop framework_install graphic
* Reset .wordlist.txt update
* Prettify deep learning framework installation page
* Change spacing in list of frameworks
---------
Co-authored-by: Young Hui <young.hui@amd.com>
aomp build is not triggered by changes to the aomp repo, but by updates to llvm-project and ROCR-Runtime, so the trigger definition can remain in this ROCm/ROCm repo.
Instead of using docker and apt to install ROCm component dependencies, use tarballs from Azure Pipeline builds so ROCm interdependencies can be updated without waiting for releases.
* Update External CI Interdependencies for more repos
- composable_kernel
- hipBLAS
- rocBLAS
- rocSOLVER
Cleaned up unused flags from llvm-project
* Remove LD_LIBRARY_PATH change. Should not be needed.
- Fixed compilers to pick amdclang.
- Added ldconfig step for setting up linking of shared libraries.
- Set Azure VMs to medium only.
- Remove empty directories in published tarballs.
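Two of the steps above might look roughly like this in a build script; the staging path is a made-up placeholder and the ldconfig step is shown commented out because it needs root on the agent:

```shell
# Demo staging tree with an empty directory that should not ship in the tarball.
INSTALL_DIR="${INSTALL_DIR:-./rocm-stage}"   # hypothetical staging directory
mkdir -p "$INSTALL_DIR/lib/empty-subdir"

# Step 1 (root required on the build agent): register shared libraries.
#   echo "$INSTALL_DIR/lib" > /etc/ld.so.conf.d/rocm-ci.conf && ldconfig

# Step 2: prune empty directories before publishing the tarball.
find "$INSTALL_DIR" -type d -empty -delete
```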
After examining the build products of recent builds and consuming them for other components, observed some additional flags should be added. Used rocm-build repo for reference.
Move HIPIFY from 6.1.1.md to 6.1.2.md
Regenerate changelog
Fix accidental autoformat in 6.1.1.md
Update 6.1.2.md and regen changelog
Add AMD SMI for ROCm 6.1.2
Regen changelog
Add rocDecode and update RELEASE.md
Update 6.1.2 intro blurb
Fix arrow symbol
Add (tm) to changelog.jinja template
Incorporate Leo's feedback
Intro blurb wording.
Add missed tested ROCm config (rocDecode)
Add OS support
Add version to release notes h1
Update intro blurb again
Make changelog filepath lowercase
Update blurb
Add extra line to 6.1.2 template
Fix heading in RELEASE
Fix amdsmi changelog link
Remove OS support notice
Add rocDecode to table
Add rocDecode to CL
Update rocDecode setup script note for clarity
Update AMD SMI changelog
Apply Leo's feedback
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
To best determine hardware specs per repo, added more build pool options with varying number of vCPUs, RAM size, etc. and will kick off builds with test targets enabled to determine long-term cost values.
Co-authored-by: alexxu-amd <alexxu12@amd.com>
* Update Ubuntu kernel versions for 6.1.1 changelog and release notes
* Add link to GitHub issue for ROCm SMI in changelog and RN
* Fix ROCm SMI GH issue link
* Update kernel versions format
* Update kernel version format for readability
* Update kernel version brackets
- Updating build flags for llvm-project to support another pipeline to work with aomp repos.
- Added support for rocMLIR component.
- Removed MIVisionX python dependency script and leveraged existing dependencies template.
- Changed to use cloud systems
* Add ROCm version 6.1.0 to version list (#3023)
* Update CHANGELOG.md
Added GitHub links to Changelog
* Update CHANGELOG.md
* Update manifest for ROCm 6.1.0 (#3022)
* Reorganize default.xml by group and alphabetically
* Add rocDecode to default.xml
* Add rocDecode to included names in tag script
* update tag to 6.1.0
---------
* Update CHANGELOG.md
Updated ROCm Compiler with fixed issue
* docs(tools/autotag/README.md): Add additional note to avoid duplicating data in changelog template (#3018)
* Update README.md (#3043)
* Update README.md
Fix rocSPARSE build link
* Update link to just general page, instead of anchor
* Add 'JAX for ROCm' link to index.md (#3034)
* Add JAX for ROCm link to index.md
* Reorder third-party libraries installation guides in index
* Update links to rocAL component (#3033)
* Update links to rocAL component
* Change absolute rocm docs links to relative
* Update compatibility/precision-support links (#3030)
* Change links to component data type support pages from absolute to relative
* Fix rocPRIM data type support links
* Empty commit to trigger demo rebuild.
* Update excluded and included projects
* Separate templates into a module; Fix MIVisionX template
* Add hipfort changelog processor
* Add rpp custom processor
* Add custom processor for rvs
* update the code-owner list (#3046)
* Update default.xml (#3038)
* Remove HIPCC from default.xml
HIPCC moved into llvm-project
* Remove ROCm-Device-Libs from default.xml
ROCm-Device-Libs was moved into llvm-project
* Remove ROCm-CompilerSupport from default.xml
ROCm-CompilerSupport was moved into llvm-project
* Add rocprofiler-register to default.xml
Added in 6.1 manifest
* Apply mathlibs group to projects in manifest
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx (#3047)
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-major
...
* Set Ubuntu 22.04 and Python 3.10 in ReadtheDocs config
---------
* Add 6.1.0.md template
* Add AMD SMI to 6.1.0 template
* Add ROCm Compiler to 6.1.0 template
* Add RDC to 6.1.0 template
* Add ROCgdb to 6.1.0 template
* Add ROCm SMI to 6.1.0 template
* Add ROCProfiler to 6.1.0 template
* Add MI200 SR-IOV known issue to 6.1.0 template
* Add MI300 RAS fixed defect to 6.1.0 template
* docs(6.1.0.md): Add more changelog notes for 6.1.0
* Update 6.1.0.md
Added links to GitHub for known issues and ROCm Compiler fixed defect
* Test autotag script
* Add ck template
* Add HIPIFY to included names for tag script
* Remove rocprofiler from tag_script
* Remove RVS template
Determine cause of missing later
* Add HIPIFY to template for 6.1.0
* Add extra line to top of template for formatting changelog
* Update 5.7.1.md
Fixing the broken link for rocBLAS programmer's guide in 5.7.1 Changelog.
* Regenerate changelog with new 5.7.1 link fix
* Add note for tag_script included_names
* Improve readability of GPU architecture hardware specs (#3009)
* move units of measurement to table headers
* add glossary explaining table headers
* add missed units and update h1
* update toc listing to indicate Accelerators & GPUs
* fix typo
* update meta description and keywords
* Update title in toc to fit in sidebar
* update title, toc, and filename
* Fix broken link to HIP programming guide
* Revert "update title, toc, and filename"
This reverts commit 6b9e687805.
* Revert glossary; slight fixes
* Change 'Pro' to 'PRO' for consistency
* Add references to programming and hardware architecture guides
* Change 'warp' to 'wavefront'
* Update changelog.jinja to exclude version number in header for individual libraries (#3058)
* Base set of Azure DevOps pipeline library source (#3021)
* Base set of Azure DevOps pipeline library source
A base set of yaml files to orchestrate the build and testing of ROCm compiler and runtime components in an Azure DevOps project.
* Use hipcc in llvm-project, also build OpenCL runtime.
* Adding llvm-lit tests to llvm-project pipeline.
Added comgr ctest as well.
* rocm-cmake unit testing in pipeline
* Pipeline changes corresponding to 6.1 release
* Bump rocm-docs-core from 1.0.0 to 1.1.0 in /docs/sphinx (#3063)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.0.0...v1.1.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
* update the default.xml for ROCm6.1 (#3067)
* Bump urllib3 from 1.26.13 to 1.26.18 in /docs/sphinx (#3068)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.13 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.13...1.26.18)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: indirect
...
* Add 6.1.1.md template
* Bump rocm-docs-core from 1.1.0 to 1.1.1 in /docs/sphinx (#3070)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v1.1.0...v1.1.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
* Fix broken link on hardware specs page (#3075)
* Fix broken link
Fix broken link on hardware specs page to HIP programming model due to
refactoring of HIP docs.
* Update link anchor
* Tagged builds of External CI components (#3078)
* Tagged builds of External CI components
Adding capability to kick off builds of ROCm components based on a tag ref, without the need of the yaml file in the corresponding repo that is used for pre-submit and on-submit builds. This unblocks the team from creating an initial set of pipelines to verify things work.
Also made some improvements to the code structure and added support for more repos.
---------
* More external CI pipelines (#3083)
Changing default behaviour for PRs with tag-builds.
Changing build system for some jobs based on execution time.
* Add compatibility matrix (#3082)
* add compatibility matrix and custom css
* fix toc
* reorder some components in matrix, add missing tools to reference page
* Update docs/compatibility/compatibility-matrix.rst
---------
* update OS strings to be more readable and searchable (#3088)
* Tag build pipelines for four more ROCm repos (#3085)
- rocgdb
- hipother via HIP build with targeted platform
- hipSOLVER
- hipSPARSELt
* Bump jinja2 from 3.1.3 to 3.1.4 in /docs/sphinx (#3089)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4)
---
updated-dependencies:
- dependency-name: jinja2
dependency-type: indirect
...
* Compatibility Matrix - include AMDSMI (#3090)
* Extend codeowners for docs (#3091)
* Add release notes
Improve wording
Clarify Ubuntu 22.04.5 is pre-release
Add AMD SMI changes
Fix md headings and some words
Reword highlight
Add feedback from Leo to release highlight
Add generated changelog
Add RELEASE.md for 6.1.1
Update highlight in RELEASE.md with change in 6.1.1 template
Change h1 in CHANGELOG.md
to ROCm 6.1.1 changelog
Change release notes to changelog in CHANGELOG.md
Fix missing info in CHANGELOG.md pre-6.1.1
Add HIPIFY 6.1.1 to changelog
Add HIPIFY to RELEASE.md
Also fix typo in changelog
Add HIPIFY to 6.1.1 template
* Fix util imports
* Skip and log missing branches for release_data.py
* Update autotag readme
* Remove ck template
* Fix changelog and release notes
Add \n to top of 6.0.2 template
Update RELEASE.md and 6.1.1.md
Regenerate changelog
Add minor wording changes in RELEASE.md
Incorporate Leo's feedback
Reformat RELEASE.md to fix build issue
Fixes an issue preventing Changelog from appearing in the TOC.
Update AMDSMI link & change 'release highlights' to 'release notes'
Change AMD SMI link from develop to docs/6.1.1
* Update changelog and release notes for 6.1.1
Reformat 6.1.0 to 6.0.0 changelog
Add ROCm SMI known issues to RN
Tweak ROCm SMI wording
Add known issue
Reword known issue rn
Fix headings and wording
Remove redundancy
Fix headings and known issue words
Leo changes
Remove known issue with Radeon GPUs
Specify Navi3 GPUs in ROCm SMI known issue
Change Navi 3x to RDNA3
Add OS support note
Fix 6.1.1 template link to amdsmi
Update 6.1.1 library table, add hipBLASLt to 6.1.1 CL/RN, update HIPCC upcoming changes wording
Remove extra bullet
Change gpu to GPU in rocFFT
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Roopa Malavally <56051583+Rmalavally@users.noreply.github.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: peter <peter.park@amd.com>
Co-authored-by: amitkumar-amd <120512306+amitkumar-amd@users.noreply.github.com>
Co-authored-by: Joseph Macaranas <145489236+amd-jmacaran@users.noreply.github.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: abhimeda <abhinav.meda@amd.com>
Co-authored-by: alexxu-amd <alex.xu@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Add ROCm version 6.1.0 to version list (#3023)
* Update CHANGELOG.md
Added GitHub links to Changelog
* Update CHANGELOG.md
* Update manifest for ROCm 6.1.0 (#3022)
* Reorganize default.xml by group and alphabetically
* Add rocDecode to default.xml
* Add rocDecode to included names in tag script
* update tag to 6.1.0
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* Update CHANGELOG.md
Updated ROCm Compiler with fixed issue
* docs(tools/autotag/README.md): Add additional note to avoid duplicating data in changelog template (#3018)
* Use Ubuntu 22.04 and Python 3.10 in RTD config
* Update README.md (#3043)
* Update README.md
Fix rocSPARSE build link
* Update link to just general page, instead of anchor
* Add 'JAX for ROCm' link to index.md (#3034)
* Add JAX for ROCm link to index.md
* Reorder third-party libraries installation guides in index
* Update links to rocAL component (#3033)
* Update links to rocAL component
* Change absolute rocm docs links to relative
* Update compatibility/precision-support links (#3030)
* Change links to component data type support pages from absolute to relative
* Fix rocPRIM data type support links
* Empty commit to trigger demo rebuild.
* Update excluded and included projects
* Separate templates into a module; Fix MIVisionX template
* Add hipfort changelog processor
* Add rpp custom processor
* Add custom processor for rvs
* update the code-owner list (#3046)
* update manifest file for ROCm6.1 (#3024)
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
* Add ROCm version 6.1.0 to version list (#3023) (#3025)
* Merge develop into roc-6.1.x (#3048)
* Add ROCm version 6.1.0 to version list (#3023)
* Update CHANGELOG.md
Added GitHub links to Changelog
* Update CHANGELOG.md
* Update manifest for ROCm 6.1.0 (#3022)
* Reorganize default.xml by group and alphabetically
* Add rocDecode to default.xml
* Add rocDecode to included names in tag script
* update tag to 6.1.0
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* Update CHANGELOG.md
Updated ROCm Compiler with fixed issue
* docs(tools/autotag/README.md): Add additional note to avoid duplicating data in changelog template (#3018)
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* Use Ubuntu 22.04 and Python 3.10 in RTD config
* Update README.md (#3043)
* Update README.md
Fix rocSPARSE build link
* Update link to just general page, instead of anchor
* Add 'JAX for ROCm' link to index.md (#3034)
* Add JAX for ROCm link to index.md
* Reorder third-party libraries installation guides in index
* Update links to rocAL component (#3033)
* Update links to rocAL component
* Change absolute rocm docs links to relative
* Update compatibility/precision-support links (#3030)
* Change links to component data type support pages from absolute to relative
* Fix rocPRIM data type support links
* Empty commit to trigger demo rebuild.
* Update excluded and included projects
* Separate templates into a module; Fix MIVisionX template
* Add hipfort changelog processor
* Add rpp custom processor
* Add custom processor for rvs
* update the code-owner list (#3046)
* Update default.xml (#3038)
* Remove HIPCC from default.xml
HIPCC moved into llvm-project
* Remove ROCm-Device-Libs from default.xml
ROCm-Device-Libs was moved into llvm-project
* Remove ROCm-CompilerSupport from default.xml
ROCm-CompilerSupport was moved into llvm-project
* Add rocprofiler-register to default.xml
Added in 6.1 manifest
* Apply mathlibs group to projects in manifest
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx (#3047)
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* Set Ubuntu 22.04 and Python 3.10 in ReadtheDocs config
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Add 6.1.0.md template
* Add AMD SMI to 6.1.0 template
* Add ROCm Compiler to 6.1.0 template
* Add RDC to 6.1.0 template
* Add ROCgdb to 6.1.0 template
* Add ROCm SMI to 6.1.0 template
* Add ROCProfiler to 6.1.0 template
* Add MI200 SR-IOV known issue to 6.1.0 template
* Add MI300 RAS fixed defect to 6.1.0 template
* docs(6.1.0.md): Add more changelog notes for 6.1.0
* Update 6.1.0.md
Added links to GitHub for known issues and ROCm Compiler fixed defect
* Test autotag script
* Add ck template
* Add HIPIFY to included names for tag script
* Remove rocprofiler from tag_script
* Remove RVS template
Determine cause of missing later
* Add HIPIFY to template for 6.1.0
* Add extra line to topp of template for formatting changelog
* Update 5.7.1.md
Fixing the broken link for rocBLAS programmer's guide in 5.7.1 Changelog.
* Regenerate changelog with new 5.7.1 link fix
* Add note for tag_script included_names
* Improve readability of GPU architecture hardware specs (#3009)
* move units of measurement to table headers
* add glossary explaining table headers
* add missed units and update h1
* toc listing to say indicate Accelerators & GPUs
* fix typo
* update meta description and keywords
* Update title in toc to fit in sidebar
* update title, toc, and filename
* Fix broken link to HIP programming guide
* Revert "update title, toc, and filename"
This reverts commit 6b9e687805.
* Revert glossary; slight fixes
* Change 'Pro' to 'PRO' for consistency
* Add references to programming and hardware architecture guides
* Change 'warp' to 'wavefront'
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Roopa Malavally <56051583+Rmalavally@users.noreply.github.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: peter <peter.park@amd.com>
Co-authored-by: amitkumar-amd <120512306+amitkumar-amd@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Yanyao Wang <yanywang@amd.com>
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Co-authored-by: Roopa Malavally <56051583+Rmalavally@users.noreply.github.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: peter <peter.park@amd.com>
Co-authored-by: amitkumar-amd <120512306+amitkumar-amd@users.noreply.github.com>
* Add ROCm version 6.1.0 to version list (#3023)
* Update CHANGELOG.md
Added GitHub links to Changelog
* Update CHANGELOG.md
* Update manifest for ROCm 6.1.0 (#3022)
* Reorganize default.xml by group and alphabetically
* Add rocDecode to default.xml
* Add rocDecode to included names in tag script
* update tag to 6.1.0
---------
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
* Update CHANGELOG.md
Updated ROCm Compiler with fixed issue
* docs(tools/autotag/README.md): Add additional note to avoid duplicating data in changelog template (#3018)
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.1 to 1.0.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.1...v1.0.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* Use Ubuntu 22.04 and Python 3.10 in RTD config
* Update README.md (#3043)
* Update README.md
Fix rocSPARSE build link
* Update link to just general page, instead of anchor
* Add 'JAX for ROCm' link to index.md (#3034)
* Add JAX for ROCm link to index.md
* Reorder third-party libraries installation guides in index
* Update links to rocAL component (#3033)
* Update links to rocAL component
* Change absolute rocm docs links to relative
* Update compatibility/precision-support links (#3030)
* Change links to component data type support pages from absolute to relative
* Fix rocPRIM data type support links
* Empty commit to trigger demo rebuild.
* Update excluded and included projects
* Separate templates into a module; Fix MIVisionX template
* Add hipfort changelog processor
* Add rpp custom processor
* Add custom processor for rvs
* update the code-owner list (#3046)
* Update default.xml (#3038)
* Remove HIPCC from default.xml
HIPCC moved into llvm-project
* Remove ROCm-Device-Libs from default.xml
ROCm-Device-Libs was moved into llvm-project
* Remove ROCm-CompilerSupport from default.xml
ROCm-CompilerSupport was moved into llvm-project
* Add rocprofiler-register to default.xml
Added in 6.1 manifest
* Apply mathlibs group to projects in manifest
* Bump rocm-docs-core from 0.38.1 to 1.0.0 in /docs/sphinx (#3047)
* Set Ubuntu 22.04 and Python 3.10 in ReadtheDocs config
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
* Add 6.1.0.md template
* Add AMD SMI to 6.1.0 template
* Add ROCm Compiler to 6.1.0 template
* Add RDC to 6.1.0 template
* Add ROCgdb to 6.1.0 template
* Add ROCm SMI to 6.1.0 template
* Add ROCProfiler to 6.1.0 template
* Add MI200 SR-IOV known issue to 6.1.0 template
* Add MI300 RAS fixed defect to 6.1.0 template
* docs(6.1.0.md): Add more changelog notes for 6.1.0
* Update 6.1.0.md
Added links to GitHub for known issues and ROCm Compiler fixed defect
* Test autotag script
* Add ck template
* Add HIPIFY to included names for tag script
* Remove rocprofiler from tag_script
* Remove RVS template
Determine cause of missing later
* Add HIPIFY to template for 6.1.0
* Add extra line to top of template for formatting changelog
* Update 5.7.1.md
Fixing the broken link for rocBLAS programmer's guide in 5.7.1 Changelog.
* Regenerate changelog with new 5.7.1 link fix
* Add note for tag_script included_names
* Improve readability of GPU architecture hardware specs (#3009)
* move units of measurement to table headers
* add glossary explaining table headers
* add missed units and update h1
* toc listing to indicate Accelerators & GPUs
* fix typo
* update meta description and keywords
* Update title in toc to fit in sidebar
* update title, toc, and filename
* Fix broken link to HIP programming guide
* Revert "update title, toc, and filename"
This reverts commit 6b9e687805.
* Revert glossary; slight fixes
* Change 'Pro' to 'PRO' for consistency
* Add references to programming and hardware architecture guides
* Change 'warp' to 'wavefront'
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Roopa Malavally <56051583+Rmalavally@users.noreply.github.com>
Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: peter <peter.park@amd.com>
Co-authored-by: amitkumar-amd <120512306+amitkumar-amd@users.noreply.github.com>
* Reorganize default.xml by group and alphabetically
* Add rocDecode to default.xml
Added known issue for ROCm compiler
https://ontrack-internal.amd.com/browse/SWDEV-454778
Added known issue for RVS
Added known issue for MI200 SRIOV
Updated PEBB test known issue for RVS
Added expansion for PEBB
Added PBQT known issue
expanded P2P Benchmark and Qualification Tool
Edited RVS known issue description based on Leo's input
Added MI300A fixed defect
Removed PEBB and Babel Stream from RVS known issue
Updated RCCL
Added rocm-cmake
Added rocRAND
Added rocWMMA
Added Tensile
Alan's change 1
Alan change to HIPIFY
Alan's edit 3 for MIOpen
OpenMP 2nd bullet fix - Alan edit
Alan's edit - ROCm Compiler
ROCm Validation Suite edits
Alan's edit rocSOLVER
Alan's edit to ROCTracer
Updated hipSPARSELt
Added hipTensor 1.2.0
Added hipTensor
data type correction
updated the RCCL version
Added bullets to known issues for consistency
Changed RAS to Fixed defect
* Add rocDecode to What is ROCm? components list (#3016)
* Add rocDecode to What is ROCm? components list
* Fix typo -> 'Common Language Runtime'
* Change 'compute' to 'common'
* Add rocDecode to API libraries (#3019)
* Update links
* table cleanup
* cross-refs
* wordlist update
* add temp hard links
* verbiage
* docs(index.md): Disable MD051 for Sphinx Markdown anchor point
In general this rule should be followed to avoid broken links
* revert gpu-arch table, remove dropdowns, quick start hyphen removed on index.md
* revise opening text as per PR comment
---------
Co-authored-by: Lisa <lisa.delaney@amd.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
* add rocm software stack diagram to What is ROCm landing page
* restructure ROCm project list table
* clean up unnecessary hyphenation
* update What is ROCm stack diagram filename
* reorder rocm project list to reflect diagram
* update "What is ROCm?" image metadata
* change 'project list' to 'components'
* change 'project' to 'component'
* Update using-gpu-sanitizer.md
Minor OpenMP update
* Update using-gpu-sanitizer.md
Updated note with additional information.
* Update using-gpu-sanitizer.md
* Update using-gpu-sanitizer.md
Moved the note to another section
* Update using-gpu-sanitizer.md
For information on how to contribute to the ROCm documentation, see [Contributing to the ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
## Older ROCm releases
For release information for older ROCm releases, refer to the
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
The following image illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
| `OMP_NUM_TEAMS` | To set the number of teams for kernel launch, which is otherwise chosen by the implementation by default. You can set this number (subject to implementation limits) for performance tuning. |
| `LIBOMPTARGET_KERNEL_TRACE` | To print useful statistics for device operations. Setting it to 1 and running the program emits the name of every kernel launched, the number of teams and threads used, and the corresponding register usage. Setting it to 2 additionally emits timing information for kernel launches and data transfer operations between the host and the device. |
| `LIBOMPTARGET_INFO` | To print informational messages from the device runtime as the program executes. Setting it to 1 or higher prints fine-grained information, and setting it to -1 prints complete information. |
| `LIBOMPTARGET_DEBUG` | To get detailed debugging information about data transfer operations and kernel launch when using a debug version of the device library. Set this environment variable to 1 to get the detailed information from the library. |
| `GPU_MAX_HW_QUEUES` | To set the number of HSA queues in the OpenMP runtime. The HSA queues are created on demand up to the maximum value as supplied here. The queue creation starts with a single initialized queue to avoid unnecessary allocation of resources. The provided value is capped if it exceeds the recommended, device-specific value. |
| `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` | To set the threshold size up to which data transfers are initiated asynchronously. The default threshold size is 1*1024*1024 bytes (1MB). |
| `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, that is, to wait for each operation to complete before continuing. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
:::
## OpenMP: features
The OpenMP programming model has been significantly enhanced with the following features,
implemented in recent releases.
(openmp_usm)=
### Asynchronous behavior in OpenMP target regions
* Controlling Asynchronous Behavior
The OpenMP offloading runtime executes in an asynchronous fashion by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred becomes larger than the default threshold of 1MB, the runtime falls back to a synchronous data transfer. Transfers from buffers that have already been locked (pinned) are always executed asynchronously.
You can overrule this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
* Multithreaded Offloading on the Same Device
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
* Parallel Memory Copy Invocations
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
### Unified shared memory
Unified Shared Memory (USM) provides a pointer-based approach to memory
management. To implement USM, fulfill the following system requirements along
with Xnack capability.
#### Prerequisites
* Linux Kernel versions above 5.14
* Latest KFD driver packaged in ROCm stack
* Xnack, as USM support can only be tested with applications compiled with Xnack
capability
#### Xnack capability
When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
enabled both at compile- and run-time. To enable Xnack support at compile-time,
use:
```bash
--offload-arch=gfx908:xnack+
```
Or use the functionally equivalent Xnack-any option:
```bash
--offload-arch=gfx908
```
To enable Xnack functionality at runtime on a per-application basis,
set the environment variable:
```bash
HSA_XNACK=1
```
When Xnack support is not needed:
* Build the applications to maximize resource utilization using:
```bash
--offload-arch=gfx908:xnack-
```
* At runtime, set the `HSA_XNACK` environment variable to 0.
#### Unified shared memory pragma
This OpenMP pragma is available on MI200 through `xnack+` support.
```cpp
#pragma omp requires unified_shared_memory
```
As stated in the OpenMP specification, this pragma makes the map clause on
target constructs optional. By default, on MI200, all memory allocated on the
host is fine grain. Using the map clause on a target construct is still allowed;
it transforms the access semantics of the associated memory to coarse grain.
A simple program demonstrating the use of this feature is:

```cpp
#include <stdlib.h>
#include <stdio.h>

#define N 64

#pragma omp requires unified_shared_memory

int main() {
  int n = N;
  int *a = new int[n];
  int *b = new int[n];
  for (int i = 0; i < n; i++)
    b[i] = i;
  #pragma omp target parallel for map(to: b[:n])
  for (int i = 0; i < n; i++)
    a[i] = b[i];
  for (int i = 0; i < n; i++)
    if (a[i] != i)
      printf("error at %d: expected %d, got %d\n", i, i, a[i]);
  delete[] a;
  delete[] b;
  return 0;
}
```
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
* `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels, including no-loop kernels and cross-team reductions.
* `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
* `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
* `-O3` if no `-O*` is specified by the user.
### Specialized kernels
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
* No-loop
* Big-jump-loop
* Cross-team reductions
To enable the generation of specialized kernels, follow these guidelines:
* Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the no-loop kernel. If the `num_teams` clause is a user requirement, clang tries to generate the big-jump-loop kernel instead of the no-loop kernel.
* Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
* To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
#### No-loop kernel generation
The no-loop kernel generation feature optimizes the performance of certain OpenMP target constructs, such as `target teams distribute parallel for`, by generating a specialized kernel. The specialized kernel assumes every thread executes a single iteration of the user loop, which leads the runtime to launch a total number of GPU threads equal to or greater than the iteration-space size of the target region loop. This allows the compiler to generate code for the loop body without an enclosing loop, resulting in reduced control-flow complexity and potentially better performance.
#### Big-jump-loop kernel generation
A no-loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the big-jump-loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient cross-team communication. New APIs for cross-team reduction are implemented in the device runtime and are automatically generated by clang.
| [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
| [rocminfo](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm-systems/blob/develop/projects/rocminfo/License.txt) |
| [ROCm Data Center (RDC)](https://github.com/ROCm/rocm-systems/tree/develop/projects/rdc/) | [MIT](https://github.com/ROCm/rocm-systems/blob/develop/projects/rdc/LICENSE.md) |
| [ROCm-Device-Libs](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/LICENSE.TXT) |
| [ROCr Debug Agent](https://github.com/ROCm/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocr_debug_agent/blob/amd-staging/LICENSE.txt) |
| [ROCR-Runtime](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocr-runtime/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm-systems/blob/develop/projects/rocr-runtime/LICENSE.txt) |
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.
### Package licensing
:::{attention}
ROCprof Trace Decoder and AOCC CPU optimizations are provided in binary form, subject to the license agreement enclosed on [GitHub](https://github.com/ROCm/rocprof-trace-decoder/blob/amd-mainline/LICENSE) for ROCprof Trace Decoder, and [Developer Central](https://www.amd.com/en/developer/aocc.html) for AOCC. By using, installing,
copying or distributing ROCprof Trace Decoder or AOCC CPU Optimizations, you agree to
the terms and conditions of this license agreement. If you do not agree to the
terms of this agreement, do not install, copy or use ROCprof Trace Decoder or the
AOCC CPU Optimizations.
:::
For the rest of the ROCm packages, you can find the licensing information at the
following location: `/opt/rocm/share/doc/<component-name>/` or in the locations
specified in the preceding table.
For example, you can fetch the licensing information of the `amd_comgr`
component (Code Object Manager) from the `/opt/rocm/share/doc/amd_comgr/LICENSE.txt` file.
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
,"Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8",Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,,,
:doc:`ROCm Data Center Tool <rdc:index>`,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.2.1,1.2.0,1.1.1,1.1.0,1.0.2,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
.. [#os-compatibility] Some operating systems are supported on specific GPUs. For detailed information about operating systems supported on ROCm 7.2.0, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
.. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information about GPUs supporting ROCm 7.2.0, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-gpus>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
.. [#dgl_compat] DGL is only supported on ROCm 7.0.0, 6.4.3 and 6.4.0.
.. [#llama-cpp_compat] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
.. [#flashinfer_compat] FlashInfer is only supported on ROCm 7.1.1 and 6.4.1.
.. [#mi325x_KVM] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
.. [#driver_patch] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
.. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. _OS-kernel-versions:
Operating systems, kernel and Glibc versions
*********************************************
For detailed information on operating systems supported on ROCm 7.2.0 and the associated kernel and Glibc versions, see the latest :ref:`supported_distributions`. For version-specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__ and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
.. note::

   * See `Red Hat Enterprise Linux Release Dates <https://access.redhat.com/articles/3078>`_ to learn about the specific kernel versions supported on Red Hat Enterprise Linux (RHEL).
   * See `List of SUSE Linux Enterprise Server kernel <https://www.suse.com/support/kb/doc/?id=000019587>`_ to learn about the specific kernel version supported on SUSE Linux Enterprise Server (SLES).
..
   Footnotes and ref anchors in the historical tables below should be appended with "-past-60" to differentiate them from
   the footnote references in the latest compatibility matrix above. It also allows easy find & replace.
   An easy way to work is to download the historical .csv file and open it in Excel. Then, when the content is ready,
   delete the columns you don't need to build the current compatibility matrix used in the table above. Find & replace all
   instances of "-past-60" to make it ready for the table above.
You can `download the entire .csv <../downloads/compatibility-matrix-historical-6.0.csv>`_ for offline reference.
.. csv-table::
   :file: compatibility-matrix-historical-6.0.csv
   :header-rows: 1
   :stub-columns: 1
.. rubric:: Footnotes

.. [#os-compatibility-past-60] Some operating systems are supported on specific GPUs. For detailed information, see :ref:`supported_distributions` and select the required ROCm version for version specific support.
.. [#gpu-compatibility-past-60] Some GPUs have limited operating system support. For detailed information, see :ref:`supported_GPUs` and select the required ROCm version for version specific support.
.. [#tf-mi350-past-60] TensorFlow 2.17.1 is not supported on AMD Instinct MI350 Series GPUs. Use TensorFlow 2.19.1 or 2.18.1 with MI350 Series GPUs instead.
.. [#verl_compat-past-60] verl is only supported on ROCm 7.0.0 and 6.2.0.
.. [#stanford-megatron-lm_compat-past-60] Stanford Megatron-LM is only supported on ROCm 6.3.0.
.. [#dgl_compat-past-60] DGL is only supported on ROCm 7.0.0, 6.4.3 and 6.4.0.
.. [#megablocks_compat-past-60] Megablocks is only supported on ROCm 6.3.0.
.. [#ray_compat-past-60] Ray is only supported on ROCm 7.0.0 and 6.4.1.
.. [#llama-cpp_compat-past-60] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
.. [#flashinfer_compat-past-60] FlashInfer is only supported on ROCm 7.1.1 and 6.4.1.
.. [#mi325x_KVM-past-60] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
.. [#driver_patch-past-60] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
llama.cpp can be applied in a variety of scenarios, particularly when you need to meet one or more of the following requirements:
- Plain C/C++ implementation with no external dependencies
- Support for 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom HIP (Heterogeneous-compute Interface for Portability) kernels for running large language models (LLMs) on AMD GPUs (graphics processing units)
- CPU (central processing unit) + GPU (graphics processing unit) hybrid inference for partially accelerating models larger than the total available VRAM (video random-access memory)
llama.cpp is also used in a range of real-world applications, including:
- Games such as `Lucy's Labyrinth <https://github.com/MorganRO8/Lucys_Labyrinth>`__:
A simple maze game where AI-controlled agents attempt to trick the player.
- Tools such as `Styled Lines <https://marketplace.unity.com/packages/tools/ai-ml-integration/style-text-webgl-ios-stand-alone-llm-llama-cpp-wrapper-292902>`__:
A proprietary, asynchronous inference wrapper for Unity3D game development, including pre-built mobile and web platform wrappers and a model example.
- Various other AI applications use llama.cpp as their inference engine;
for a detailed list, see the `user interfaces (UIs) section <https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#description>`__.
For more use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__,
where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs.
- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration <https://rocm.blogs.amd.com/ecosystems-and-partners/llama-cpp/README.html>`__
blog post outlines how the open-source llama.cpp framework enables efficient LLM inference—including interactive inference with ``llama-cli``,
server deployment with ``llama-server``, GGUF model preparation and quantization, performance benchmarking, and optimizations tailored for
`Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/project/linux-pci/patch/1443110390-4080-1-git-send-email-jay@jcornwall.me/>`_
There are also a number of papers which talk about these new capabilities:
* `Atomic Read Modify Write Primitives by Intel <https://www.intel.es/content/dam/doc/white-paper/atomic-read-modify-write-primitives-i-o-devices-paper.pdf>`_
* `PCI express 3 Accelerator White paper by Intel <https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf>`_
* `Intel PCIe Generation 3 Hotchips Paper <https://www.hotchips.org/wp-content/uploads/hc_archives/hc21/1_sun/HC21.23.1.SystemInterconnectTutorial-Epub/HC21.23.131.Ajanovic-Intel-PCIeGen3.pdf>`_
* `PCIe Generation 4 Base Specification includes atomic operations <https://astralvx.com/storage/2020/11/PCI_Express_Base_4.0_Rev0.3_February19-2014.pdf>`_
The following sections cover inference and introduce [MIGraphX](https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/).
## Inference
Inference is where the capabilities learned during deep-learning training are put to work. It refers to using a fully trained neural network to make predictions on unseen data that the model has never encountered before. Deep-learning inference is achieved by feeding new data, such as new images, to the network, giving the deep neural network (DNN) a chance to classify them.
Taking the earlier MNIST example, the DNN can be fed new images of handwritten digits, allowing the network to classify them. A fully trained DNN should make accurate predictions about what an image represents; inference cannot happen without training.
## MIGraphX introduction
MIGraphX is a graph compiler focused on accelerating machine-learning inference that can target AMD GPUs and CPUs. MIGraphX accelerates machine-learning models by applying several graph-level transformations and optimizations. These optimizations include:
* Operator fusion
* Arithmetic simplifications
* Dead-code elimination
* Common subexpression elimination (CSE)
* Constant propagation
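Two of these passes are easy to picture in isolation. The following pure-Python sketch (illustrative only, not MIGraphX internals; the graph encoding and function names are invented for this example) applies constant propagation and common subexpression elimination to a toy expression graph:

```python
# Toy expression graph: each node maps a name to (op, args).
# "const" nodes carry their value in args. Illustrative sketch only.

def constant_propagate(graph):
    """Replace ops whose inputs are all constants with a folded constant."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    for name, (op, args) in list(graph.items()):
        if op in ops and all(graph[a][0] == "const" for a in args):
            value = ops[op](*(graph[a][1][0] for a in args))
            graph[name] = ("const", [value])
    return graph

def eliminate_common_subexpressions(graph):
    """Map structurally identical nodes to a single representative node."""
    seen, alias = {}, {}
    for name, node in graph.items():
        key = (node[0], tuple(node[1]))
        alias[name] = seen.setdefault(key, name)
    return alias

graph = {
    "two":   ("const", [2]),
    "three": ("const", [3]),
    "a": ("add", ["two", "three"]),   # folds to const 5
    "b": ("add", ["two", "three"]),   # duplicate of "a", removed by CSE
    "x": ("input", []),
    "y": ("mul", ["x", "a"]),         # not foldable: "x" is an input
}

constant_propagate(graph)
alias = eliminate_common_subexpressions(graph)
print(graph["a"])   # ('const', [5])
print(alias["b"])   # 'a' -- "b" is redundant
```

A real compiler pass pipeline iterates such rewrites until a fixed point; the sketch runs each pass once for clarity.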
After applying these transformations, MIGraphX emits code for AMD GPUs by calling MIOpen or rocBLAS, or by creating HIP kernels for a particular operator. MIGraphX can also target CPUs using the DNNL or ZenDNN libraries.
MIGraphX provides easy-to-use APIs in C++ and Python to import machine-learning models in ONNX or TensorFlow format. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes, such as:
* Number of arguments
* Type of arguments
* Shape of arguments
After optimization passes, all these operators get mapped to different kernels on GPUs or CPUs.
After importing a model into MIGraphX, the model is represented as a `migraphx::program`. A `migraphx::program` is made up of `migraphx::module` objects. The program can consist of several modules, but it always has one main module. Modules are made up of `migraphx::instruction_ref` objects. Instructions contain the `migraphx::op` and the arguments to the operator.
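As a rough mental model, this hierarchy can be mirrored in a few lines of plain Python. The classes below are illustrative stand-ins for the MIGraphX concepts, not the real API:

```python
# Illustrative sketch of the program -> module -> instruction hierarchy.
# Class and attribute names mirror the MIGraphX concepts; not the real API.

class Instruction:
    def __init__(self, op, args):
        self.op = op          # corresponds to migraphx::op
        self.args = args      # references to other instructions

class Module:
    def __init__(self, name):
        self.name = name
        self.instructions = []

    def add_instruction(self, op, args=()):
        ins = Instruction(op, list(args))
        self.instructions.append(ins)
        return ins            # plays the role of migraphx::instruction_ref

class Program:
    def __init__(self):
        # A program always has one main module.
        self.modules = {"main": Module("main")}

    @property
    def main_module(self):
        return self.modules["main"]

prog = Program()
x = prog.main_module.add_instruction("@param")
y = prog.main_module.add_instruction("relu", [x])
print(len(prog.main_module.instructions))  # 2
```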
## Installing MIGraphX
There are three options to get started with MIGraphX installation. MIGraphX depends on ROCm libraries; the following assumes that the machine already has ROCm installed.
### Option 1: installing binaries
To install MIGraphX on Debian-based systems like Ubuntu, use the following command:
```bash
sudo apt update && sudo apt install -y migraphx
```
The header files and libraries are installed under `/opt/rocm-<version>`, where `<version>` is the ROCm version.
### Option 2: building from source
There are two ways to build the MIGraphX sources.
* [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
or
* [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
For detailed steps on building from source and installing dependencies, refer to the `README` file in the MIGraphX repository.
### Option 3: using Docker
1. The easiest way to set up the development environment is to use Docker. To build the Docker image from scratch, first clone the MIGraphX repository by running:
2. The repository contains a Dockerfile from which you can build a Docker image as:
```bash
docker build -t migraphx .
```
3. Then, to enter the development environment, use `docker run`:
```bash
docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
```
The Docker image contains all the prerequisites required for the installation, so users can go to the folder `/code/AMDMIGraphX` and follow the steps mentioned in [Option 2: Building from Source](#option-2-building-from-source).
## MIGraphX example
MIGraphX provides both C++ and Python APIs. The following sections show examples of both using the Inception v3 model. To walk through the examples, fetch the Inception v3 ONNX model by running the following:
This creates `inceptioni1.onnx`, which can be imported into MIGraphX using the C++ or Python API.
### MIGraphX Python API
Follow these steps:
1. To import the MIGraphX module in a Python script, set `PYTHONPATH` to the MIGraphX libraries installation path. If the binaries were installed using the steps in [Option 1: Installing Binaries](#option-1-installing-binaries), run:
```bash
export PYTHONPATH=$PYTHONPATH:/opt/rocm/
```
2. The following script shows the usage of the Python API to import the ONNX model, compile it, and run inference on it. Set `LD_LIBRARY_PATH` to `/opt/rocm/` if required.
```python
# feed image to model; 'x.1' is the input parameter name
results = model.run({'x.1': input_image})
# get the results back
result_np = np.array(results[0])
# print the inferred class of the input image
print(np.argmax(result_np))
```
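The `argmax` call above returns only the single best class. If you want ranked predictions with confidence scores, a small amount of framework-independent Python suffices; the helper below is a sketch (its name and the sample `scores` vector are invented for illustration), applying a numerically stable softmax before ranking:

```python
import math

def top_k(scores, k=5):
    """Return the top-k (class_index, probability) pairs via softmax."""
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

scores = [0.1, 2.0, 0.5, 3.0]  # stand-in for the flattened 1000-class output
for idx, p in top_k(scores, k=2):
    print(f"class {idx}: {p:.3f}")
```

In practice, `scores` would be the flattened `result_np` vector from the inference run above.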
Find additional examples of the Python API in the `/examples` directory of the MIGraphX repository.
### MIGraphX C++ API
Follow these steps:
1. The following is a minimal example that shows the usage of the MIGraphX C++ API to load an ONNX file, compile it for the GPU, and run inference on it. To use the MIGraphX C++ API, you only need to include the `migraphx/migraphx.hpp` header. This example runs inference on the Inception v3 model.
```c++
#include <vector>
#include <string>
#include <algorithm>
#include <ctime>
#include <random>
#include <iostream>
#include <migraphx/migraphx.hpp>
int main(int argc, char** argv)
{
migraphx::program prog;
migraphx::onnx_options onnx_opts;
// import and parse onnx file into migraphx::program
// (parsing, compilation, and execution steps elided; `results` holds the 1000-class output buffer)
float* max = std::max_element(results, results + 1000);
int answer = max - results;
std::cout << "answer: " << answer << std::endl;
}
```
2. To compile this program, you can use CMake; you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following `CMakeLists.txt` file can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
project (CAI)
set (CMAKE_CXX_STANDARD 14)
set (EXAMPLE inception_inference)
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
# ... remaining configuration elided
```
3. To build the executable file, run the following from the directory containing the `inception_inference.cpp` file:
```bash
mkdir build
cd build
cmake ..
make -j$(nproc)
./inception_inference
```
:::{note}
Set `LD_LIBRARY_PATH` to `/opt/rocm/lib` if required during the build. Additional examples can be found in the MIGraphX repository under the `/examples/` directory.
:::
## Tuning MIGraphX
MIGraphX uses MIOpen kernels to target AMD GPUs. For a model compiled with MIGraphX, tuning MIOpen selects the best available kernel implementation, which results in a significant performance boost. Tuning can be enabled by setting the environment variable `MIOPEN_FIND_ENFORCE=3`.
:::{note}
The tuning process can take a long time to finish.
:::
**Example:** The average inference time of the Inception model example shown previously, over 100 iterations using untuned kernels, is 0.01383 ms. After tuning, it reduces to 0.00459 ms, a roughly 3x improvement. This result is from ROCm v4.5 on an MI100 GPU.
:::{note}
The results may vary depending on the system configurations.
:::
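Note that the first iteration in both runs is much slower than the rest; this warm-up cost is usually excluded when averaging. The following sketch shows one way to do that; `run_inference` and the dummy workload are stand-ins for the real model call:

```python
import time

def average_latency_ms(run_inference, iterations=100, warmup=1):
    """Time repeated calls and average, discarding warm-up iterations."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples[warmup:]) / (iterations - warmup)

# Stand-in workload; replace with the real model run call.
avg = average_latency_ms(lambda: sum(range(1000)), iterations=20)
print(f"average latency: {avg:.4f} ms")
```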
For reference, the following code snippet shows inference runs for only the first 10 iterations for both tuned and untuned kernels:
```console
### UNTUNED ###
iterator : 0
Inference complete
Inference time: 0.063ms
iterator : 1
Inference complete
Inference time: 0.008ms
iterator : 2
Inference complete
Inference time: 0.007ms
iterator : 3
Inference complete
Inference time: 0.007ms
iterator : 4
Inference complete
Inference time: 0.007ms
iterator : 5
Inference complete
Inference time: 0.008ms
iterator : 6
Inference complete
Inference time: 0.007ms
iterator : 7
Inference complete
Inference time: 0.028ms
iterator : 8
Inference complete
Inference time: 0.029ms
iterator : 9
Inference complete
Inference time: 0.029ms
### TUNED ###
iterator : 0
Inference complete
Inference time: 0.063ms
iterator : 1
Inference complete
Inference time: 0.004ms
iterator : 2
Inference complete
Inference time: 0.004ms
iterator : 3
Inference complete
Inference time: 0.004ms
iterator : 4
Inference complete
Inference time: 0.004ms
iterator : 5
Inference complete
Inference time: 0.004ms
iterator : 6
Inference complete
Inference time: 0.004ms
iterator : 7
Inference complete
Inference time: 0.004ms
iterator : 8
Inference complete
Inference time: 0.004ms
iterator : 9
Inference complete
Inference time: 0.004ms
```
### YModel
The best inference performance through MIGraphX depends on having tuned kernel configurations stored in a local user database (DB) under `/home`. If a user moves their model to a different server, or a different user runs it, the MIOpen tuning process must be run again to populate the new user DB with the best kernel configurations and corresponding solvers.
Tuning is time consuming, and users who have not performed it see discrepancies between the expected (or claimed) inference performance and the actual inference performance. This leads to repetitive, time-consuming tuning tasks for each user.
MIGraphX introduces a feature, known as YModel, that stores the kernel config parameters found during tuning into a `.mxr` file. This ensures the same level of expected performance, even when a model is copied to a different user/system.
The YModel feature is available starting from ROCm 5.4.1 and UIF 1.1.
#### YModel example
Through the `migraphx-driver` functionality, you can generate `.mxr` files with tuning information stored inside them by passing `--binary --output model.mxr` to `migraphx-driver` along with the rest of the necessary flags.
For example, to generate `.mxr` file from the ONNX model, use the following:
Training occurs in multiple phases for every batch of training data. The following table describes the training phases.
:::{table} Types of Training Phases
:name: training-phases
:widths: auto

| Phase | Description |
| ----------------- | --- |
| Forward Pass | The input features are fed into the model, whose parameters may be randomly initialized initially. Activations (outputs) of each layer are retained during this pass to help in the loss gradient computation during the backward pass. |
Training is different from inference, particularly from the hardware perspective.
:::{table} Training vs. Inference
:name: training-inference
:widths: auto

| Training | Inference |
| ----------- | ----------- |
| Training is measured in hours/days. | Inference is measured in minutes. |
The following sections contain case studies for the Inception V3 model.
### Inception V3 with PyTorch
Convolutional neural networks are forms of artificial neural networks commonly used for image processing. One of the core layers of such a network is the convolutional layer, which convolves the input with a weight tensor and passes the result to the next layer. Inception V3 is an architectural development over the ImageNet competition-winning entry, AlexNet, using deeper and wider networks while attempting to meet computational and memory budgets.
The implementation uses PyTorch as a framework. This case study utilizes [TorchVision](https://pytorch.org/vision/stable/index.html), a repository of popular datasets and model architectures, for obtaining the model. TorchVision also provides pre-trained weights as a starting point to develop new models or fine-tune the model for a new task.
This example is adapted from the PyTorch research hub page on Inception V3.
Follow these steps:
1. Run the PyTorch ROCm-based Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```bash
docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
```
The previous section focused on downloading and using the Inception V3 model for inference.
Follow these steps:
1. Run the PyTorch ROCm Docker image or refer to the section {doc}`Installing PyTorch <rocm-install-on-linux:install/3rd-party/pytorch-install>` for setting up a PyTorch environment on ROCm.
```bash
docker pull rocm/pytorch:latest
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
```
2. Download an ImageNet database. For this example, the `tiny-imagenet-200`, a smaller ImageNet variant with 200 image classes and a training dataset with 100,000 images downsized to 64x64 color images, was used.
To understand the code step by step, follow these steps:
```python
thisplot[true_label].set_color('blue')
```
9. With the model trained, you can use it to make predictions about some images. Review the 0<sup>th</sup> image predictions and the prediction array. Correct prediction labels are blue, and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label.
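The label-and-color logic described above is independent of the plotting code. The following is a minimal sketch (function and variable names are invented for illustration) of how the label text and color can be derived from a probability vector:

```python
def describe_prediction(probs, true_label, class_names):
    """Format 'label percent%' and a color flag as in the plots above."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    color = "blue" if predicted == true_label else "red"
    pct = 100.0 * probs[predicted]
    return f"{class_names[predicted]} {pct:.0f}%", color

label, color = describe_prediction([0.1, 0.7, 0.2], true_label=1,
                                   class_names=["cat", "dog", "bird"])
print(label, color)  # dog 70% blue
```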
ROCm ships multiple compilers of varying origins and purposes. This article
disambiguates compiler naming used throughout the documentation.
## Compiler terms
| Term | Description |
| - | - |
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source Clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. For details, see <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler. |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a>. |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |
The following image shows the node-level architecture of a system that
comprises two AMD EPYC™ processors and (up to) eight AMD Instinct™ GPUs.
The two EPYC processors are connected to each other with the AMD Infinity™
fabric, which provides high-bandwidth (up to 18 GT/sec), coherent links so
that each processor can access the available node memory as a single
available to connect the processors plus one PCIe Gen 4 x16 link per processor
can attach additional I/O devices such as the host adapters for the network
fabric.


In a typical node configuration, each processor can host up to four AMD
Instinct™ GPUs that are attached using PCIe Gen 4 links at 16 GT/sec,
which corresponds to a peak bidirectional link bandwidth of 32 GB/sec. Each hive
of four GPUs can participate in a fully connected, coherent AMD
Instinct™ fabric that connects the four GPUs using 23 GT/sec AMD
Infinity fabric links that run at a higher frequency than the inter-processor
links. This inter-GPU link can be established in certified server systems if the
GPUs are mounted in neighboring PCIe slots by installing the AMD Infinity
Fabric™ bridge for the AMD Instinct™ GPUs.
## Microarchitecture
The microarchitecture of the AMD Instinct GPUs is based on the AMD CDNA
architecture, which targets compute applications such as high-performance
computing (HPC) and AI & machine learning (ML) that run on everything from
individual servers to the world's largest exascale supercomputers. The overall
system architecture is designed for extreme scalability and compute performance.
The above image shows the AMD Instinct GPU with its PCIe Gen 4 x16
link (16 GT/sec, at the bottom) that connects the GPU to (one of) the host
processor(s). It also shows the three AMD Infinity Fabric ports that provide
high-speed links (23 GT/sec, also at the bottom) to the other GPUs of the local
hive.
On the left and right of the floor plan, the High Bandwidth Memory (HBM)
attaches via the GPU memory controller. The MI100 generation of the AMD
Instinct GPU offers four stacks of HBM generation 2 (HBM2) for a total
of 32 GB with a 4,096-bit-wide memory interface. The peak memory bandwidth of the
attached HBM2 is 1.228 TB/sec at a memory clock frequency of 1.2 GHz.
Therefore, the theoretical maximum FP64 peak performance is 11.5 TFLOPS.

The preceding image shows the block diagram of a single CU of an AMD Instinct™
MI100 GPU and summarizes how instructions flow through the execution
engines. The CU fetches the instructions via a 32KB instruction cache and moves
them forward to execution via a dispatcher. The CU can handle up to ten
wavefronts at a time and feed their instructions into the execution unit. The
This document lists and describes the hardware performance counters and derived metrics available on the AMD Instinct™ MI200 GPU. All the basic hardware counters and derived metrics are accessible via the {doc}`ROCProfiler tool <rocprofiler:rocprofv1>`.
## MI200 performance counters list
See the category-wise listing of MI200 performance counters in the following tables.
:::{note}
Preliminary validation of all MI200 performance counters is in progress. Those with “*” appended to their names require further evaluation.
:::
### Graphics Register Bus Management (GRBM) counters
| Counter | Unit | Description |
| --- | --- | --- |
| `SPI_CSN_BUSY` | Cycles | Number of cycles with outstanding waves |
| `SPI_CSN_WINDOW_VALID` | Cycles | Number of cycles enabled by `perfcounter_start` event |
| `SPI_CSN_NUM_THREADGROUPS` | Workgroups | Number of dispatched workgroups |
| `SPI_CSN_WAVE` | Wavefronts | Number of dispatched wavefronts |
| `SPI_RA_REQ_NO_ALLOC` | Cycles | Number of Arb cycles with requests but no allocation |
| `SPI_RA_REQ_NO_ALLOC_CSN` | Cycles | Number of Arb cycles with Compute Shader, n-th pipe (CSn) requests but no CSn allocation |
| `SPI_RA_RES_STALL_CSN` | Cycles | Number of Arb stall cycles due to shortage of CSn pipeline slots |
| `SPI_RA_TMP_STALL_CSN*` | Cycles | Number of stall cycles due to shortage of temp space |
| `SPI_RA_WAVE_SIMD_FULL_CSN` | SIMD-cycles | Accumulated number of Single Instruction Multiple Data (SIMDs) per cycle affected by shortage of wave slots for CSn wave dispatch |
| `SPI_RA_VGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of VGPR slots for CSn wave dispatch |
| `SPI_RA_SGPR_SIMD_FULL_CSN*` | SIMD-cycles | Accumulated number of SIMDs per cycle affected by shortage of SGPR slots for CSn wave dispatch |
| `SPI_RA_LDS_CU_FULL_CSN` | CUs | Number of Compute Units (CUs) affected by shortage of LDS space for CSn wave dispatch |
| `SPI_RA_BAR_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting at a BARRIER |
| `SPI_RA_BULKY_CU_FULL_CSN*` | CUs | Number of CUs with CSn waves waiting for BULKY resource |
| `SPI_RA_TGLIM_CU_FULL_CSN*` | Cycles | Number of CSn wave stall cycles due to restriction of `tg_limit` for thread group size |
| `SPI_RA_WVLIM_STALL_CSN*` | Cycles | Number of cycles CSn is stalled due to WAVE_LIMIT |
| `SPI_VWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Vector General Purpose Registers (VGPRs) when launching waves |
| `SPI_SWC_CSC_WR` | Qcycles | Number of quad-cycles taken to initialize Scalar General Purpose Registers (SGPRs) when launching waves |
### Compute Unit (CU) counters
The CU counters are further classified into instruction mix, Matrix Fused Multiply Add (MFMA) operation counters, level counters, wavefront counters, wavefront cycle counters and Local Data Share (LDS) counters.
| Counter | Unit | Description |
| --- | --- | --- |
| `SQ_ACCUM_PREV` | Count | Accumulated counter sample value where accumulation takes place once every four cycles. |
| `SQ_ACCUM_PREV_HIRES` | Count | Accumulated counter sample value where accumulation takes place once every cycle. |
| `SQ_LEVEL_WAVES` | Waves | Number of inflight waves. To calculate the wave latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_WAVE`. |
| `SQ_INST_LEVEL_VMEM` | Instr | Number of inflight VMEM (including FLAT) instructions. To calculate the VMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_VMEM`. |
| `SQ_INST_LEVEL_SMEM` | Instr | Number of inflight SMEM instructions. To calculate the SMEM latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_SMEM_NORM`. |
| `SQ_INST_LEVEL_LDS` | Instr | Number of inflight LDS (including FLAT) instructions. To calculate the LDS latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_INSTS_LDS`. |
| `SQ_IFETCH_LEVEL` | Instr | Number of inflight instruction fetch requests from the cache. To calculate the instruction fetch latency, divide `SQ_ACCUM_PREV_HIRES` by `SQ_IFETCH`. |
| `SQ_BUSY_CYCLES` | Cycles | Number of cycles while SQ reports it to be busy. |
| `SQ_BUSY_CU_CYCLES` | Qcycles | Number of quad-cycles each CU is busy. |
| `SQ_VALU_MFMA_BUSY_CYCLES` | Cycles | Number of cycles the MFMA ALU is busy. |
| `SQ_WAVE_CYCLES` | Qcycles | Number of quad-cycles spent by waves in the CUs. |
| `SQ_WAIT_ANY` | Qcycles | Number of quad-cycles spent waiting for anything. |
| `SQ_WAIT_INST_ANY` | Qcycles | Number of quad-cycles spent waiting for any instruction to be issued. |
| `SQ_ACTIVE_INST_ANY` | Qcycles | Number of quad-cycles spent by each wave to work on an instruction. |
| `SQ_ACTIVE_INST_VMEM` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VMEM instruction. |
| `SQ_ACTIVE_INST_LDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an LDS instruction. |
| `SQ_ACTIVE_INST_VALU` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a VALU instruction. |
| `SQ_ACTIVE_INST_SCA` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a SALU or SMEM instruction. |
| `SQ_ACTIVE_INST_EXP_GDS` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on an EXPORT or GDS instruction. |
| `SQ_ACTIVE_INST_MISC` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a BRANCH or `SENDMSG` instruction. |
| `SQ_ACTIVE_INST_FLAT` | Qcycles | Number of quad-cycles spent by the SQ instruction arbiter to work on a FLAT instruction. |
| `SQ_INST_CYCLES_VMEM_WR` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Write instructions. |
| `SQ_INST_CYCLES_VMEM_RD` | Qcycles | Number of quad-cycles spent to send addr and cmd data for VMEM Read instructions. |
| `SQ_INST_CYCLES_SMEM` | Qcycles | Number of quad-cycles spent to execute scalar memory reads. |
| `SQ_INST_CYCLES_SALU` | Qcycles | Number of quad-cycles spent to execute non-memory read scalar operations. |
| `SQ_THREAD_CYCLES_VALU` | Cycles | Number of thread-cycles spent to execute VALU operations. This is similar to `INST_CYCLES_VALU` but multiplied by the number of active threads. |
| `SQ_WAIT_INST_LDS` | Qcycles | Number of quad-cycles spent waiting for LDS instruction to be issued. |
| Counter | Unit | Description |
| --- | --- | --- |
| `SQC_ICACHE_REQ` | Req | Number of `L1I` cache requests |
| `SQC_ICACHE_HITS` | Count | Number of `L1I` cache hits |
| `SQC_ICACHE_MISSES` | Count | Number of non-duplicate `L1I` cache misses including uncached requests |
| `SQC_ICACHE_MISSES_DUPLICATE` | Count | Number of duplicate `L1I` cache misses whose previous lookup miss on the same cache line is not fulfilled yet |
| `SQC_DCACHE_REQ` | Req | Number of `sL1D` cache requests |
| `SQC_DCACHE_INPUT_VALID_READYB` | Cycles | Number of cycles while SQ input is valid but sL1D cache is not ready |
| `SQC_DCACHE_HITS` | Count | Number of `sL1D` cache hits |
| `SQC_DCACHE_MISSES` | Count | Number of non-duplicate `sL1D` cache misses including uncached requests |
| `SQC_DCACHE_MISSES_DUPLICATE` | Count | Number of duplicate `sL1D` cache misses |
| `SQC_DCACHE_REQ_READ_1` | Req | Number of constant cache read requests in a single DW |
| `SQC_DCACHE_REQ_READ_2` | Req | Number of constant cache read requests in two DW |
| `SQC_DCACHE_REQ_READ_4` | Req | Number of constant cache read requests in four DW |
| `SQC_DCACHE_REQ_READ_8` | Req | Number of constant cache read requests in eight DW |
| `SQC_DCACHE_REQ_READ_16` | Req | Number of constant cache read requests in 16 DW |
| `SQC_DCACHE_ATOMIC*` | Req | Number of atomic requests |
| `SQC_TC_REQ` | Req | Number of TC requests that were issued by instruction and constant caches |
| `SQC_TC_INST_REQ` | Req | Number of instruction requests to the L2 cache |
| `SQC_TC_DATA_READ_REQ` | Req | Number of data Read requests to the L2 cache |
| `SQC_TC_DATA_WRITE_REQ*` | Req | Number of data write requests to the L2 cache |
| `SQC_TC_DATA_ATOMIC_REQ*` | Req | Number of data atomic requests to the L2 cache |
| `SQC_TC_STALL*` | Cycles | Number of cycles while the valid requests to the L2 cache are stalled |
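The latency formulas in the level-counter descriptions above all follow the same pattern: divide the accumulated `SQ_ACCUM_PREV_HIRES` value by the corresponding event count. A small sketch of that derived-metric computation (the helper name and the sample counter values are invented for illustration):

```python
def derived_latency(accum_hires, event_count):
    """Average latency in cycles: accumulated level samples / event count."""
    if event_count == 0:
        return 0.0  # avoid division by zero when no events were recorded
    return accum_hires / event_count

# Hypothetical sample readings of SQ_ACCUM_PREV_HIRES and SQ_INSTS_VMEM.
counters = {"SQ_ACCUM_PREV_HIRES": 240_000, "SQ_INSTS_VMEM": 1_200}
vmem_latency = derived_latency(counters["SQ_ACCUM_PREV_HIRES"],
                               counters["SQ_INSTS_VMEM"])
print(f"average VMEM latency: {vmem_latency:.0f} cycles")  # 200 cycles
```

The same helper applies to wave, SMEM, LDS, and instruction-fetch latency by substituting the matching event counter.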
### Vector L1 cache subsystem
The vector L1 cache subsystem counters are further classified into Texture Addressing Unit (TA), Texture Data Unit (TD), vector L1D cache or Texture Cache per Pipe (TCP), and Texture Cache Arbiter (TCA) counters.
| Counter | Unit | Description |
| --- | --- | --- |
| `TCP_GATE_EN1[n]` | Cycles | Number of cycles vL1D interface clocks are turned on. Value range for n: [0-15]. |
| `TCP_GATE_EN2[n]` | Cycles | Number of cycles vL1D core clocks are turned on. Value range for n: [0-15]. |
| `TCP_TD_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TD stalls vL1D. Value range for n: [0-15]. |
| `TCP_TCR_TCP_STALL_CYCLES[n]` | Cycles | Number of cycles TCR stalls vL1D. Value range for n: [0-15]. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a read. Value range for n: [0-15]. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on a write. Value range for n: [0-15]. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES[n]` | Cycles | Number of cycles tagram conflict stalls on an atomic. Value range for n: [0-15]. |
| `TCP_PENDING_STALL_CYCLES[n]` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache. Value range for n: [0-15]. |
| `TCP_TCP_TA_DATA_STALL_CYCLES` | Cycles | Number of cycles TCP stalls TA data interface. |
| `TCP_TA_TCP_STATE_READ[n]` | Req | Number of state reads. Value range for n: [0-15]. |
| `TCP_VOLATILE[n]` | Req | Number of L1 volatile pixels/buffers from TA. Value range for n: [0-15]. |
| `TCP_TOTAL_ACCESSES[n]` | Req | Number of vL1D accesses. Equals `TCP_PERF_SEL_TOTAL_READ`+`TCP_PERF_SEL_TOTAL_NONREAD`. Value range for n: [0-15]. |
| `TCP_TOTAL_READ[n]` | Req | Number of vL1D read accesses. Equals `TCP_PERF_SEL_TOTAL_HIT_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_LRU_READ` + `TCP_PERF_SEL_TOTAL_MISS_EVICT_READ`. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITE[n]` | Req | Number of vL1D write accesses. `Equals TCP_PERF_SEL_TOTAL_MISS_LRU_WRITE`+ `TCP_PERF_SEL_TOTAL_MISS_EVICT_WRITE`. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITH_RET[n]` | Req | Number of vL1D atomic requests with return. Value range for n: [0-15]. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET[n]` | Req | Number of vL1D atomic without return. Value range for n: [0-15]. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES[n]` | Count | Total number of vL1D writebacks and invalidates. Equals `TCP_PERF_SEL_TOTAL_WBINVL1`+ `TCP_PERF_SEL_TOTAL_WBINVL1_VOL`+ `TCP_PERF_SEL_CP_TCP_INVALIDATE`+ `TCP_PERF_SEL_SQ_TCP_INVALIDATE_VOL`. Value range for n: [0-15]. |
| `TCP_UTCL1_REQUEST[n]` | Req | Number of address translation requests to UTCL1. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_HIT[n]` | Req | Number of UTCL1 translation hits. Value range for n: [0-15]. |
| `TCP_UTCL1_TRANSLATION_MISS[n]` | Req | Number of UTCL1 translation misses. Value range for n: [0-15]. |
| `TCP_UTCL1_PERMISSION_MISS[n]` | Req | Number of UTCL1 permission misses. Value range for n: [0-15]. |
| `TCP_TOTAL_CACHE_ACCESSES[n]` | Req | Number of vL1D cache accesses including hits and misses. Value range for n: [0-15]. |
| `TCP_TCP_LATENCY[n]` | Cycles | Accumulated wave access latency to vL1D over all wavefronts. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ_LATENCY[n]` | Cycles | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return. Value range for n: [0-15]. |
| `TCP_TCC_READ_REQ[n]` | Req | Number of read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_WRITE_REQ[n]` | Req | Number of write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ[n]` | Req | Number of atomic requests to L2 cache with return. Value range for n: [0-15]. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]` | Req | Number of atomic requests to L2 cache without return. Value range for n: [0-15]. |
| `TCP_TCC_NC_READ_REQ[n]` | Req | Number of NC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_READ_REQ[n]` | Req | Number of UC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_READ_REQ[n]` | Req | Number of CC read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_READ_REQ[n]` | Req | Number of RW read requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_WRITE_REQ[n]` | Req | Number of NC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_WRITE_REQ[n]` | Req | Number of UC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_WRITE_REQ[n]` | Req | Number of CC write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_WRITE_REQ[n]` | Req | Number of RW write requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_NC_ATOMIC_REQ[n]` | Req | Number of NC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_UC_ATOMIC_REQ[n]` | Req | Number of UC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_CC_ATOMIC_REQ[n]` | Req | Number of CC atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCP_TCC_RW_ATOMIC_REQ[n]` | Req | Number of RW atomic requests to L2 cache. Value range for n: [0-15]. |
| `TCC_CYCLE[n]` | Cycles | Number of L2 cache free-running clocks. Value range for n: [0-31]. |
| `TCC_BUSY[n]` | Cycles | Number of L2 cache busy cycles. Value range for n: [0-31]. |
| `TCC_REQ[n]` |Req | Number of L2 cache requests of all types. This is measured at the tag block. This may be more than the number of requests arriving at the TCC, but it is a good indication of the total amount of work that needs to be performed. Value range for n: [0-31]. |
| `TCC_STREAMING_REQ[n]` |Req | Number of L2 cache streaming requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_NC_REQ[n]` |Req | Number of NC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_UC_REQ[n]` |Req | Number of UC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_CC_REQ[n]` |Req | Number of CC requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_RW_REQ[n]` |Req | Number of RW requests. This is measured at the tag block. Value range for n: [0-31]. |
| `TCC_PROBE[n]` |Req | Number of probe requests. Value range for n: [0-31]. |
| `TCC_PROBE_ALL[n]` | Req | Number of external probe requests with `EA_TCC_preq_all` == 1. Value range for n: [0-31]. |
| `TCC_READ[n]` |Req | Number of L2 cache read requests. This includes compressed reads but not metadata reads. Value range for n: [0-31]. |
| `TCC_WRITE[n]` |Req | Number of L2 cache write requests. Value range for n: [0-31]. |
| `TCC_ATOMIC[n]` |Req | Number of L2 cache atomic requests of all types. Value range for n: [0-31]. |
| `TCC_HIT[n]` |Req | Number of L2 cache hits. Value range for n: [0-31]. |
| `TCC_MISS[n]` |Req | Number of L2 cache misses. Value range for n: [0-31]. |
| `TCC_WRITEBACK[n]` |Req | Number of lines written back to the main memory, including writebacks of dirty lines and uncached write/atomic requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ[n]` |Req | Number of 32-byte and 64-byte transactions going over the `TC_EA_wrreq` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_64B[n]` |Req | Total number of 64-byte transactions (write or `CMPSWAP`) going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_WR_UNCACHED_32B[n]` | Req | Number of 32-byte writes/atomics going over the `TC_EA_wrreq` interface due to uncached traffic. Note that CC mtypes can produce uncached requests, which are included here. A 64-byte request is counted as 2. Value range for n: [0-31].|
| `TCC_EA_WRREQ_STALL[n]` | Cycles | Number of cycles a write request is stalled. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of IO credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of GMI credits. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles an EA write request is stalled due to the interface running out of DRAM credits. Value range for n: [0-31]. |
| `TCC_TOO_MANY_EA_WRREQS_STALL[n]` | Cycles | Number of cycles the L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests. Value range for n: [0-31]. |
| `TCC_EA_WRREQ_LEVEL[n]` | Req | The accumulated number of EA write requests in flight. This is primarily intended to measure average EA write latency. Average write latency = `TCC_PERF_SEL_EA_WRREQ_LEVEL`/`TCC_PERF_SEL_EA_WRREQ`. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC[n]` | Req | Number of 32-byte or 64-byte atomic requests going over the `TC_EA_wrreq` interface. Value range for n: [0-31]. |
| `TCC_EA_ATOMIC_LEVEL[n]` | Req | The accumulated number of EA atomic requests in flight. This is primarily intended to measure average EA atomic latency. Average atomic latency = `TCC_PERF_SEL_EA_WRREQ_ATOMIC_LEVEL`/`TCC_PERF_SEL_EA_WRREQ_ATOMIC`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ[n]` | Req | Number of 32-byte or 64-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_32B[n]` | Req | Number of 32-byte read requests to EA. Value range for n: [0-31]. |
| `TCC_EA_RD_UNCACHED_32B[n]` | Req | Number of 32-byte EA reads due to uncached traffic. A 64-byte request is counted as 2. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of IO credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of GMI credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]` | Cycles | Number of cycles there is a stall due to the read request interface running out of DRAM credits. Stalls occur irrespective of the need for a read to be performed. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_LEVEL[n]` | Req | The accumulated number of EA read requests in flight. This is primarily intended to measure average EA read latency. Average read latency = `TCC_PERF_SEL_EA_RDREQ_LEVEL`/`TCC_PERF_SEL_EA_RDREQ`. Value range for n: [0-31]. |
| `TCC_EA_RDREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA read requests to High Bandwidth Memory (HBM). Value range for n: [0-31]. |
| `TCC_EA_WRREQ_DRAM[n]` | Req | Number of 32-byte or 64-byte EA write requests to HBM. Value range for n: [0-31]. |
| `TCC_TAG_STALL[n]` | Cycles | Number of cycles the normal request pipeline in the tag is stalled for any reason. Normally, stalls of this nature are measured at exactly one point in the pipeline; however, for this counter, probes can stall the pipeline at a variety of places, and there is no single point that can reasonably measure the total stalls accurately. Value range for n: [0-31]. |
| `TCC_NORMAL_WRITEBACK[n]` | Req | Number of writebacks due to requests that are not writeback requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_WB_WRITEBACK[n]` | Req | Number of writebacks due to all `TC_OP` writeback requests. Value range for n: [0-31]. |
| `TCC_NORMAL_EVICT[n]` | Req | Number of evictions due to requests that are not invalidate or probe requests. Value range for n: [0-31]. |
| `TCC_ALL_TC_OP_INV_EVICT[n]` | Req | Number of evictions due to all `TC_OP` invalidate requests. Value range for n: [0-31]. |
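Several of the `*_LEVEL` counters above accumulate the number of requests in flight, so an average latency falls out by dividing the level counter by the matching request counter, as described for `TCC_EA_RDREQ_LEVEL` and `TCC_EA_WRREQ_LEVEL`. The following is a minimal post-processing sketch, assuming the counter values have already been collected into a dictionary; the sample numbers are hypothetical:

```python
def avg_latency(level: int, requests: int) -> float:
    """Average request latency in cycles: accumulated in-flight level / request count."""
    return level / requests if requests else 0.0

# Hypothetical counter readings for one TCC instance (values are made up).
counters = {
    "TCC_EA_RDREQ_LEVEL": 120_000,
    "TCC_EA_RDREQ": 400,
    "TCC_EA_WRREQ_LEVEL": 90_000,
    "TCC_EA_WRREQ": 450,
}

avg_read = avg_latency(counters["TCC_EA_RDREQ_LEVEL"], counters["TCC_EA_RDREQ"])
avg_write = avg_latency(counters["TCC_EA_WRREQ_LEVEL"], counters["TCC_EA_WRREQ"])
print(avg_read, avg_write)  # 300.0 200.0
```

The same pattern applies to `TCC_EA_ATOMIC_LEVEL` / `TCC_EA_ATOMIC` for average atomic latency.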
| `ALUStalledByLDS` | Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready. Reduce this by reducing the LDS bank conflicts or the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). |
| `FetchSize` | Total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `FlatLDSInsts` | Average number of FLAT instructions that read from or write to LDS, executed per work item (affected by flow control). |
| `FlatVMemInsts` | Average number of FLAT instructions that read from or write to the video memory, executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch. |
| `GDSInsts` | Average number of GDS read/write instructions executed per work item (affected by flow control). |
| `GPUBusy` | Percentage of time GPU is busy. |
| `L2CacheHit` | Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal). |
| `LDSBankConflict` | Percentage of GPU time LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad). |
| `LDSInsts` | Average number of LDS read/write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS. |
| `MemUnitBusy` | Percentage of GPU time the memory unit is active. The result includes the stall time (`MemUnitStalled`). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). |
| `MemUnitStalled` | Percentage of GPU time the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad). |
| `MemWrites32B` | Total number of effective 32B write transactions to the memory. |
| `SALUBusy` | Percentage of GPU time scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `SALUInsts` | Average number of scalar ALU instructions executed per work item (affected by flow control). |
| `SFetchInsts` | Average number of scalar fetch instructions from the video memory executed per work item (affected by flow control). |
| `TA_ADDR_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA address path is stalled by TC, over all TA instances. |
| `TA_ADDR_STALLED_BY_TD_CYCLES_sum` | Total number of cycles TA address path is stalled by TD, over all TA instances. |
| `TA_BUFFER_WAVEFRONTS_sum` | Total number of buffer wavefronts processed by all TA instances. |
| `TA_BUFFER_READ_WAVEFRONTS_sum` | Total number of buffer read wavefronts processed by all TA instances. |
| `TA_BUFFER_WRITE_WAVEFRONTS_sum` | Total number of buffer write wavefronts processed by all TA instances. |
| `TA_BUFFER_ATOMIC_WAVEFRONTS_sum` | Total number of buffer atomic wavefronts processed by all TA instances. |
| `TA_BUFFER_TOTAL_CYCLES_sum` | Total number of buffer cycles (including read and write) issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_READ_CYCLES_sum` | Total number of coalesced buffer read cycles issued to TC by all TA instances. |
| `TA_BUFFER_COALESCED_WRITE_CYCLES_sum` | Total number of coalesced buffer write cycles issued to TC by all TA instances. |
| `TA_BUSY_avr` | Average number of busy cycles over all TA instances. |
| `TA_BUSY_max` | Maximum number of TA busy cycles over all TA instances. |
| `TA_BUSY_min` | Minimum number of TA busy cycles over all TA instances. |
| `TA_DATA_STALLED_BY_TC_CYCLES_sum` | Total number of cycles TA data path is stalled by TC, over all TA instances. |
| `TA_FLAT_WRITE_WAVEFRONTS_sum` | Sum of flat opcode writes processed by all TA instances. |
| `TA_FLAT_WAVEFRONTS_sum` | Total number of flat opcode wavefronts processed by all TA instances. |
| `TA_FLAT_READ_WAVEFRONTS_sum` | Total number of flat opcode read wavefronts processed by all TA instances. |
| `TA_FLAT_ATOMIC_WAVEFRONTS_sum` | Total number of flat opcode atomic wavefronts processed by all TA instances. |
| `TA_TA_BUSY_sum` | Total number of TA busy cycles over all TA instances. |
| `TA_TOTAL_WAVEFRONTS_sum` | Total number of wavefronts processed by all TA instances. |
| `TCA_BUSY_sum` | Total number of cycles TCA has a pending request, over all TCA instances. |
| `TCA_CYCLE_sum` | Total number of cycles over all TCA instances. |
| `TCC_ALL_TC_OP_WB_WRITEBACK_sum` | Total number of writebacks due to all TC_OP writeback requests, over all TCC instances. |
| `TCC_ALL_TC_OP_INV_EVICT_sum` | Total number of evictions due to all TC_OP invalidate requests, over all TCC instances. |
| `TCC_ATOMIC_sum` | Total number of L2 cache atomic requests of all types, over all TCC instances. |
| `TCC_BUSY_avr` | Average number of L2 cache busy cycles, over all TCC instances. |
| `TCC_BUSY_sum` | Total number of L2 cache busy cycles, over all TCC instances. |
| `TCC_CC_REQ_sum` | Total number of CC requests over all TCC instances. |
| `TCC_CYCLE_sum` | Total number of L2 cache free running clocks, over all TCC instances. |
| `TCC_EA_WRREQ_sum` | Total number of 32-byte and 64-byte transactions going over the TC_EA_wrreq interface, over all TCC instances. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands. |
| `TCC_EA_WRREQ_64B_sum` | Total number of 64-byte transactions (write or `CMPSWAP`) going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_WR_UNCACHED_32B_sum` | Total number of 32-byte writes/atomics going over the TC_EA_wrreq interface due to uncached traffic, over all TCC instances. Note that CC mtypes can produce uncached requests, which are included here. A 64-byte request is counted as 2. |
| `TCC_EA_WRREQ_STALL_sum` | Total number of cycles a write request is stalled, over all instances. |
| `TCC_EA_WRREQ_IO_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of IO credits, over all instances. |
| `TCC_EA_WRREQ_GMI_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of GMI credits, over all instances. |
| `TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles an EA write request is stalled due to the interface running out of DRAM credits, over all instances. |
| `TCC_EA_WRREQ_LEVEL_sum` | Total number of EA write requests in flight over all TCC instances. |
| `TCC_EA_RDREQ_LEVEL_sum` | Total number of EA read requests in flight over all TCC instances. |
| `TCC_EA_ATOMIC_sum` | Total number of 32-byte or 64-byte atomic requests going over the TC_EA_wrreq interface, over all TCC instances. |
| `TCC_EA_ATOMIC_LEVEL_sum` | Total number of EA atomic requests in flight, over all TCC instances. |
| `TCC_EA_RDREQ_sum` | Total number of 32-byte or 64-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RDREQ_32B_sum` | Total number of 32-byte read requests to EA, over all TCC instances. |
| `TCC_EA_RD_UNCACHED_32B_sum` | Total number of 32-byte EA reads due to uncached traffic, over all TCC instances. |
| `TCC_EA_RDREQ_IO_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of IO credits, over all TCC instances. |
| `TCC_EA_RDREQ_GMI_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of GMI credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum` | Total number of cycles there is a stall due to the read request interface running out of DRAM credits, over all TCC instances. |
| `TCC_EA_RDREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA read requests to HBM, over all TCC instances. |
| `TCC_EA_WRREQ_DRAM_sum` | Total number of 32-byte or 64-byte EA write requests to HBM, over all TCC instances. |
| `TCC_HIT_sum` | Total number of L2 cache hits over all TCC instances. |
| `TCC_MISS_sum` | Total number of L2 cache misses over all TCC instances. |
| `TCC_NC_REQ_sum` | Total number of NC requests over all TCC instances. |
| `TCC_NORMAL_WRITEBACK_sum` | Total number of writebacks due to requests that are not writeback requests, over all TCC instances. |
| `TCC_NORMAL_EVICT_sum` | Total number of evictions due to requests that are not invalidate or probe requests, over all TCC instances. |
| `TCC_PROBE_sum` | Total number of probe requests over all TCC instances. |
| `TCC_PROBE_ALL_sum` | Total number of external probe requests with EA_TCC_preq_all == 1, over all TCC instances. |
| `TCC_READ_sum` | Total number of L2 cache read requests (including compressed reads but not metadata reads) over all TCC instances. |
| `TCC_REQ_sum` | Total number of all types of L2 cache requests over all TCC instances. |
| `TCC_RW_REQ_sum` | Total number of RW requests over all TCC instances. |
| `TCC_STREAMING_REQ_sum` | Total number of L2 cache streaming requests over all TCC instances. |
| `TCC_TAG_STALL_sum` | Total number of cycles the normal request pipeline in the tag is stalled for any reason, over all TCC instances. |
| `TCC_TOO_MANY_EA_WRREQS_STALL_sum` | Total number of cycles L2 cache is unable to send an EA write request due to it reaching its maximum capacity of pending EA write requests, over all TCC instances. |
| `TCC_UC_REQ_sum` | Total number of UC requests over all TCC instances. |
| `TCC_WRITE_sum` | Total number of L2 cache write requests over all TCC instances. |
| `TCC_WRITEBACK_sum` | Total number of lines written back to the main memory including writebacks of dirty lines and uncached write/atomic requests, over all TCC instances. |
| `TCC_WRREQ_STALL_max` | Maximum number of cycles a write request is stalled, over all TCC instances. |
| `TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on an atomic, over all TCP instances. |
| `TCP_GATE_EN1_sum` | Total number of cycles vL1D interface clocks are turned on, over all TCP instances. |
| `TCP_GATE_EN2_sum` | Total number of cycles vL1D core clocks are turned on, over all TCP instances. |
| `TCP_PENDING_STALL_CYCLES_sum` | Total number of cycles vL1D cache is stalled due to data pending from L2 Cache, over all TCP instances. |
| `TCP_READ_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a read, over all TCP instances. |
| `TCP_TA_TCP_STATE_READ_sum` | Total number of state reads by all TCP instances. |
| `TCP_TCC_ATOMIC_WITH_RET_REQ_sum` | Total number of atomic requests to L2 cache with return, over all TCP instances. |
| `TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum` | Total number of atomic requests to L2 cache without return, over all TCP instances. |
| `TCP_TCC_CC_READ_REQ_sum` | Total number of CC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_WRITE_REQ_sum` | Total number of CC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_CC_ATOMIC_REQ_sum` | Total number of CC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_READ_REQ_sum` | Total number of NC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_WRITE_REQ_sum` | Total number of NC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_NC_ATOMIC_REQ_sum` | Total number of NC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_READ_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for reads and atomics with return for all TCP instances. |
| `TCP_TCC_READ_REQ_sum` | Total number of read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_READ_REQ_sum` | Total number of RW read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_WRITE_REQ_sum` | Total number of RW write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_RW_ATOMIC_REQ_sum` | Total number of RW atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_READ_REQ_sum` | Total number of UC read requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_WRITE_REQ_sum` | Total number of UC write requests to L2 cache, over all TCP instances. |
| `TCP_TCC_UC_ATOMIC_REQ_sum` | Total number of UC atomic requests to L2 cache, over all TCP instances. |
| `TCP_TCC_WRITE_REQ_LATENCY_sum` | Total vL1D to L2 request latency over all wavefronts for writes and atomics without return for all TCP instances. |
| `TCP_TCC_WRITE_REQ_sum` | Total number of write requests to L2 cache, over all TCP instances. |
| `TCP_TCP_LATENCY_sum` | Total wave access latency to vL1D over all wavefronts for all TCP instances. |
| `TCP_TCR_TCP_STALL_CYCLES_sum` | Total number of cycles TCR stalls vL1D, over all TCP instances. |
| `TCP_TD_TCP_STALL_CYCLES_sum` | Total number of cycles TD stalls vL1D, over all TCP instances. |
| `TCP_TOTAL_ACCESSES_sum` | Total number of vL1D accesses, over all TCP instances. |
| `TCP_TOTAL_READ_sum` | Total number of vL1D read accesses, over all TCP instances. |
| `TCP_TOTAL_WRITE_sum` | Total number of vL1D write accesses, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITH_RET_sum` | Total number of vL1D atomic requests with return, over all TCP instances. |
| `TCP_TOTAL_ATOMIC_WITHOUT_RET_sum` | Total number of vL1D atomic requests without return, over all TCP instances. |
| `TCP_TOTAL_CACHE_ACCESSES_sum` | Total number of vL1D cache accesses (including hits and misses) by all TCP instances. |
| `TCP_TOTAL_WRITEBACK_INVALIDATES_sum` | Total number of vL1D writebacks and invalidates, over all TCP instances. |
| `TCP_UTCL1_PERMISSION_MISS_sum` | Total number of UTCL1 permission misses by all TCP instances. |
| `TCP_UTCL1_REQUEST_sum` | Total number of address translation requests to UTCL1 by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_MISS_sum` | Total number of UTCL1 translation misses by all TCP instances. |
| `TCP_UTCL1_TRANSLATION_HIT_sum` | Total number of UTCL1 translation hits by all TCP instances. |
| `TCP_VOLATILE_sum` | Total number of L1 volatile pixels/buffers from TA, over all TCP instances. |
| `TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum` | Total number of cycles tagram conflict stalls on a write, over all TCP instances. |
| `TD_ATOMIC_WAVEFRONT_sum` | Total number of atomic wavefront instructions, over all TD instances. |
| `TD_COALESCABLE_WAVEFRONT_sum` | Total number of coalescable wavefronts according to TA, over all TD instances. |
| `TD_LOAD_WAVEFRONT_sum` | Total number of wavefront instructions (read/write/atomic), over all TD instances. |
| `TD_SPI_STALL_sum` | Total number of cycles TD is stalled by SPI, over all TD instances. |
| `TD_STORE_WAVEFRONT_sum` | Total number of write wavefront instructions, over all TD instances. |
| `TD_TC_STALL_sum` | Total number of cycles TD is stalled waiting for TC data, over all TD instances. |
| `TD_TD_BUSY_sum` | Total number of TD busy cycles while it is processing or waiting for data, over all TD instances. |
| `VALUBusy` | Percentage of GPU time vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal). |
| `VALUInsts` | Average number of vector ALU instructions executed per work item (affected by flow control). |
| `VALUUtilization` | Percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad) to 100% (ideal, no thread divergence). |
| `VFetchInsts` | Average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory. |
| `VWriteInsts` | Average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory. |
| `Wavefronts` | Total wavefronts. |
| `WRITE_REQ_32B` | Total number of 32-byte effective memory writes. |
| `WriteSize` | Total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account. |
| `WriteUnitStalled` | Percentage of GPU time the write unit is stalled. Value range: 0% to 100% (bad). |
| `FLAT` | FLAT instructions allow read/write/atomic access to a generic memory address pointer, which can resolve to any of the following physical memories:<br>- Global memory<br>- Scratch ("private")<br>- LDS ("shared")<br>- Invalid (`MEM_VIOL` trap status) |
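Aggregate counters such as `TCC_HIT_sum` and `TCC_MISS_sum` combine into the derived `L2CacheHit` percentage described above. A small sketch of that calculation, with hypothetical input values:

```python
def l2_cache_hit_pct(hits: int, misses: int) -> float:
    """L2CacheHit: percentage of L2 accesses that hit, from TCC_HIT_sum and TCC_MISS_sum."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# Hypothetical aggregated readings over all TCC instances.
print(l2_cache_hit_pct(750, 250))  # 75.0
```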
The microarchitecture of the AMD Instinct MI250 GPU is based on the
AMD CDNA 2 architecture that targets compute applications such as HPC,
artificial intelligence (AI), and machine learning (ML) that run on
everything from individual servers to the world’s largest exascale
Units (CU). The MI250 GCD has 104 active CUs. Each compute unit is further
subdivided into four SIMD units that process SIMD instructions of 16 data
elements per instruction (for the FP64 data type). This enables the CU to
process 64 work items (a so-called “wavefront”) at a peak clock frequency of 1.7
GHz. Therefore, the theoretical maximum FP64 peak performance per GCD is 22.6
TFLOPS for vector instructions. This equates to 45.3 TFLOPS for vector instructions for both GCDs together. The MI250 compute units also provide specialized
execution units (also called matrix cores), which are geared toward executing
matrix operations like matrix-matrix multiplications. For FP64, the peak
performance of these units amounts to 90.5 TFLOPS.
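The per-GCD vector figure can be checked from the quantities given above: 104 CUs, each with four SIMD units processing 16 FP64 elements per instruction, running at 1.7 GHz, with each fused multiply-add counted as two FLOPs (a standard convention, not stated explicitly in the text). A quick back-of-the-envelope calculation:

```python
cus = 104          # active compute units per GCD
simds = 4          # SIMD units per CU
lanes = 16         # FP64 data elements per SIMD instruction
flops_per_fma = 2  # a fused multiply-add counts as two FLOPs
clock_hz = 1.7e9   # peak clock frequency

per_gcd = cus * simds * lanes * flops_per_fma * clock_hz / 1e12
print(round(per_gcd, 1))      # 22.6 TFLOPS per GCD
print(round(2 * per_gcd, 1))  # 45.3 TFLOPS for both GCDs
```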


```{list-table} Peak-performance capabilities of the MI250 OAM for different data types.
:header-rows: 1
- 362.1
```
The above table summarizes the aggregated peak performance of the AMD Instinct MI250 Open Compute Platform (OCP) Open Accelerator Modules (OAMs) and its two GCDs for different data types and execution units. The middle column lists the peak performance (number of data elements processed in a single instruction) of a single compute unit if a SIMD (or matrix) instruction is being retired in each clock cycle. The third column lists the theoretical peak performance of the OAM module. The theoretical aggregated peak memory bandwidth of the GPU is 3.2 TB/sec (1.6 TB/sec per GCD).


The following image shows the block diagram of an OAM package that consists
of two GCDs, each of which constitutes one GPU device in the system. The two
between the two GCDs of an OAM, or a bidirectional peak transfer bandwidth of
## Node-level architecture
The following image shows the node-level architecture of a system that is
based on the AMD Instinct MI250 GPU. The MI250 OAMs attach to the host
system via PCIe Gen 4 x16 links (yellow lines). Each GCD maintains its own PCIe
x16 link to the host part of the system. Depending on the server platform, the
GCD can attach to the AMD EPYC processor directly or via an optional PCIe
switch. Note that some platforms may offer an x8 interface to the GCDs, which reduces
the available host-to-GPU bandwidth.


The preceding image shows the node-level architecture of a system with AMD
EPYC processors in a dual-socket configuration and four AMD Instinct MI250
GPUs. The MI250 OAMs attach to the host processors via PCIe Gen 4
x16 links (yellow lines). Depending on the system design, a PCIe switch may
exist to make more PCIe lanes available for additional components like network
interfaces and/or storage devices. Each GCD maintains its own PCIe x16 link to
"``CPF_CMP_UTCL1_STALL_ON_TRANSLATION``", "Cycles", "Number of cycles one of the compute unified translation caches (L1) is stalled waiting on translation"
"``CPF_CPF_STAT_BUSY``", "Cycles", "Number of cycles command processor-fetcher is busy"
"``CPF_CPF_STAT_IDLE``", "Cycles", "Number of cycles command processor-fetcher is idle"
"``CPF_CPF_STAT_STALL``", "Cycles", "Number of cycles command processor-fetcher is stalled"
"``CPF_CPF_TCIU_BUSY``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is busy"
"``CPF_CPF_TCIU_IDLE``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is idle"
"``CPF_CPF_TCIU_STALL``", "Cycles", "Number of cycles command processor-fetcher texture cache interface unit interface is stalled waiting on free tags"
The texture cache interface unit is the interface between the command processor and the memory system.
"``SPI_CSN_BUSY``", "Cycles", "Number of cycles with outstanding waves"
"``SPI_CSN_WINDOW_VALID``", "Cycles", "Number of cycles enabled by ``perfcounter_start`` event"
"``SPI_CSN_NUM_THREADGROUPS``", "Workgroups", "Number of dispatched workgroups"
"``SPI_CSN_WAVE``", "Wavefronts", "Number of dispatched wavefronts"
"``SPI_RA_REQ_NO_ALLOC``", "Cycles", "Number of arbiter cycles with requests but no allocation"
"``SPI_RA_REQ_NO_ALLOC_CSN``", "Cycles", "Number of arbiter cycles with compute shader (n\ :sup:`th` pipe) requests but no compute shader (n\ :sup:`th` pipe) allocation"
"``SPI_RA_RES_STALL_CSN``", "Cycles", "Number of arbiter stall cycles due to shortage of compute shader (n\ :sup:`th` pipe) pipeline slots"
"``SPI_RA_TMP_STALL_CSN``", "Cycles", "Number of stall cycles due to shortage of temp space"
"``SPI_RA_WAVE_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of single instruction, multiple data (SIMD) per cycle affected by shortage of wave slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_VGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of vector general-purpose register (VGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_SGPR_SIMD_FULL_CSN``", "SIMD-cycles", "Accumulated number of SIMDs per cycle affected by shortage of scalar general-purpose register (SGPR) slots for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_LDS_CU_FULL_CSN``", "CU", "Number of compute units affected by shortage of local data share (LDS) space for compute shader (n\ :sup:`th` pipe) wave dispatch"
"``SPI_RA_BAR_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting at a BARRIER"
"``SPI_RA_BULKY_CU_FULL_CSN``", "CU", "Number of compute units with compute shader (n\ :sup:`th` pipe) waves waiting for BULKY resource"
"``SPI_RA_TGLIM_CU_FULL_CSN``", "Cycles", "Number of compute shader (n\ :sup:`th` pipe) wave stall cycles due to restriction of ``tg_limit`` for thread group size"
"``SPI_RA_WVLIM_STALL_CSN``", "Cycles", "Number of cycles compute shader (n\ :sup:`th` pipe) is stalled due to ``WAVE_LIMIT``"
"``SPI_VWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize VGPRs when launching waves"
"``SPI_SWC_CSC_WR``", "Qcycles", "Number of quad-cycles taken to initialize SGPRs when launching waves"
"``SQ_INSTS_VMEM``", "Instr", "Number of vector memory instructions issued, including both flat and buffer instructions"
"``SQ_INSTS_SALU``", "Instr", "Number of scalar arithmetic logic unit (SALU) instructions issued"
"``SQ_INSTS_SMEM``", "Instr", "Number of scalar memory instructions issued"
"``SQ_INSTS_SMEM_NORM``", "Instr", "Number of scalar memory instructions normalized to match ``smem_level`` issued"
"``SQ_INSTS_FLAT``", "Instr", "Number of flat instructions issued"
"``SQ_INSTS_FLAT_LDS_ONLY``", "Instr", "**MI200 Series only** Number of FLAT instructions that read/write only from/to LDS issued. Works only if ``EARLY_TA_DONE`` is enabled."
"``SQ_INSTS_LDS``", "Instr", "Number of LDS instructions issued **(MI200: includes flat; MI300: does not include flat)**"
"``SQ_INSTS_GDS``", "Instr", "Number of global data share instructions issued"
"``SQ_INSTS_EXP_GDS``", "Instr", "Number of EXP and global data share instructions excluding skipped export instructions issued"
"``SQ_INSTS_BRANCH``", "Instr", "Number of branch instructions issued"
"``SQ_INSTS_SENDMSG``", "Instr", "Number of ``SENDMSG`` instructions including ``s_endpgm`` issued"
"``SQ_INSTS_VSKIPPED``", "Instr", "Number of vector instructions skipped"
Flat instructions allow read, write, and atomic access through a generic memory address pointer that can
resolve to global memory, scratch, or LDS.
"``SQ_BUSY_CYCLES``", "Cycles", "Number of cycles while sequencers reports it to be busy"
"``SQ_BUSY_CU_CYCLES``", "Qcycles", "Number of quad-cycles each compute unit is busy"
"``SQ_VALU_MFMA_BUSY_CYCLES``", "Cycles", "Number of cycles the matrix FMA arithmetic logic unit (ALU) is busy"
"``SQ_WAVE_CYCLES``", "Qcycles", "Number of quad-cycles spent by waves in the compute units"
"``SQ_WAIT_ANY``", "Qcycles", "Number of quad-cycles spent waiting for anything"
"``SQ_WAIT_INST_ANY``", "Qcycles", "Number of quad-cycles spent waiting for any instruction to be issued"
"``SQ_ACTIVE_INST_ANY``", "Qcycles", "Number of quad-cycles spent by each wave to work on an instruction"
"``SQ_ACTIVE_INST_VMEM``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a vector memory instruction"
"``SQ_ACTIVE_INST_LDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an LDS instruction"
"``SQ_ACTIVE_INST_VALU``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a VALU instruction"
"``SQ_ACTIVE_INST_SCA``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a SALU or scalar memory instruction"
"``SQ_ACTIVE_INST_EXP_GDS``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on an ``EXPORT`` or ``GDS`` instruction"
"``SQ_ACTIVE_INST_MISC``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a ``BRANCH`` or ``SENDMSG`` instruction"
"``SQ_ACTIVE_INST_FLAT``", "Qcycles", "Number of quad-cycles spent by the sequencer instruction arbiter to work on a flat instruction"
"``SQ_INST_CYCLES_VMEM_WR``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory write instructions"
"``SQ_INST_CYCLES_VMEM_RD``", "Qcycles", "Number of quad-cycles spent to send addr and cmd data for vector memory read instructions"
"``SQ_INST_CYCLES_SMEM``", "Qcycles", "Number of quad-cycles spent to execute scalar memory reads"
"``SQ_INST_CYCLES_SALU``", "Qcycles", "Number of quad-cycles spent to execute non-memory read scalar operations"
"``SQ_THREAD_CYCLES_VALU``", "Qcycles", "Number of quad-cycles spent to execute VALU operations on active threads"
"``SQ_WAIT_INST_LDS``", "Qcycles", "Number of quad-cycles spent waiting for LDS instruction to be issued"
``SQ_THREAD_CYCLES_VALU`` is similar to ``INST_CYCLES_VALU``, but it's multiplied by the number of active threads.
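Because ``SQ_THREAD_CYCLES_VALU`` weights VALU work by the number of active threads, dividing it by an unweighted VALU cycle counter estimates the average number of active threads per VALU operation. A minimal sketch with hypothetical counter values (the numbers are illustrative, not from a real run):

```python
# Hypothetical counter values from a profiling run (illustrative only).
sq_active_inst_valu = 1_000      # quad-cycles spent on VALU instructions
sq_thread_cycles_valu = 48_000   # same work, weighted by active threads

# Average active threads per VALU operation (at most 64 per wavefront).
avg_active_threads = sq_thread_cycles_valu / sq_active_inst_valu
print(avg_active_threads)  # 48.0
```

A value well below 64 suggests thread divergence or a work-group size that is not a multiple of the wavefront size.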
"``SQC_ICACHE_REQ``", "Req", "Number of L1 instruction (L1i) cache requests"
"``SQC_ICACHE_HITS``", "Count", "Number of L1i cache hits"
"``SQC_ICACHE_MISSES``", "Count", "Number of non-duplicate L1i cache misses including uncached requests"
"``SQC_ICACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate L1i cache misses whose previous lookup miss on the same cache line is not fulfilled yet"
"``SQC_DCACHE_REQ``", "Req", "Number of scalar L1d requests"
"``SQC_DCACHE_INPUT_VALID_READYB``", "Cycles", "Number of cycles while sequencer input is valid but scalar L1d is not ready"
"``SQC_DCACHE_HITS``", "Count", "Number of scalar L1d hits"
"``SQC_DCACHE_MISSES``", "Count", "Number of non-duplicate scalar L1d misses including uncached requests"
"``SQC_DCACHE_MISSES_DUPLICATE``", "Count", "Number of duplicate scalar L1d misses"
"``SQC_DCACHE_REQ_READ_1``", "Req", "Number of constant cache read requests in a single 32-bit data word"
"``SQC_DCACHE_REQ_READ_2``", "Req", "Number of constant cache read requests in two 32-bit data words"
"``SQC_DCACHE_REQ_READ_4``", "Req", "Number of constant cache read requests in four 32-bit data words"
"``SQC_DCACHE_REQ_READ_8``", "Req", "Number of constant cache read requests in eight 32-bit data words"
"``SQC_DCACHE_REQ_READ_16``", "Req", "Number of constant cache read requests in 16 32-bit data words"
"``SQC_DCACHE_ATOMIC``", "Req", "Number of atomic requests"
"``SQC_TC_REQ``", "Req", "Number of texture cache requests that were issued by instruction and constant caches"
"``SQC_TC_INST_REQ``", "Req", "Number of instruction requests to the L2 cache"
"``SQC_TC_DATA_READ_REQ``", "Req", "Number of data Read requests to the L2 cache"
"``SQC_TC_DATA_WRITE_REQ``", "Req", "Number of data write requests to the L2 cache"
"``SQC_TC_DATA_ATOMIC_REQ``", "Req", "Number of data atomic requests to the L2 cache"
"``SQC_TC_STALL``", "Cycles", "Number of cycles while the valid requests to the L2 cache are stalled"
"``TCP_TOTAL_CACHE_ACCESSES[n]``", "Req", "Number of vector L1d cache accesses including hits and misses", "0-15"
"``TCP_TCP_LATENCY[n]``", "Cycles", "**MI200 Series only** Accumulated wave access latency to vL1D over all wavefronts", "0-15"
"``TCP_TCC_READ_REQ_LATENCY[n]``", "Cycles", "**MI200 Series only** Total vL1D to L2 request latency over all wavefronts for reads and atomics with return", "0-15"
"``TCP_TCC_WRITE_REQ_LATENCY[n]``", "Cycles", "**MI200 Series only** Total vL1D to L2 request latency over all wavefronts for writes and atomics without return", "0-15"
"``TCP_TCC_READ_REQ[n]``", "Req", "Number of read requests to L2 cache", "0-15"
"``TCP_TCC_WRITE_REQ[n]``", "Req", "Number of write requests to L2 cache", "0-15"
"``TCP_TCC_ATOMIC_WITH_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache with return", "0-15"
"``TCP_TCC_ATOMIC_WITHOUT_RET_REQ[n]``", "Req", "Number of atomic requests to L2 cache without return", "0-15"
"``TCP_TCC_NC_READ_REQ[n]``", "Req", "Number of non-coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_UC_READ_REQ[n]``", "Req", "Number of uncached read requests to L2 cache", "0-15"
"``TCP_TCC_CC_READ_REQ[n]``", "Req", "Number of coherently cached read requests to L2 cache", "0-15"
"``TCP_TCC_RW_READ_REQ[n]``", "Req", "Number of coherently cached with write read requests to L2 cache", "0-15"
"``TCP_TCC_NC_WRITE_REQ[n]``", "Req", "Number of non-coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_UC_WRITE_REQ[n]``", "Req", "Number of uncached write requests to L2 cache", "0-15"
"``TCP_TCC_CC_WRITE_REQ[n]``", "Req", "Number of coherently cached write requests to L2 cache", "0-15"
"``TCP_TCC_RW_WRITE_REQ[n]``", "Req", "Number of coherently cached with write write requests to L2 cache", "0-15"
"``TCP_TCC_NC_ATOMIC_REQ[n]``", "Req", "Number of non-coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_UC_ATOMIC_REQ[n]``", "Req", "Number of uncached atomic requests to L2 cache", "0-15"
"``TCP_TCC_CC_ATOMIC_REQ[n]``", "Req", "Number of coherently cached atomic requests to L2 cache", "0-15"
"``TCP_TCC_RW_ATOMIC_REQ[n]``", "Req", "Number of coherently cached with write atomic requests to L2 cache", "0-15"
The L2 cache is also known as the texture cache per channel (TCC).
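Counters suffixed with ``[n]`` are reported per channel, so a device-wide figure is obtained by summing across channels. A minimal sketch, using hypothetical per-channel values for ``TCP_TCC_READ_REQ[n]``:

```python
# Hypothetical per-channel values for TCP_TCC_READ_REQ[n], n = 0..15.
tcp_tcc_read_req = [120, 130, 125, 118, 122, 131, 127, 119,
                    124, 126, 121, 129, 123, 128, 120, 125]

# Total read requests to the L2 cache across all 16 channels.
total_read_req = sum(tcp_tcc_read_req)
print(total_read_req)  # 1988
```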
.. tab-set::

   .. tab-item:: MI300 hardware counter

      .. csv-table::
         :header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA0_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA0_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_WR_UNCACHED_32B[n]``", "Req", "Number of 32 or 64-byte write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic", "0-31"
"``TCC_EA0_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA0_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA0_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA0_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA0_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA0_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA0_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA0_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA0_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA0_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA0_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
   .. tab-item:: MI200 hardware counter

      .. csv-table::
         :header: "Hardware counter", "Unit", "Definition", "Value range for ``n``"
"``TCC_CYCLE[n]``", "Cycles", "Number of L2 cache free-running clocks", "0-31"
"``TCC_BUSY[n]``", "Cycles", "Number of L2 cache busy cycles", "0-31"
"``TCC_REQ[n]``", "Req", "Number of L2 cache requests of all types (measured at the tag block)", "0-31"
"``TCC_STREAMING_REQ[n]``", "Req", "Number of L2 cache streaming requests (measured at the tag block)", "0-31"
"``TCC_NC_REQ[n]``", "Req", "Number of non-coherently cached requests (measured at the tag block)", "0-31"
"``TCC_UC_REQ[n]``", "Req", "Number of uncached requests. This is measured at the tag block", "0-31"
"``TCC_CC_REQ[n]``", "Req", "Number of coherently cached requests. This is measured at the tag block", "0-31"
"``TCC_RW_REQ[n]``", "Req", "Number of coherently cached with write requests. This is measured at the tag block", "0-31"
"``TCC_PROBE[n]``", "Req", "Number of probe requests", "0-31"
"``TCC_PROBE_ALL[n]``", "Req", "Number of external probe requests with ``EA_TCC_preq_all == 1``", "0-31"
"``TCC_READ[n]``", "Req", "Number of L2 cache read requests (includes compressed reads but not metadata reads)", "0-31"
"``TCC_WRITE[n]``", "Req", "Number of L2 cache write requests", "0-31"
"``TCC_ATOMIC[n]``", "Req", "Number of L2 cache atomic requests of all types", "0-31"
"``TCC_HIT[n]``", "Req", "Number of L2 cache hits", "0-31"
"``TCC_MISS[n]``", "Req", "Number of L2 cache misses", "0-31"
"``TCC_WRITEBACK[n]``", "Req", "Number of lines written back to the main memory, including writebacks of dirty lines and uncached write or atomic requests", "0-31"
"``TCC_EA_WRREQ[n]``", "Req", "Number of 32-byte and 64-byte transactions going over the ``TC_EA_wrreq`` interface (doesn't include probe commands)", "0-31"
"``TCC_EA_WRREQ_64B[n]``", "Req", "Total number of 64-byte transactions (write or ``CMPSWAP``) going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_WR_UNCACHED_32B[n]``", "Req", "Number of 32 write or atomic going over the ``TC_EA_wrreq`` interface due to uncached traffic. A 64-byte request will be counted as 2", "0-31"
"``TCC_EA_WRREQ_STALL[n]``", "Cycles", "Number of cycles a write request is stalled", "0-31"
"``TCC_EA_WRREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of input-output (IO) credits", "0-31"
"``TCC_EA_WRREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits", "0-31"
"``TCC_EA_WRREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits", "0-31"
"``TCC_TOO_MANY_EA_WRREQS_STALL[n]``", "Cycles", "Number of cycles the L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests", "0-31"
"``TCC_EA_WRREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter write requests in flight", "0-31"
"``TCC_EA_ATOMIC[n]``", "Req", "Number of 32-byte or 64-byte atomic requests going over the ``TC_EA_wrreq`` interface", "0-31"
"``TCC_EA_ATOMIC_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter atomic requests in flight", "0-31"
"``TCC_EA_RDREQ[n]``", "Req", "Number of 32-byte or 64-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RDREQ_32B[n]``", "Req", "Number of 32-byte read requests to efficiency arbiter", "0-31"
"``TCC_EA_RD_UNCACHED_32B[n]``", "Req", "Number of 32-byte efficiency arbiter reads due to uncached traffic. A 64-byte request is counted as 2", "0-31"
"``TCC_EA_RDREQ_IO_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of IO credits", "0-31"
"``TCC_EA_RDREQ_GMI_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of GMI credits", "0-31"
"``TCC_EA_RDREQ_DRAM_CREDIT_STALL[n]``", "Cycles", "Number of cycles there is a stall due to the read request interface running out of DRAM credits", "0-31"
"``TCC_EA_RDREQ_LEVEL[n]``", "Req", "The accumulated number of efficiency arbiter read requests in flight", "0-31"
"``TCC_EA_RDREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter read requests to High Bandwidth Memory (HBM)", "0-31"
"``TCC_EA_WRREQ_DRAM[n]``", "Req", "Number of 32-byte or 64-byte efficiency arbiter write requests to HBM", "0-31"
"``TCC_TAG_STALL[n]``", "Cycles", "Number of cycles the normal request pipeline in the tag is stalled for any reason", "0-31"
"``TCC_NORMAL_WRITEBACK[n]``", "Req", "Number of writebacks due to requests that are not writeback requests", "0-31"
"``TCC_ALL_TC_OP_WB_WRITEBACK[n]``", "Req", "Number of writebacks due to all ``TC_OP`` writeback requests", "0-31"
"``TCC_NORMAL_EVICT[n]``", "Req", "Number of evictions due to requests that are not invalidate or probe requests", "0-31"
"``TCC_ALL_TC_OP_INV_EVICT[n]``", "Req", "Number of evictions due to all ``TC_OP`` invalidate requests", "0-31"
Note the following:

* ``TCC_REQ[n]`` may be more than the number of requests arriving at the texture cache per channel,
  but it's a good indication of the total amount of work that needs to be performed.
* For ``TCC_EA0_WRREQ[n]``, atomics may travel over the same interface and are generally classified as
  write requests.
* Coherently cached (CC) mtypes can produce uncached requests, and those are included in
  ``TCC_EA0_WR_UNCACHED_32B[n]``.
* ``TCC_EA0_WRREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter write latency.
* Average write latency = ``TCC_PERF_SEL_EA0_WRREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ``.
* ``TCC_EA0_ATOMIC_LEVEL[n]`` is primarily intended to measure average efficiency arbiter atomic
  latency.
* Average atomic latency = ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC_LEVEL`` divided by ``TCC_PERF_SEL_EA0_WRREQ_ATOMIC``.
* ``TCC_EA0_RDREQ_LEVEL[n]`` is primarily intended to measure average efficiency arbiter read latency.
* Average read latency = ``TCC_PERF_SEL_EA0_RDREQ_LEVEL`` divided by ``TCC_PERF_SEL_EA0_RDREQ``.
* Stalls can occur regardless of the need for a read to be performed.
* Normally, stalls are measured at exactly one point in the pipeline. However, in the case of
  ``TCC_TAG_STALL[n]``, probes can stall the pipeline at a variety of places, so there is no single
  point that can observe all of these stalls.
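The average-latency formulas in the notes above are plain ratios of accumulated counters, so they can be evaluated directly after a run. A minimal sketch with hypothetical counter values:

```python
# Hypothetical accumulated counter values (illustrative only).
tcc_ea0_wrreq_level = 250_000  # accumulated in-flight write requests
tcc_ea0_wrreq = 10_000         # total write requests
tcc_ea0_rdreq_level = 480_000  # accumulated in-flight read requests
tcc_ea0_rdreq = 12_000         # total read requests

# Average latency in cycles = accumulated in-flight level / request count.
avg_write_latency = tcc_ea0_wrreq_level / tcc_ea0_wrreq
avg_read_latency = tcc_ea0_rdreq_level / tcc_ea0_rdreq
print(avg_write_latency, avg_read_latency)  # 25.0 40.0
```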
"``ALUStalledByLDS``", "Percentage of GPU time ALU units are stalled due to the LDS input queue being full or the output queue not being ready (value range: 0% (optimal) to 100%)"
"``FetchSize``", "Total kilobytes fetched from the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``FlatLDSInsts``", "Average number of flat instructions that read from or write to LDS, run per work item (affected by flow control)"
"``FlatVMemInsts``", "Average number of flat instructions that read from or write to the video memory, run per work item (affected by flow control). Includes flat instructions that read from or write to scratch"
"``GDSInsts``", "Average number of global data share read or write instructions run per work item (affected by flow control)"
"``GPUBusy``", "Percentage of time GPU is busy"
"``L2CacheHit``", "Percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache (value range: 0% (no hit) to 100% (optimal))"
"``LDSBankConflict``", "Percentage of GPU time LDS is stalled by bank conflicts (value range: 0% (optimal) to 100%)"
"``LDSInsts``", "Average number of LDS read or write instructions run per work item (affected by flow control). Excludes flat instructions that read from or write to LDS."
"``MemUnitBusy``", "Percentage of GPU time the memory unit is active, which is measured with all extra fetches and writes and any cache or memory effects taken into account (value range: 0% to 100% (fetch-bound))"
"``MemUnitStalled``", "Percentage of GPU time the memory unit is stalled (value range: 0% (optimal) to 100%)"
"``MemWrites32B``", "Total number of effective 32B write transactions to the memory"
"``TCA_BUSY_sum``", "Total number of cycles texture cache arbiter has a pending request, over all texture cache arbiter instances"
"``TCA_CYCLE_sum``", "Total number of cycles over all texture cache arbiter instances"
"``SALUBusy``", "Percentage of GPU time scalar ALU instructions are processed (value range: 0% to 100% (optimal))"
"``SALUInsts``", "Average number of scalar ALU instructions run per work item (affected by flow control)"
"``SFetchInsts``", "Average number of scalar fetch instructions from the video memory run per work item (affected by flow control)"
"``VALUBusy``", "Percentage of GPU time vector ALU instructions are processed (value range: 0% to 100% (optimal))"
"``VALUInsts``", "Average number of vector ALU instructions run per work item (affected by flow control)"
"``VALUUtilization``", "Percentage of active vector ALU threads in a wave, where a lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64 (value range: 0%, 100% (optimal - no thread divergence))"
"``VFetchInsts``", "Average number of vector fetch instructions from the video memory run per work-item (affected by flow control); excludes flat instructions that fetch from video memory"
"``VWriteInsts``", "Average number of vector write instructions to the video memory run per work-item (affected by flow control); excludes flat instructions that write to video memory"
"``Wavefronts``", "Total wavefronts"
"``WRITE_REQ_32B``", "Total number of 32-byte effective memory writes"
"``WriteSize``", "Total kilobytes written to the video memory; measured with all extra fetches and any cache or memory effects taken into account"
"``WriteUnitStalled``", "Percentage of GPU time the write unit is stalled (value range: 0% (optimal) to 100%)"
You can lower ``ALUStalledByLDS`` by reducing LDS bank conflicts or the number of LDS accesses.
You can lower ``MemUnitStalled`` by reducing the number or size of fetches and writes.
``MemUnitBusy`` includes the stall time (``MemUnitStalled``).
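Several derived metrics are simple ratios of the raw counters listed elsewhere in this topic. For example, an ``L2CacheHit``-style percentage can be computed from hit and miss totals (a sketch with hypothetical values; the exact formula used by the profiler may differ):

```python
# Hypothetical aggregated counter values (illustrative only).
tcc_hit_sum = 90_000   # total L2 cache hits
tcc_miss_sum = 10_000  # total L2 cache misses

# Percentage of L2 accesses that hit in the cache.
l2_cache_hit_pct = 100.0 * tcc_hit_sum / (tcc_hit_sum + tcc_miss_sum)
print(l2_cache_hit_pct)  # 90.0
```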
Hardware counters accumulated over all texture addressing unit instances
"``TCC_ALL_TC_OP_WB_WRITEBACK_sum``", "Total number of writebacks due to all ``TC_OP`` writeback requests."
"``TCC_ALL_TC_OP_INV_EVICT_sum``", "Total number of evictions due to all ``TC_OP`` invalidate requests."
"``TCC_ATOMIC_sum``", "Total number of L2 cache atomic requests of all types."
"``TCC_BUSY_avr``", "Average number of L2 cache busy cycles."
"``TCC_BUSY_sum``", "Total number of L2 cache busy cycles."
"``TCC_CC_REQ_sum``", "Total number of coherently cached requests."
"``TCC_CYCLE_sum``", "Total number of L2 cache free running clocks."
"``TCC_EA0_WRREQ_sum``", "Total number of 32-byte and 64-byte transactions going over the ``TC_EA0_wrreq`` interface. Atomics may travel over the same interface and are generally classified as write requests. This does not include probe commands."
"``TCC_EA0_WRREQ_64B_sum``", "Total number of 64-byte transactions (write or `CMPSWAP`) going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_WR_UNCACHED_32B_sum``", "Total Number of 32-byte write or atomic going over the ``TC_EA0_wrreq`` interface due to uncached traffic. Note that coherently cached mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2."
"``TCC_EA0_WRREQ_STALL_sum``", "Total Number of cycles a write request is stalled, over all instances."
"``TCC_EA0_WRREQ_IO_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of IO credits, over all instances."
"``TCC_EA0_WRREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of GMI credits, over all instances."
"``TCC_EA0_WRREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles an efficiency arbiter write request is stalled due to the interface running out of DRAM credits, over all instances."
"``TCC_EA0_WRREQ_LEVEL_sum``", "Total number of efficiency arbiter write requests in flight."
"``TCC_EA0_RDREQ_LEVEL_sum``", "Total number of efficiency arbiter read requests in flight."
"``TCC_EA0_ATOMIC_sum``", "Total Number of 32-byte or 64-byte atomic requests going over the ``TC_EA0_wrreq`` interface."
"``TCC_EA0_ATOMIC_LEVEL_sum``", "Total number of efficiency arbiter atomic requests in flight."
"``TCC_EA0_RDREQ_sum``", "Total number of 32-byte or 64-byte read requests to efficiency arbiter."
"``TCC_EA0_RDREQ_32B_sum``", "Total number of 32-byte read requests to efficiency arbiter."
"``TCC_EA0_RD_UNCACHED_32B_sum``", "Total number of 32-byte efficiency arbiter reads due to uncached traffic."
"``TCC_EA0_RDREQ_IO_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of IO credits."
"``TCC_EA0_RDREQ_GMI_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of GMI credits."
"``TCC_EA0_RDREQ_DRAM_CREDIT_STALL_sum``", "Total number of cycles there is a stall due to the read request interface running out of DRAM credits."
"``TCC_EA0_RDREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter read requests to HBM."
"``TCC_EA0_WRREQ_DRAM_sum``", "Total number of 32-byte or 64-byte efficiency arbiter write requests to HBM."
"``TCC_HIT_sum``", "Total number of L2 cache hits."
"``TCC_MISS_sum``", "Total number of L2 cache misses."
"``TCC_NC_REQ_sum``", "Total number of non-coherently cached requests."
"``TCC_NORMAL_WRITEBACK_sum``", "Total number of writebacks due to requests that are not writeback requests."
"``TCC_NORMAL_EVICT_sum``", "Total number of evictions due to requests that are not invalidate or probe requests."
"``TCC_PROBE_sum``", "Total number of probe requests."
"``TCC_PROBE_ALL_sum``", "Total number of external probe requests with ``EA0_TCC_preq_all == 1``."
"``TCC_READ_sum``", "Total number of L2 cache read requests (including compressed reads but not metadata reads)."
"``TCC_REQ_sum``", "Total number of all types of L2 cache requests."
"``TCC_RW_REQ_sum``", "Total number of coherently cached with write requests."
"``TCC_STREAMING_REQ_sum``", "Total number of L2 cache streaming requests."
"``TCC_TAG_STALL_sum``", "Total number of cycles the normal request pipeline in the tag is stalled for any reason."
"``TCC_TOO_MANY_EA0_WRREQS_STALL_sum``", "Total number of cycles L2 cache is unable to send an efficiency arbiter write request due to it reaching its maximum capacity of pending efficiency arbiter write requests."
"``TCC_UC_REQ_sum``", "Total number of uncached requests."
"``TCC_WRITE_sum``", "Total number of L2 cache write requests."
"``TCC_WRITEBACK_sum``", "Total number of lines written back to the main memory including writebacks of dirty lines and uncached write or atomic requests."
"``TCC_WRREQ_STALL_max``", "Maximum number of cycles a write request is stalled."
Hardware counters accumulated over all texture cache per pipe instances
The AMD Instinct MI300 Series GPUs are based on the AMD CDNA 3 architecture, which is designed to
deliver leadership performance for high-performance computing (HPC), artificial intelligence (AI), and
machine learning (ML) workloads. These GPUs are well suited for extreme scalability and compute
performance, running on everything from individual servers to the world’s largest exascale supercomputers.
With the MI300 Series, AMD is introducing the Accelerator Complex Die (XCD), which contains the
GPU computational elements of the processor along with the lower levels of the cache hierarchy.
The following image depicts the structure of a single XCD in the AMD Instinct MI300 GPU Series.
```{figure} ../../data/shared/xcd-sys-arch.png
---
name: mi300-xcd
align: center
---
XCD-level system architecture showing 40 Compute Units, each with 32 KB of L1 cache, a Unified Compute System with 4 ACE Compute Accelerators, a shared 4 MB L2 cache, and an HWS Hardware Scheduler.
```
On the XCD, four Asynchronous Compute Engines (ACEs) send compute shader workgroups to the
Compute Units (CUs). The XCD has 40 CUs: 38 active CUs at the aggregate level and 2 disabled CUs for
yield management. The CUs all share a 4 MB L2 cache that serves to coalesce all memory traffic for the
die. With fewer than half the CUs of the AMD Instinct MI200 Series compute die, the AMD CDNA™ 3
XCD die is a smaller building block. However, it uses more advanced packaging, and the processor
can include 6 or 8 XCDs, for up to 304 CUs, roughly 40% more than the MI250X.
The MI300 Series integrates up to 8 vertically stacked XCDs, 8 stacks of
High-Bandwidth Memory 3 (HBM3), and 4 I/O dies (containing system
infrastructure), using AMD Infinity Fabric™ technology as the interconnect.
The Matrix Cores inside the CDNA 3 CUs have significant improvements, emphasizing AI and machine
learning, enhancing throughput of existing data types while adding support for new data types.
CDNA 2 Matrix Cores support FP16 and BF16, while offering INT8 for inference. Compared to MI250X
GPUs, CDNA 3 Matrix Cores triple the performance for FP16 and BF16, while providing a
performance gain of 6.8 times for INT8. FP8 has a performance gain of 16 times compared to FP32,
while TF32 has a gain of 4 times compared to FP32.
```{list-table} Peak-performance capabilities of the MI300X for different data types.
:header-rows: 1
:name: mi300x-perf-table
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
*
- Matrix FP64
- 256
- 163.4
*
- Vector FP64
- 128
- 81.7
*
- Matrix FP32
- 256
- 163.4
*
- Vector FP32
- 256
- 163.4
*
- Vector TF32
- 1024
- 653.7
*
- Matrix FP16
- 2048
- 1307.4
*
- Matrix BF16
- 2048
- 1307.4
*
- Matrix FP8
- 4096
- 2614.9
*
- Matrix INT8
- 4096
- 2614.9
```
The above table summarizes the aggregated peak performance of the AMD Instinct MI300X Open
Compute Platform (OCP) Open Accelerator Modules (OAMs) for different data types and command
processors. The middle column lists the peak performance (number of data elements processed in a
single instruction) of a single compute unit if a SIMD (or matrix) instruction is submitted in each clock
cycle. The third column lists the theoretical peak performance of the OAM. The theoretical aggregated
peak memory bandwidth of the GPU is 5.3 TB per second.
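The peak TFLOPS values in the table follow from FLOPS/clock/CU multiplied by the CU count and the peak engine clock. A sketch, assuming 304 CUs and a 2100 MHz peak clock (the clock value is an assumption, not stated in this section):

```python
CU_COUNT = 304         # MI300X CU count (8 XCDs x 38 active CUs)
PEAK_CLOCK_HZ = 2.1e9  # assumed 2100 MHz peak engine clock

def peak_tflops(flops_per_clock_per_cu: int) -> float:
    """Theoretical peak throughput in TFLOPS for one data type."""
    return flops_per_clock_per_cu * CU_COUNT * PEAK_CLOCK_HZ / 1e12

print(round(peak_tflops(2048), 1))  # Matrix FP16: 1307.4
print(round(peak_tflops(256), 1))   # Matrix FP64: 163.4
```

With these assumed values, the results reproduce the Matrix FP16 and Matrix FP64 rows of the table above.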
The following image shows the block diagram of the APU (left) and the OAM package (right) both
connected via AMD Infinity Fabric™ network on-chip.
:description: MI355 Series performance counters and metrics
:keywords: MI355, MI355X, MI3XX
***********************************
MI350 Series performance counters
***********************************
This topic lists and describes the hardware performance counters and derived metrics available on the AMD Instinct MI350 and MI355 GPUs. These counters are available for profiling using `ROCprofiler-SDK <https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/index.html>`_ and `ROCm Compute Profiler <https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/>`_.
The following sections list the performance counters based on the IP blocks.
- ADC valid chunk is not available when dispatch walking is in progress in the multi-xcc mode.
* - CPC_ADC_DISPATCH_ALLOC_DONE
- ADC dispatch allocation is done.
* - CPC_ADC_VALID_CHUNK_END
- ADC crawler's valid chunk end in the multi-xcc mode.
* - CPC_SYNC_FIFO_FULL_LEVEL
- SYNC FIFO full last cycles.
* - CPC_SYNC_FIFO_FULL
- SYNC FIFO full times.
* - CPC_GD_BUSY
- ADC busy.
* - CPC_TG_SEND
- ADC thread group send.
* - CPC_WALK_NEXT_CHUNK
- ADC walking next valid chunk in the multi-xcc mode.
* - CPC_STALLED_BY_SE0_SPI
- ADC CSDATA stalled by SE0SPI.
* - CPC_STALLED_BY_SE1_SPI
- ADC CSDATA stalled by SE1SPI.
* - CPC_STALLED_BY_SE2_SPI
- ADC CSDATA stalled by SE2SPI.
* - CPC_STALLED_BY_SE3_SPI
- ADC CSDATA stalled by SE3SPI.
* - CPC_LTE_ALL
- CPC sync counter LteAll. Only Master XCD manages LteAll.
* - CPC_SYNC_WRREQ_FIFO_BUSY
- CPC sync counter request FIFO is not empty.
* - CPC_CANE_BUSY
- CPC CANE bus is busy, which indicates the presence of inflight sync counter requests.
* - CPC_CANE_STALL
- CPC sync counter sending is stalled by CANE.
Shader pipe interpolators (SPI) counters
=========================================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - SPI_CS0_WINDOW_VALID
- Clock count enabled by PIPE0 perfcounter_start event.
* - SPI_CS0_BUSY
- Number of clocks with outstanding waves for PIPE0 (SPI or SH).
* - SPI_CS0_NUM_THREADGROUPS
- Number of thread groups launched for PIPE0.
* - SPI_CS0_CRAWLER_STALL
- Number of clocks when PIPE0 event or wave order FIFO is full.
* - SPI_CS0_EVENT_WAVE
- Number of PIPE0 events and waves.
* - SPI_CS0_WAVE
- Number of PIPE0 waves.
* - SPI_CS1_WINDOW_VALID
- Clock count enabled by PIPE1 perfcounter_start event.
* - SPI_CS1_BUSY
- Number of clocks with outstanding waves for PIPE1 (SPI or SH).
* - SPI_CS1_NUM_THREADGROUPS
- Number of thread groups launched for PIPE1.
* - SPI_CS1_CRAWLER_STALL
- Number of clocks when PIPE1 event or wave order FIFO is full.
* - SPI_CS1_EVENT_WAVE
- Number of PIPE1 events and waves.
* - SPI_CS1_WAVE
- Number of PIPE1 waves.
* - SPI_CS2_WINDOW_VALID
- Clock count enabled by PIPE2 perfcounter_start event.
* - SPI_CS2_BUSY
- Number of clocks with outstanding waves for PIPE2 (SPI or SH).
* - SPI_CS2_NUM_THREADGROUPS
- Number of thread groups launched for PIPE2.
* - SPI_CS2_CRAWLER_STALL
- Number of clocks when PIPE2 event or wave order FIFO is full.
* - SPI_CS2_EVENT_WAVE
- Number of PIPE2 events and waves.
* - SPI_CS2_WAVE
- Number of PIPE2 waves.
* - SPI_CS3_WINDOW_VALID
- Clock count enabled by PIPE3 perfcounter_start event.
* - SPI_CS3_BUSY
- Number of clocks with outstanding waves for PIPE3 (SPI or SH).
* - SPI_CS3_NUM_THREADGROUPS
- Number of thread groups launched for PIPE3.
* - SPI_CS3_CRAWLER_STALL
- Number of clocks when PIPE3 event or wave order FIFO is full.
* - SPI_CS3_EVENT_WAVE
- Number of PIPE3 events and waves.
* - SPI_CS3_WAVE
- Number of PIPE3 waves.
* - SPI_CSQ_P0_Q0_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue0.
* - SPI_CSQ_P0_Q1_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue1.
* - SPI_CSQ_P0_Q2_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue2.
* - SPI_CSQ_P0_Q3_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue3.
* - SPI_CSQ_P0_Q4_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue4.
* - SPI_CSQ_P0_Q5_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue5.
* - SPI_CSQ_P0_Q6_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue6.
* - SPI_CSQ_P0_Q7_OCCUPANCY
- Sum of occupancy info for PIPE0 Queue7.
* - SPI_CSQ_P1_Q0_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue0.
* - SPI_CSQ_P1_Q1_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue1.
* - SPI_CSQ_P1_Q2_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue2.
* - SPI_CSQ_P1_Q3_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue3.
* - SPI_CSQ_P1_Q4_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue4.
* - SPI_CSQ_P1_Q5_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue5.
* - SPI_CSQ_P1_Q6_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue6.
* - SPI_CSQ_P1_Q7_OCCUPANCY
- Sum of occupancy info for PIPE1 Queue7.
* - SPI_CSQ_P2_Q0_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue0.
* - SPI_CSQ_P2_Q1_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue1.
* - SPI_CSQ_P2_Q2_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue2.
* - SPI_CSQ_P2_Q3_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue3.
* - SPI_CSQ_P2_Q4_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue4.
* - SPI_CSQ_P2_Q5_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue5.
* - SPI_CSQ_P2_Q6_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue6.
* - SPI_CSQ_P2_Q7_OCCUPANCY
- Sum of occupancy info for PIPE2 Queue7.
* - SPI_CSQ_P3_Q0_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue0.
* - SPI_CSQ_P3_Q1_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue1.
* - SPI_CSQ_P3_Q2_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue2.
* - SPI_CSQ_P3_Q3_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue3.
* - SPI_CSQ_P3_Q4_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue4.
* - SPI_CSQ_P3_Q5_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue5.
* - SPI_CSQ_P3_Q6_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue6.
* - SPI_CSQ_P3_Q7_OCCUPANCY
- Sum of occupancy info for PIPE3 Queue7.
* - SPI_CSQ_P0_OCCUPANCY
- Sum of occupancy info for all PIPE0 queues.
* - SPI_CSQ_P1_OCCUPANCY
- Sum of occupancy info for all PIPE1 queues.
* - SPI_CSQ_P2_OCCUPANCY
- Sum of occupancy info for all PIPE2 queues.
* - SPI_CSQ_P3_OCCUPANCY
- Sum of occupancy info for all PIPE3 queues.
* - SPI_VWC0_VDATA_VALID_WR
- Number of clocks VGPR bus_0 writes VGPRs.
* - SPI_VWC1_VDATA_VALID_WR
- Number of clocks VGPR bus_1 writes VGPRs.
* - SPI_CSC_WAVE_CNT_BUSY
- Number of cycles when there is any wave in the pipe.
Compute unit (SQ) counters
===========================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - SQ_INSTS_VALU_MFMA_F6F4
- Number of VALU V_MFMA_*_F6F4 instructions.
* - SQ_INSTS_VALU_MFMA_MOPS_F6F4
- Number of VALU matrix with the performed math operations (add or mul) divided by 512, assuming a full EXEC mask of F6 or F4 data type.
* - SQ_ACTIVE_INST_VALU2
- Number of quad-cycles when two VALU instructions are issued (per-simd, nondeterministic).
* - SQ_INSTS_LDS_LOAD
- Number of LDS load instructions issued (per-simd, emulated).
* - SQ_INSTS_LDS_STORE
- Number of LDS store instructions issued (per-simd, emulated).
* - SQ_INSTS_LDS_ATOMIC
- Number of LDS atomic instructions issued (per-simd, emulated).
* - SQ_INSTS_LDS_LOAD_BANDWIDTH
- Total number of 64-bytes loaded (instrSize * CountOnes(EXEC))/64 (per-simd, emulated).
* - SQ_INSTS_LDS_STORE_BANDWIDTH
- Total number of 64-bytes written (instrSize * CountOnes(EXEC))/64 (per-simd, emulated).
* - SQ_INSTS_LDS_ATOMIC_BANDWIDTH
- Total number of 64-bytes atomic (instrSize * CountOnes(EXEC))/64 (per-simd, emulated).
* - SQ_INSTS_VALU_FLOPS_FP16
- Counts FLOPS per instruction on float 16 excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_FLOPS_FP32
- Counts FLOPS per instruction on float 32 excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_FLOPS_FP64
- Counts FLOPS per instruction on float 64 excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_FLOPS_FP16_TRANS
- Counts FLOPS per instruction on float 16 trans excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_FLOPS_FP32_TRANS
- Counts FLOPS per instruction on float 32 trans excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_FLOPS_FP64_TRANS
- Counts FLOPS per instruction on float 64 trans excluding MFMA/SMFMA.
* - SQ_INSTS_VALU_IOPS
- Counts OPS per instruction on integer or unsigned or bit data (per-simd, emulated).
* - SQ_LDS_DATA_FIFO_FULL
- Number of cycles LDS data FIFO is full (nondeterministic, unwindowed).
* - SQ_LDS_CMD_FIFO_FULL
- Number of cycles LDS command FIFO is full (nondeterministic, unwindowed).
* - SQ_VMEM_TA_ADDR_FIFO_FULL
- Number of cycles texture requests are stalled due to full address FIFO in TA (nondeterministic, unwindowed).
* - SQ_VMEM_TA_CMD_FIFO_FULL
- Number of cycles texture requests are stalled due to full cmd FIFO in TA (nondeterministic, unwindowed).
* - SQ_VMEM_WR_TA_DATA_FIFO_FULL
- Number of cycles texture writes are stalled due to full data FIFO in TA (nondeterministic, unwindowed).
* - SQC_ICACHE_MISSES_DUPLICATE
- Number of duplicate misses (access to a non-resident, miss pending CL) (per-SQ, per-Bank, nondeterministic).
* - SQC_DCACHE_MISSES_DUPLICATE
- Number of duplicate misses (access to a non-resident, miss pending CL) (per-SQ, per-Bank, nondeterministic).
Texture addressing (TA) unit counters
======================================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - TA_BUFFER_READ_LDS_WAVEFRONTS
- Number of buffer read wavefronts for LDS return processed by the TA.
* - TA_FLAT_READ_LDS_WAVEFRONTS
- Number of flat opcode reads for LDS return processed by the TA.
Texture data (TD) unit counters
================================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - TD_WRITE_ACKT_WAVEFRONT
- Number of write acknowledgments, sent to SQ and not to SP.
* - TD_TD_SP_TRAFFIC
- Number of times this TD sends data to the SP.
Texture cache per pipe (TCP) counters
======================================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - TCP_TCP_TA_ADDR_STALL_CYCLES
- TCP stalls TA addr interface.
* - TCP_TCP_TA_DATA_STALL_CYCLES
- TCP stalls TA data interface. Now windowed.
* - TCP_LFIFO_STALL_CYCLES
- Memory latency FIFOs full stall.
* - TCP_RFIFO_STALL_CYCLES
- Memory Request FIFOs full stall.
* - TCP_TCR_RDRET_STALL
- Write into cache stalled by read return from TCR.
* - TCP_PENDING_STALL_CYCLES
- Stall due to data pending from L2.
* - TCP_UTCL1_SERIALIZATION_STALL
- Total number of stalls caused due to serializing translation requests through the UTCL1.
* - TCP_UTCL1_THRASHING_STALL
- Stall caused by thrashing feature in any probe. Lacks accuracy when the stall signal overlaps between probe0 and probe1, which is worse with MECO of thrashing deadlock. Some probe0 events could miss being counted in with MECO on. This perf count provides a rough thrashing estimate.
* - TCP_UTCL1_TRANSLATION_MISS_UNDER_MISS
- Translation miss_under_miss.
* - TCP_UTCL1_STALL_INFLIGHT_MAX
- Total UTCL1 stalls due to inflight counter saturation.
* - TCP_UTCL1_STALL_LRU_INFLIGHT
- Total UTCL1 stalls due to LRU cache line with inflight traffic.
* - TCP_UTCL1_STALL_MULTI_MISS
- Total UTCL1 stalls due to arbitrated multiple misses.
* - TCP_UTCL1_LFIFO_FULL
- Total UTCL1 and UTCL2 latency, which hides FIFO full cycles.
* - TCP_UTCL1_STALL_LFIFO_NOT_RES
- Total UTCL1 stalls due to UTCL2 latency, which hides FIFO output (not resident).
* - TCP_UTCL1_STALL_UTCL2_REQ_OUT_OF_CREDITS
- Total UTCL1 stalls due to UTCL2_req being out of credits.
* - TCP_CLIENT_UTCL1_INFLIGHT
- The sum of inflight client to UTCL1 requests per cycle.
* - TCP_TAGRAM0_REQ
- Total L2 requests mapping to TagRAM 0 from this TCP to all TCCs.
* - TCP_TAGRAM1_REQ
- Total L2 requests mapping to TagRAM 1 from this TCP to all TCCs.
* - TCP_TAGRAM2_REQ
- Total L2 requests mapping to TagRAM 2 from this TCP to all TCCs.
* - TCP_TAGRAM3_REQ
- Total L2 requests mapping to TagRAM 3 from this TCP to all TCCs.
* - TCP_TCP_LATENCY
- Total TCP wave latency (from the first clock of wave entering to the first clock of wave leaving). Divide by TA_TCP_STATE_READ to find average wave latency.
* - TCP_TCC_READ_REQ_LATENCY
- Total TCP to TCC request latency for reads and atomics with return. Not windowed.
* - TCP_TCC_WRITE_REQ_LATENCY
- Total TCP to TCC request latency for writes and atomics without return. Not windowed.
* - TCP_TCC_WRITE_REQ_HOLE_LATENCY
- Total TCP req to TCC hole latency for writes and atomics. Not windowed.
Texture cache per channel (TCC) counters
=========================================
.. list-table::
   :header-rows: 1
* - Hardware counter
- Definition
* - TCC_READ_SECTORS
- Total number of 32B data sectors in read requests.
* - TCC_WRITE_SECTORS
- Total number of 32B data sectors in write requests.
* - TCC_ATOMIC_SECTORS
- Total number of 32B data sectors in atomic requests.
* - TCC_BYPASS_REQ
- Number of bypass requests. This is measured at the tag block.
* - TCC_LATENCY_FIFO_FULL
- Number of cycles when the latency FIFO is full.
* - TCC_SRC_FIFO_FULL
- Number of cycles when the SRC FIFO is assumed to be full as measured at the IB block.
* - TCC_EA0_RDREQ_64B
- Number of 64-byte TCC/EA read requests.
* - TCC_EA0_RDREQ_128B
- Number of 128-byte TCC/EA read requests.
* - TCC_IB_REQ
- Number of requests through the IB. This measures the number of raw requests from graphics clients to this TCC.
* - TCC_IB_STALL
- Number of cycles when the IB output is stalled.
* - TCC_EA0_WRREQ_WRITE_DRAM
- Number of TCC/EA write requests (32-byte or 64-byte) destined for DRAM (MC).
* - TCC_EA0_WRREQ_ATOMIC_DRAM
- Number of TCC/EA atomic requests (32-byte or 64-byte) destined for DRAM (MC).
* - TCC_EA0_RDREQ_DRAM_32B
- Number of 32-byte TCC/EA read requests due to DRAM traffic. One 64-byte request is counted as two and one 128-byte as four.
* - TCC_EA0_RDREQ_GMI_32B
- Number of 32-byte TCC/EA read requests due to GMI traffic. One 64-byte request is counted as two and one 128-byte as four.
* - TCC_EA0_RDREQ_IO_32B
- Number of 32-byte TCC/EA read requests due to IO traffic. One 64-byte request is counted as two and one 128-byte as four.
* - TCC_EA0_WRREQ_WRITE_DRAM_32B
- Number of 32-byte TCC/EA write requests due to DRAM traffic. One 64-byte request is counted as two.
* - TCC_EA0_WRREQ_ATOMIC_DRAM_32B
- Number of 32-byte TCC/EA atomic requests due to DRAM traffic. One 64-byte request is counted as two.
* - TCC_EA0_WRREQ_WRITE_GMI_32B
- Number of 32-byte TCC/EA write requests due to GMI traffic. One 64-byte request is counted as two.
* - TCC_EA0_WRREQ_ATOMIC_GMI_32B
- Number of 32-byte TCC/EA atomic requests due to GMI traffic. One 64-byte request is counted as two.
* - TCC_EA0_WRREQ_WRITE_IO_32B
- Number of 32-byte TCC/EA write requests due to IO traffic. One 64-byte request is counted as two.
* - TCC_EA0_WRREQ_ATOMIC_IO_32B
- Number of 32-byte TCC/EA atomic requests due to IO traffic. One 64-byte request is counted as two.
Host memory exists on the host (e.g. CPU) of the machine in random access memory (RAM).
Device memory exists on the device (e.g. GPU) of the machine in video random access memory (VRAM).
Recent architectures use graphics double data rate (GDDR) synchronous dynamic random-access memory (SDRAM), such as GDDR6, or high-bandwidth memory (HBM), such as HBM2e.
## Memory allocation
Memory can be allocated in two ways: pageable memory, and pinned memory.
The following API calls will result in these allocations:
:::{note}
`hipMalloc` and `hipFree` are blocking calls. However, HIP also provides non-blocking versions, `hipMallocAsync` and `hipFreeAsync`, which take a stream as an additional argument.
:::
### Pageable memory
Pageable memory is what you typically get when calling `malloc` or `new` in a C++ application.
It is unique in that it exists on "pages" (blocks of memory) that can be migrated to other memory storage.
For example, pages can migrate between CPU sockets on a multi-socket motherboard, or be swapped out to disk when the system runs out of space in RAM.
### Pinned memory
Pinned memory (or page-locked memory, or non-pageable memory) is host memory that is mapped into the address space of all GPUs, meaning that the pointer can be used on both host and device.
Accessing host-resident pinned memory in device kernels is generally not recommended for performance, as it can force the data to traverse the host-device interconnect (e.g. PCIe), which is much slower than the on-device bandwidth (>40x on MI200).
Pinned host memory can be allocated with either fine-grained (coherent) or coarse-grained (non-coherent) coherence support.
:::{note}
In HIP, pinned memory allocations are coherent by default (`hipHostMallocDefault`).
There are additional pinned memory flags (e.g. `hipHostMallocMapped` and `hipHostMallocPortable`).
On MI200 these options do not impact performance.
<!-- TODO: link to programming_manual#memory-allocation-flags -->
For more information, see the section *memory allocation flags* in the HIP Programming Guide: {doc}`hip:user_guide/programming_manual`.
:::
Much like a process can be locked to a CPU core by setting its affinity, a pinned memory allocator locks the allocation to a specific physical location in the memory system.
On multi-socket systems it is important to ensure that pinned memory is located on the same socket as the owning process, or else each cache line will be moved through the CPU-CPU interconnect, thereby increasing latency and potentially decreasing bandwidth.
In practice, pinned memory is used to improve transfer times between host and device.
For transfer operations, such as `hipMemcpy` or `hipMemcpyAsync`, using pinned memory instead of pageable memory on host can lead to a ~3x improvement in bandwidth.
:::{tip}
If the application needs to move data back and forth between device and host (separate allocations), use pinned memory on the host side.
:::
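A minimal sketch of this pattern (assuming a working ROCm installation and a visible GPU; `HIP_CHECK` is a local helper macro here, not a HIP API):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

#define HIP_CHECK(expr) do { hipError_t e_ = (expr); \
    if (e_ != hipSuccess) { std::printf("HIP error: %s\n", hipGetErrorString(e_)); return 1; } } while (0)

int main() {
    const size_t bytes = 1 << 26; // 64 MiB, arbitrary example size
    double *host_pinned = nullptr, *device_buf = nullptr;

    // Pinned (page-locked) host allocation instead of malloc/new.
    HIP_CHECK(hipHostMalloc((void**)&host_pinned, bytes, hipHostMallocDefault));
    HIP_CHECK(hipMalloc((void**)&device_buf, bytes));

    // Transfers from pinned memory can reach roughly 3x the bandwidth of pageable memory.
    HIP_CHECK(hipMemcpy(device_buf, host_pinned, bytes, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(host_pinned, device_buf, bytes, hipMemcpyDeviceToHost));

    HIP_CHECK(hipFree(device_buf));
    HIP_CHECK(hipHostFree(host_pinned));
    return 0;
}
```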
### Managed memory
Managed memory refers to universally addressable, or unified memory available on the MI200 series of GPUs.
Much like pinned memory, managed memory shares a pointer between host and device and (by default) supports fine-grained coherence, however, managed memory can also automatically migrate pages between host and device.
The allocation will be managed by AMD GPU driver using the Linux HMM (Heterogeneous Memory Management) mechanism.
If heterogenous memory management (HMM) is not available, then `hipMallocManaged` will default back to using system memory and will act like pinned host memory.
Other managed memory API calls will have undefined behavior.
It is therefore recommended to check for managed memory capability with: `hipDeviceGetAttribute` and `hipDeviceAttributeManagedMemory`.
HIP supports additional calls that work with page migration:
* `hipMemAdvise`
* `hipMemPrefetchAsync`
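A minimal sketch of the capability check and a managed allocation (assuming a ROCm installation with a GPU visible as device 0; error checking reduced for brevity):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int managed = 0;
    // Query whether the device (and driver/HMM) supports managed memory
    // before relying on hipMallocManaged page migration.
    hipDeviceGetAttribute(&managed, hipDeviceAttributeManagedMemory, /*device=*/0);
    std::printf("Managed memory supported: %s\n", managed ? "yes" : "no");

    if (managed) {
        float *data = nullptr;
        if (hipMallocManaged((void**)&data, 1024 * sizeof(float)) == hipSuccess) {
            data[0] = 1.0f; // valid on the host...
            // ...and migratable: prefetch the pages to device 0.
            hipMemPrefetchAsync(data, 1024 * sizeof(float), 0, nullptr);
            hipDeviceSynchronize();
            hipFree(data);
        }
    }
    return 0;
}
```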
:::{tip}
If the application needs to use data on both host and device regularly, does not want to deal with separate allocations, and is not worried about maxing out the VRAM on MI200 GPUs (64 GB per GCD), use managed memory.
:::
:::{tip}
If managed memory performance is poor, check to see if managed memory is supported on your system and if page migration (XNACK) is enabled.
:::
## Access behavior
Memory allocations for GPUs behave as follows:

| API | Data location | Host access | Device access |
|-----|---------------|-------------|---------------|
| System allocated | Host | Local access | Unhandled page fault |
| `hipMallocManaged` | Host | Local access | Zero-copy |
| `hipHostMalloc` | Host | Local access | Zero-copy* |
| `hipMalloc` | Device | Zero-copy | Local access |
Zero-copy accesses happen over the Infinity Fabric interconnect or PCIe lanes on discrete GPUs.
:::{note}
While `hipHostMalloc` allocated memory is accessible by a device, the host pointer must be converted to a device pointer with `hipHostGetDevicePointer`.
Memory allocated through standard system allocators, such as `malloc`, can be accessed by a device by registering the memory via `hipHostRegister`.
The device pointer to be used in kernels can be retrieved with `hipHostGetDevicePointer`.
Registered memory is treated like `hipHostMalloc` and will have similar performance.
On devices that support and have [](#xnack) enabled, such as the MI250X, `hipHostRegister` is not required as memory accesses are handled via automatic page migration.
:::
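The registration flow described in the note can be sketched as follows (assuming a ROCm installation; error checking omitted for brevity):

```cpp
#include <hip/hip_runtime.h>
#include <cstdlib>

int main() {
    const size_t bytes = 4096;
    // System-allocated (pageable) memory...
    void *host = std::malloc(bytes);

    // ...registered so that a device can access it.
    hipHostRegister(host, bytes, hipHostRegisterDefault);

    // Retrieve the device-side pointer to pass into kernels.
    void *device_view = nullptr;
    hipHostGetDevicePointer(&device_view, host, 0);

    // ... launch kernels that read/write through device_view ...

    hipHostUnregister(host);
    std::free(host);
    return 0;
}
```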
### XNACK
Normally, host and device memory are separate and data has to be transferred manually via `hipMemcpy`.
On a subset of GPUs, such as the MI200, there is an option to automatically migrate pages of memory between host and device.
This is important for managed memory, where the locality of the data is important for performance.
Depending on the system, page migration may be disabled by default in which case managed memory will act like pinned host memory and suffer degraded performance.
*XNACK* describes the GPU's ability to retry memory accesses that fail due to a page fault (which would normally lead to a memory access error) and instead retrieve the missing page.
This also affects memory allocated by the system as indicated by the following table:
| API | Data location | Host after device access | Device after host access |
|-----|---------------|--------------------------|--------------------------|
| System allocated | Host | Migrate page to host | Migrate page to device |
| `hipMallocManaged` | Host | Migrate page to host | Migrate page to device |
| `hipHostMalloc` | Host | Local access | Zero-copy |
| `hipMalloc` | Device | Zero-copy | Local access |
To check if page migration is available on a platform, use `rocminfo`:
```sh
$ rocminfo | grep xnack
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
```
Here, `xnack-` means that XNACK is available but is disabled by default.
Turning on XNACK by setting the environment variable `HSA_XNACK=1` gives the expected result, `xnack+`:
```sh
$ HSA_XNACK=1 rocminfo | grep xnack
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack+
```
By default, `hipcc` generates code that runs correctly whether XNACK is enabled or disabled.
Setting the `--offload-arch=` option with `xnack+` or `xnack-` forces the code to run only with XNACK enabled or disabled, respectively.
```sh
# Compiled kernels will run regardless of whether XNACK is enabled or disabled.
hipcc --offload-arch=gfx90a
# Compiled kernels will only run if XNACK is enabled with HSA_XNACK=1.
hipcc --offload-arch=gfx90a:xnack+
# Compiled kernels will only run if XNACK is disabled with HSA_XNACK=0.
hipcc --offload-arch=gfx90a:xnack-
```
:::{tip}
If you want to make use of page migration, use managed memory. While pageable memory will migrate correctly, it is not a portable solution and can have performance issues if the accessed data isn't page aligned.
:::
### Coherence
* *Coarse-grained coherence* means that memory is only considered up to date at kernel boundaries, which can be enforced through `hipDeviceSynchronize`, `hipStreamSynchronize`, or any blocking operation that acts on the null stream (e.g. `hipMemcpy`).
For example, cacheable memory is a type of coarse-grained memory where an up-to-date copy of the data can be stored elsewhere (e.g. in an L2 cache).
* *Fine-grained coherence* means that memory remains coherent even while a CPU or GPU kernel is running.
This can be useful if both host and device are operating on the same dataspace using system-scope atomic operations (e.g. updating an error code or flag to a buffer).
Fine-grained memory implies that up-to-date data may be made visible to others regardless of kernel boundaries as discussed above.
:::{tip}
Try to design your algorithms to avoid host-device memory coherence (e.g. system-scope atomics). While it can be a useful feature in very specific cases, it is not supported on all systems, and can negatively impact performance by introducing the host-device interconnect bottleneck.
:::
The availability of fine- and coarse-grained memory pools can be checked with `rocminfo`:
```sh
$ rocminfo
...
*******
Agent 1
*******
Name: AMD EPYC 7742 64-Core Processor
...
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
...
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
...
*******
Agent 9
*******
Name: gfx90a
...
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
...
```
## System direct memory access
In most cases, the default behavior for HIP in transferring data from a pinned host allocation to device will run at the limit of the interconnect.
However, there are certain cases where the interconnect is not the bottleneck.
The primary way to transfer data onto and off of a GPU, such as the MI200, is to use the onboard System Direct Memory Access engine, which is used to feed blocks of memory to the off-device interconnect (either GPU-CPU or GPU-GPU).
Each GCD has separate SDMA engines for host-to-device and device-to-host memory transfers.
Importantly, SDMA engines are separate from the computing infrastructure, meaning that memory transfers to and from a device will not impact kernel compute performance, though they do impact memory bandwidth to a limited extent.
The SDMA engines are mainly tuned for PCIe-4.0 x16, which means they are designed to operate at bandwidths up to 32 GB/s.
:::{note}
An important feature of the MI250X platform is the Infinity Fabric™ interconnect between host and device.
The Infinity Fabric interconnect supports improved performance over standard PCIe-4.0 (usually ~50% more bandwidth); however, since the SDMA engine does not run at this speed, it will not max out the bandwidth of the faster interconnect.
:::
The bandwidth limitation can be countered by bypassing the SDMA engine and replacing it with a type of copy kernel known as a "blit" kernel.
Blit kernels will use the compute units on the GPU, thereby consuming compute resources, which may not always be beneficial.
The easiest way to enable blit kernels is to set an environment variable `HSA_ENABLE_SDMA=0`, which will disable the SDMA engine.
On systems where the GPU uses a PCIe interconnect instead of an Infinity Fabric interconnect, blit kernels will not impact bandwidth, but will still consume compute resources.
The choice between SDMA engines and blit kernels also applies to MPI data transfers and GPU-to-GPU transfers.
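For example, to compare the two copy paths (`./transfer_bench` here is a hypothetical bandwidth benchmark of your own, not a ROCm tool):

```sh
# Default: host-device copies go through the SDMA engines.
./transfer_bench
# Bypass SDMA and use blit copy kernels instead (consumes compute resources).
HSA_ENABLE_SDMA=0 ./transfer_bench
```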
The LLVM AddressSanitizer (ASan) provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications exactly like pure CPU applications. However, this simplicity has not been achieved yet.
This topic describes how to use ROCm ASan.
For information about LLVM ASan, see the [LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
:::{note}
The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
:::
## Compiling for ASan
The ASan process begins by compiling the application of interest with the ASan instrumentation.
Recommendations for doing this are:
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
* Add the following options to the existing compiler and linker options:
  * `-fsanitize=address` - enables instrumentation
  * `-shared-libsan` - use the shared version of the runtime
  * `-g` - add debug info for improved reporting
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
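Putting the recommendations together, a build line might look like the following (illustrative only; `main.hip` is a hypothetical source file):

```sh
amdclang++ main.hip -o a.out \
    -fsanitize=address -shared-libsan -g \
    --offload-arch=gfx90a:xnack+
```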
It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
### About compilation time
When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.
There are a few options if the compile time becomes unacceptable:
* Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the ASan process.
* Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
* Disable instrumentation on a per-function basis by adding `__attribute__((no_sanitize("address")))` to functions found to be responsible for the long compile time. Again, this will reduce the effectiveness of the process.
## Installing ROCm GPU ASan packages
For a complete ROCm GPU Sanitizer installation, including packages, instrumented HSA and HIP runtimes, tools, and math libraries, use the following command:
```bash
sudo apt-get install rocm-ml-sdk-asan
```
## Using AMD-supplied ASan instrumented libraries
ROCm releases have optional packages that contain additional ASan instrumented builds of the ROCm libraries (usually found in `/opt/rocm-<version>/lib`). The instrumented libraries have identical names to the regular uninstrumented libraries, and are located in `/opt/rocm-<version>/lib/asan`.
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used, with the additional options described above: `-fsanitize=address`, `-shared-libsan`, and `-g`.
These prebuilt libraries save developers the effort of locating repositories, identifying the correct branch, checking out the correct tags, and other steps needed to build the libraries from source. They also extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.
## Running ASan instrumented applications
### Preparing to run an instrumented application
Here are a few recommendations to consider before running an ASan instrumented heterogeneous application.
* Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
* Ensure XNACK is enabled
  * For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0), set the environment variable `HSA_XNACK=1`.
  * For `gfx906` (MI-50) or `gfx908` (MI-100), set `HSA_XNACK=1`, but also ensure the amdgpu kernel module is loaded with the module argument `noretry=0`.
This requirement is due to the fact that the XNACK setting for these GPUs is system-wide.
* Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
* Ensure that the application depends on the ASan runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
If it does not appear, when executed the application will quickly output an ASan error that looks like:
```bash
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
* Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
There is an environment variable, `ASAN_OPTIONS`, that can be used to adjust the runtime behavior of the ASan runtime itself. More than a hundred flags can be adjusted (see an older list of [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)), but the default settings are correct and should be used in most cases. Note that these options only affect the host ASan runtime; the device runtime currently supports only the default settings for the few relevant options.
Two `ASAN_OPTIONS` flags are of particular note:
* `halt_on_error=0/1` (default `1`).
  This tells the ASan runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the optional ASan instrumented ROCm libraries are not compiled with this option; if an error is detected within one of them while `halt_on_error` is set to `0`, more undefined behavior will occur.
* `detect_leaks=0/1` (default `1`).
  This option directs the ASan runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications this default results in significant output from the leak sanitizer when the application exits, due to allocations made by the language runtime which are not considered to be leaks. This output can be avoided by adding `detect_leaks=0` to `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with the environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The report can be disabled with the `LSAN_OPTIONS` flag `print_suppressions=0`.
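For example, either approach can be applied from the shell. The application name `my_app` and the `libamdhip64` suppression pattern below are purely illustrative:

```shell
# Option 1: disable leak detection entirely, e.g.:
#   ASAN_OPTIONS=detect_leaks=0 ./my_app

# Option 2: suppress known runtime allocations with an LSAN file.
cat > lsan.supp <<'EOF'
# Patterns match function, library, or object names (illustrative):
leak:libamdhip64
EOF
# Then run with, e.g.:
#   LSAN_OPTIONS=suppressions=lsan.supp:print_suppressions=0 ./my_app
```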
## Runtime overhead
Running an ASan instrumented application incurs
overheads which may result in unacceptably long runtimes
or failure to run at all.
### Higher execution time
ASan detection works by checking each address at runtime
before the address is actually accessed by a load, store, or atomic
instruction.
This checking involves an additional load to "shadow" memory which
records whether the address is "poisoned" or not, and additional logic
that decides whether or not to produce a detection report.
This extra runtime work can cause the application to slow down by
a factor of three or more, depending on how many memory accesses are
executed.
For heterogeneous applications, the shadow memory must be accessible by all devices
and this can mean that shadow accesses from some devices may be more costly
than non-shadow accesses.
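The shadow lookup described above can be sketched with the standard x86-64 mapping. This is only an illustration: the scale and offset shown are the documented defaults for 64-bit Linux hosts, and device-side mappings may differ.

```python
# Sketch of the default x86-64 ASan shadow mapping on 64-bit Linux.
ASAN_SHADOW_SCALE = 3           # 8 application bytes map to 1 shadow byte
ASAN_SHADOW_OFFSET = 0x7fff8000  # documented default shadow offset

def shadow_address(addr: int) -> int:
    """Return the shadow byte address for an application address."""
    return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET

# Each shadow byte encodes how many of the corresponding 8 application
# bytes are addressable: 0 means all 8, k in 1..7 means only the first k,
# and negative (poison) values mark red zones and freed memory.
```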
### Higher memory use
The address checking described above relies on the compiler to surround
each program variable with a red zone, and on the ASan
runtime to surround each runtime memory allocation with a red zone and
fill the shadow corresponding to each red zone with poison.
The added memory for the red zones is additional overhead on top
of the 13% overhead for the shadow memory itself.
Applications which consume most of one or more of the available memory pools when
run normally are likely to encounter allocation failures when run with
instrumentation.
## Runtime reporting
It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the ASan runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.
An invalid address detection report for the CPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on address <memory address> at pc <pc> bp <bp> sp <sp> <access> of size <N> at <memory address> thread T0
```
and continues with a stack trace for the access, a stack trace for the allocation and deallocation (if relevant), and a dump of the shadow near the `<memory address>`.
In contrast, an invalid address detection report for the GPU always starts with
```bash
==<PID>==ERROR: AddressSanitizer: <problem type> on amdgpu device <device> at pc <pc> <access> of size <n> in workgroup id (<X>,<Y>,<Z>)
```
Above, `<device>` is the integer device ID, and `(<X>, <Y>, <Z>)` is the ID of the workgroup or block where the invalid address was detected.
While the CPU report includes a call stack for the thread attempting the invalid access, the GPU report is currently limited to a call stack of size one, i.e. the (symbolized) location of the invalid access, e.g.
```bash
#0 <pc> in <function signature> at /path/to/file.hip:<line>:<column>
```
This short call stack is followed by a GPU unique section that looks like
```bash
Thread ids and accessed addresses:
<lid0> <maddr 0> : <lid1> <maddr1> : ...
```
where each `<lid j> <maddr j>` indicates the lane ID and the invalid memory address held by lane `j` of the wavefront attempting the invalid access.
Additionally, reports for invalid GPU accesses to memory allocated by GPU code via `malloc` or `new` start with, for example,
```bash
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
```
or
```bash
==5678==ERROR: AddressSanitizer: heap-use-after-free on amdgpu device 3 at pc 0x7f4c10062d74
```
currently may include one or two surprising CPU-side tracebacks mentioning `hostcall`. This is due to how `malloc` and `free` are implemented for GPU code, and these call stacks can be ignored.
### Running with `rocgdb`
`rocgdb` can be used to further investigate ASan detected errors, with some preparation.
Currently, the ASan runtime complains when starting `rocgdb` without preparation.
```bash
$ rocgdb my_app
==1122==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
```
This is solved by setting the environment variable `LD_PRELOAD` to the path of the ASan runtime library, `libclang_rt.asan-x86_64.so`.
It is also recommended to set the environment variable `HIP_ENABLE_DEFERRED_LOADING=0` before debugging HIP applications.
After starting `rocgdb` breakpoints can be set on the ASan runtime error reporting entry points of interest. For example, if an ASan error report includes
```bash
WRITE of size 4 in workgroup id (10,0,0)
```
the `rocgdb` command needed to stop the program before the report is printed is
```bash
(gdb) break __asan_report_store4
```
Similarly, the appropriate command for a report including
```bash
READ of size <N> in workgroup ID (1,2,3)
```
is
```bash
(gdb) break __asan_report_load<N>
```
It is possible to set breakpoints on all ASan report functions using these commands:
```bash
$ rocgdb <path to application>
(gdb) start <command line arguments>
(gdb) rbreak ^__asan_report
(gdb) c
```
### Using ASan with a short HIP application
When using ASan with a short HIP application, be aware of the following limitations:
* Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
* Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
* It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
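As an illustration, here is a minimal sketch of a HIP program containing a deliberate out-of-bounds access. All names are illustrative and the exact report text varies; compiled with, for example, `hipcc -fsanitize=address -shared-libsan -g` and run with `HSA_XNACK=1`, it should produce a device-side `heap-buffer-overflow` report:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Deliberately writes one element past the end of the buffer when
// launched with more threads than elements.
__global__ void overflow_kernel(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[i] = i;  // no bounds check: thread i == n writes out of bounds
}

int main() {
    int n = 256;
    int *buf = nullptr;
    // Managed memory keeps the example XNACK-friendly.
    (void)hipMallocManaged(reinterpret_cast<void **>(&buf), n * sizeof(int));
    // Launch n + 1 threads over an n-element buffer.
    overflow_kernel<<<1, n + 1>>>(buf, n);
    (void)hipDeviceSynchronize();
    (void)hipFree(buf);
    printf("done\n");
    return 0;
}
```

Because the overflowing store lands in the red zone immediately after the allocation, this access falls within the cases the device instrumentation can detect; a stray access far outside any red zone might instead fault or go unreported, as noted above.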
## Command line
You can build the ROCm documentation via the command line using Python.
See the `build.tools.python` setting in the [Read the Docs configuration file](https://github.com/ROCm/ROCm/blob/develop/.readthedocs.yaml) for the Python version used by Read the Docs to build documentation.
See the [Python requirements file](https://github.com/ROCm/ROCm/blob/develop/docs/sphinx/requirements.txt) for Python packages needed to build the documentation.
Use the Python Virtual Environment (`venv`) and run the following commands from the project root:
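For example, a typical `venv` setup looks like the following (the `.venv` directory name is just a convention; adjust paths as needed):

```shell
python3 -m venv .venv          # create a virtual environment
source .venv/bin/activate      # activate it
# then install the documentation requirements, e.g.:
#   pip3 install -r docs/sphinx/requirements.txt
```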
The ROCm documentation, like all of ROCm, is open source and available on GitHub. You can contribute to the ROCm documentation by forking the appropriate repository, making your changes, and opening a pull request.
To provide feedback on the ROCm documentation, including submitting an issue or suggesting a feature, see [Providing feedback about the ROCm documentation](./feedback.md).
## The ROCm repositories
The repositories for ROCm and all ROCm components are available on GitHub.
| Repository | URL |
|------------|-----|
| ROCm installation for Linux | [https://github.com/ROCm/rocm-install-on-linux/tree/develop/docs](https://github.com/ROCm/rocm-install-on-linux/tree/develop/docs) |
| ROCm HIP SDK installation for Windows | [https://github.com/ROCm/rocm-install-on-windows/tree/develop/docs](https://github.com/ROCm/rocm-install-on-windows/tree/develop/docs) |
Individual components have their own repositories with their own documentation in their own `docs` folders.
The sub-folders within the `docs` folders across ROCm are typically structured as follows:
| Sub-folder name | Documentation type |
|-------|----------|
| `install` | Installation instructions, build instructions, and prerequisites |
| `conceptual` | Important concepts |
| `how-to` | How to implement specific use cases |
| `tutorials` | Tutorials |
| `reference` | API references and other reference resources |
## Editing and adding to the documentation
ROCm documentation follows the [Google developer documentation style guide](https://developers.google.com/style/highlights).
Most topics in the ROCm documentation are written in [reStructuredText (rst)](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html), with some topics written in Markdown. Only use reStructuredText when adding new topics. Only use Markdown if the topic you are editing is already in Markdown.
To edit or add to the documentation:
1. Fork the repository you want to add to or edit.
2. Clone your fork locally.
3. Create a new local branch cut from the `develop` branch of the repository.
4. Make your changes to the documentation.
5. Optionally, build the documentation locally before creating a pull request by running the following commands from within the `docs` folder:
```bash
pip3 install -r sphinx/requirements.txt # You only need to run this command once
python3 -m sphinx -T -b html -d _build/doctrees -D language=en . _build/html
```
The output files will be located in the `docs/_build` folder. Open `docs/_build/html/index.html` to view the documentation.
For more information on ROCm build tools, see [Documentation toolchain](toolchain.md).
6. Push your changes. A GitHub link will be returned in the output of the `git push` command. Open this link in a browser to create the pull request.
The documentation is built as part of the checks on a pull request, along with spell checking and linting. Scroll to the bottom of your pull request to view all the checks.
Verify that the linting and spell checking have passed, and that the documentation was built successfully. New words or acronyms can be added to the [wordlist file](https://github.com/ROCm/rocm-docs-core/blob/develop/.wordlist.txt). The wordlist is subject to approval by the ROCm documentation team.
The Read the Docs build of your pull request can be accessed by clicking the Details link next to the Read the Docs build check. Verify that your changes are in the build and look as expected.
Your pull request will be reviewed by a member of the ROCm documentation team.
See the [GitHub documentation](https://docs.github.com/en) for information on how to fork and clone a repository, and how to create and push a local branch.
```{important}
By creating a pull request (PR), you agree to allow your contribution to be licensed under the terms of the
LICENSE.txt file in the corresponding repository. Different repositories can use different licenses.
```
There are four standard ways to provide feedback about the ROCm documentation: a pull request, GitHub Discussions, GitHub Issues, and email.
## Pull request
All contributions to ROCm documentation should arrive via a pull request targeting the `develop` branch of the repository. If you are unable to contribute via a pull request, feel free to email us at [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
## GitHub Discussions
You can ask questions, view announcements, suggest new features, and communicate with other members of the community through [GitHub Discussions](https://github.com/ROCm/ROCm/discussions).
## GitHub Issues
To submit an issue:
1. Always search to see if the same issue already exists. If it does, upvote it, and comment or post to provide any additional details you might have.
2. If you find an issue that is similar to yours, log your issue, then add a comment that includes a link to the similar issue, as well as its issue number.
3. Always provide as much information as possible. This helps reduce the time required to reproduce the issue.
After creating your issue, make sure to check it regularly for any requests for additional information.
## Email
Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
For information about contributing content to the ROCm documentation, see [Contributing to the ROCm documentation](./contributing.md).
The ROCm documentation relies on several open source toolchains and sites.
## rocm-docs-core
[rocm-docs-core](https://github.com/ROCm/rocm-docs-core) is an AMD-maintained
project that applies customizations for the ROCm documentation. This project is the tool most ROCm repositories use as part of their documentation build pipeline. It is available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
originally used for Python. It is now widely used in the open source community.
Originally, Sphinx supported reStructuredText (RST) based documentation, but
Markdown support is now available.
ROCm documentation plans to default to Markdown for new projects.
Existing projects using RST are under no obligation to convert to Markdown. New
projects that believe Markdown is not suitable should contact the documentation
team prior to selecting RST.
### MyST
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
It is integrated into ROCm documentation by the Sphinx extension [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
A cheat sheet showcasing the MyST syntax is available in the
[Jupyter reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
### Sphinx External ToC
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html) is a Sphinx extension used for ROCm documentation navigation. This tool generates the navigation menu on the left
based on a YAML file (`_toc.yml.in`) that contains the table of contents. It was selected for its flexibility, which allows scripts to operate on the YAML file. See the `_toc.yml.in` file in the `docs/sphinx` folder of this repository for an example.
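An illustrative fragment of such a table-of-contents file, using the extension's `root`/`subtrees`/`entries` schema (the file names and captions are hypothetical):

```yaml
root: index
subtrees:
  - caption: Install
    entries:
      - file: install/index
  - caption: Reference
    entries:
      - file: reference/index
```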
### Sphinx-book-theme
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme that defines the base appearance for ROCm documentation. ROCm documentation applies some customization, such as a custom header and footer, on top of the Sphinx Book Theme.
### Sphinx Design
[Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design functionality. ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts information from in-code comments. ROCm projects typically use it for public API documentation unless the upstream project uses a different tool.
## Breathe
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin for integrating Doxygen content.
## Read the Docs
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds and hosts the HTML version of the ROCm documentation.