Merge branch 'develop' into update-pytorch-compatibility

Jithun Nair
2025-11-19 19:11:22 -06:00
committed by GitHub
11 changed files with 396 additions and 59 deletions

.gitignore

@@ -1,6 +1,7 @@
.venv
.vscode
build
__pycache__
# documentation artifacts
_build/


@@ -21,17 +21,17 @@ for a complete overview of this release.
- Provides support for `xcp_metrics` v1.0 and extends support for v1.1 (dynamic metrics).
- Added `amdsmi_get_gpu_partition_metrics_info`, which provides per XCP (partition) metrics.
- Support for displaying newer VRAM memory types in `amd-smi static --vram`**.
- Support for displaying newer VRAM memory types in `amd-smi static --vram`.
- The `amdsmi_get_gpu_vram_info()` API now supports detecting DDR5, LPDDR4, LPDDR5, and HBM3E memory types.
#### Changed
- Updated `amd-smi static --numa` socket affinity data structure. It now displays CPU affinity information in both hexadecimal bitmask format and expanded CPU core ranges, replacing the previous simplified socket enumeration approach.
#### Resolved Issues
#### Resolved issues
- Fixed incorrect topology weight calculations.
- Out of bound writes caused corruption in the weights field
- Out-of-bound writes caused corruption in the weights field.
- Fixed `amd-smi event` not respecting the Linux timeout command.
@@ -56,7 +56,7 @@ for a complete overview of this release.
#### Resolved issues
* Incorrect Compute Unit (CU) mask in logging. HIP runtime now correctly sets the field width for the output print operation. When logging is enabled via the environment variable `AMD_LOG_LEVEL`, the runtime logs the accurate CU mask.
* A segmentation fault occurred when the dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't NULL during marker submission, preventing crashes and improving robustness.
* A segmentation fault occurred when the dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't `NULL` during marker submission, preventing crashes and improving robustness.
* An error encountered on HIP tear-down after device reset in certain applications due to accessing stale memory objects. HIP runtime now properly releases memory associated with host calls, ensuring reliable device resets.
* A race condition occurred in certain graph-related applications when pending asynchronous signal handlers referenced device memory that had already been released, leading to memory corruption. HIP runtime now uses a reference counting strategy to manage access to device objects in asynchronous event handlers, ensuring safe and reliable memory usage.
@@ -64,11 +64,11 @@ for a complete overview of this release.
#### Resolved issues
* Fixed an error that resulted when running `make check` on systems running on a gfx1201 GPU [(#4397)](https://github.com/ROCm/AMDMIGraphX/pull/4397).
* Fixed an error that resulted when running `make check` on systems running on a gfx1201 GPU.
### **RCCL** (2.27.7)
#### Resolved Issues
#### Resolved issues
* Fixed a single-node data corruption issue in MSCCL on the AMD Instinct MI350X and MI355X GPUs for the LL protocol. This previously affected about two percent of the runs for single-node `AllReduce` with inputs smaller than 512 KiB.
@@ -79,13 +79,13 @@ for a complete overview of this release.
### **ROCm Bandwidth Test** (2.6.0)
#### Fixed
#### Resolved issues
- Test failure with error message `Cannot make canonical path`.
- Healthcheck test failure with seg fault on gfx942.
- Segmentation fault observed in `schmoo` and `one2all` when executed on `sgpu` setup.
#### Known Issues
#### Known issues
- `rocm-bandwidth-test` folder fails to be removed after driver uninstallation:
* After running `amdgpu-uninstall`, the `rocm-bandwidth-test` folder and package are still present.
@@ -130,12 +130,13 @@ for a complete overview of this release.
#### Added
* Support for different test levels with `-r` option for MI3XXx.
* Set compute type for DGEMM operations in MI350X and MI355X.
* Support for different test levels with `-r` option for AMD Instinct MI3XXx GPUs.
* Set compute type for DGEMM operations on AMD Instinct MI350X and MI355X GPUs.
### **rocSHMEM** (3.0.0)
#### Added
* Allowed IPC, RO, and GDA backends to be selected at runtime.
* GDA conduit for different NIC vendors:
* Broadcom BNXT\_RE (Thor 2)
@@ -177,7 +178,7 @@ for a complete overview of this release.
* Absolute paths from the `RPATH` of sample and test binary files.
#### Resolved Issues
#### Resolved issues
* Fixed issues caused by HIP changes:
* Removed the `.data` member from `HIP_vector_type`.


@@ -208,7 +208,11 @@ matrix](../../docs/compatibility/compatibility-matrix.rst) for the complete list
#### JAX
User of the JAX deep learning framework can now efficiently use Llama-2. For more information, see [JAX compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/jax-compatibility.html).
JAX deep learning framework users can now efficiently use Llama-2. For more information, see [JAX compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/jax-compatibility.html).
#### PyTorch
ROCm 7.1.1 enables support for PyTorch 2.9. For more information, see [PyTorch compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/pytorch-compatibility.html).
#### Deep Graph Library (DGL)
@@ -237,6 +241,35 @@ The ROCm Runfile Installer 7.1.1 includes the following features and improvement
For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/install/rocm-runfile-installer.html).
### Expansion of the ROCm examples repository
The [ROCm examples repository](https://github.com/ROCm/rocm-examples) has been expanded with examples for the following ROCm components:
::::{grid} 2
:margin: auto 0 auto auto
:::{grid}
:margin: auto 0 auto auto
* [hipBLASLt](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/)
* [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/)
* [hipSPARSELt](https://rocm.docs.amd.com/projects/hipSPARSELt/en/latest/)
* [hipTensor](https://rocm.docs.amd.com/projects/hipTensor/en/latest/)
:::
:::{grid}
:margin: auto 0 auto auto
* [rocALUTION](https://rocm.docs.amd.com/projects/rocALUTION/en/latest/)
* [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/)
* [rocWMMA](https://rocm.docs.amd.com/projects/rocWMMA/en/latest/)
:::
::::
Usage examples are now available for the following performance analysis tools:
* [ROCm Compute Profiler](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/index.html)
* [ROCm Systems Profiler](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html)
* [rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofv3.html)
The complete source code for the [HIP Graph Tutorial](https://rocm.docs.amd.com/projects/HIP/en/latest/tutorial/graph_api.html) is also available as part of the ROCm examples.
### ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
@@ -255,32 +288,7 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
For more information about the changes, see the [Changelog for the AI Developer Hub](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/changelog.html).
* The [ROCm examples repository](https://github.com/ROCm/rocm-examples) has been expanded with examples for the following ROCm components:
::::{grid} 2
:margin: auto 0 auto auto
:::{grid}
:margin: auto 0 auto auto
* [hipBLASLt](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/)
* [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/)
* [hipSPARSELt](https://rocm.docs.amd.com/projects/hipSPARSELt/en/latest/)
* [hipTensor](https://rocm.docs.amd.com/projects/hipTensor/en/latest/)
:::
:::{grid}
:margin: auto 0 auto auto
* [rocALUTION](https://rocm.docs.amd.com/projects/rocALUTION/en/latest/)
* [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/)
* [rocWMMA](https://rocm.docs.amd.com/projects/rocWMMA/en/latest/)
:::
::::
Usage examples are now available for the following performance analysis tools:
* [ROCm Compute Profiler](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/index.html)
* [ROCm Systems Profiler](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html)
* [rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofv3.html)
The complete source code for the [HIP Graph Tutorial](https://rocm.docs.amd.com/projects/HIP/en/latest/tutorial/graph_api.html) is also available as part of the ROCm examples.
* ROCm environment variables are used to configure and optimize the development and runtime experience. These variables define key settings such as installation paths, platform selection, and runtime behavior for applications running on AMD GPUs. The new [ROCm environment variables](https://advanced-micro-devices-rocm-internal--395.com.readthedocs.build/en/395/reference/env-variables.html#environment-variables-in-rocm-libraries) topic summarizes HIP and ROCR-Runtime environment variables, and provides links to environment variable topics for other ROCm components.
## ROCm components
@@ -632,17 +640,17 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
- Provides support for `xcp_metrics` v1.0 and extends support for v1.1 (dynamic metrics).
- Added `amdsmi_get_gpu_partition_metrics_info`, which provides per XCP (partition) metrics.
- Support for displaying newer VRAM memory types in `amd-smi static --vram`**.
- Support for displaying newer VRAM memory types in `amd-smi static --vram`.
- The `amdsmi_get_gpu_vram_info()` API now supports detecting DDR5, LPDDR4, LPDDR5, and HBM3E memory types.
#### Changed
- Updated `amd-smi static --numa` socket affinity data structure. It now displays CPU affinity information in both hexadecimal bitmask format and expanded CPU core ranges, replacing the previous simplified socket enumeration approach.
#### Resolved Issues
#### Resolved issues
- Fixed incorrect topology weight calculations.
- Out of bound writes caused corruption in the weights field
- Out-of-bound writes caused corruption in the weights field.
- Fixed `amd-smi event` not respecting the Linux timeout command.
@@ -667,7 +675,7 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
#### Resolved issues
* Incorrect Compute Unit (CU) mask in logging. HIP runtime now correctly sets the field width for the output print operation. When logging is enabled via the environment variable `AMD_LOG_LEVEL`, the runtime logs the accurate CU mask.
* A segmentation fault occurred when the dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't NULL during marker submission, preventing crashes and improving robustness.
* A segmentation fault occurred when the dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't `NULL` during marker submission, preventing crashes and improving robustness.
* An error encountered on HIP tear-down after device reset in certain applications due to accessing stale memory objects. HIP runtime now properly releases memory associated with host calls, ensuring reliable device resets.
* A race condition occurred in certain graph-related applications when pending asynchronous signal handlers referenced device memory that had already been released, leading to memory corruption. HIP runtime now uses a reference counting strategy to manage access to device objects in asynchronous event handlers, ensuring safe and reliable memory usage.
@@ -675,11 +683,11 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
#### Resolved issues
* Fixed an error that resulted when running `make check` on systems running on a gfx1201 GPU [(#4397)](https://github.com/ROCm/AMDMIGraphX/pull/4397).
* Fixed an error that resulted when running `make check` on systems running on a gfx1201 GPU.
### **RCCL** (2.27.7)
#### Resolved Issues
#### Resolved issues
* Fixed a single-node data corruption issue in MSCCL on the AMD Instinct MI350X and MI355X GPUs for the LL protocol. This previously affected about two percent of the runs for single-node `AllReduce` with inputs smaller than 512 KiB.
@@ -690,13 +698,13 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
### **ROCm Bandwidth Test** (2.6.0)
#### Fixed
#### Resolved issues
- Test failure with error message `Cannot make canonical path`.
- Healthcheck test failure with seg fault on gfx942.
- Segmentation fault observed in `schmoo` and `one2all` when executed on `sgpu` setup.
#### Known Issues
#### Known issues
- `rocm-bandwidth-test` folder fails to be removed after driver uninstallation:
* After running `amdgpu-uninstall`, the `rocm-bandwidth-test` folder and package are still present.
@@ -741,12 +749,13 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
#### Added
* Support for different test levels with `-r` option for MI3XXx.
* Set compute type for DGEMM operations in MI350X and MI355X.
* Support for different test levels with `-r` option for AMD Instinct MI3XXx GPUs.
* Set compute type for DGEMM operations on AMD Instinct MI350X and MI355X GPUs.
### **rocSHMEM** (3.0.0)
#### Added
* Allowed IPC, RO, and GDA backends to be selected at runtime.
* GDA conduit for different NIC vendors:
* Broadcom BNXT\_RE (Thor 2)
@@ -788,7 +797,7 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
* Absolute paths from the `RPATH` of sample and test binary files.
#### Resolved Issues
#### Resolved issues
* Fixed issues caused by HIP changes:
* Removed the `.data` member from `HIP_vector_type`.
@@ -839,8 +848,8 @@ SMI, see the [AMD SMI documentation](https://rocm.docs.amd.com/projects/amdsmi/e
### ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation
Development and support for ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the (`rocprofv3`) tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.
ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are deprecated, and only critical defect fixes will be addressed for these older profiling tools and libraries. Upgrading to the latest version of the [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/) library and the `rocprofv3` tool is strongly recommended to ensure continued support and access to new features.
ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are anticipated to reach end-of-life in a future release, around Q1 of 2026.
### AMDGPU wavefront size compiler macro deprecation


@@ -30,9 +30,9 @@ ROCm Version,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6
,gfx908 [#mi100-710-os-past-60]_,gfx908 [#mi100-710-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.7, 2.6, 2.5","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.9, 2.8, 2.7","2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.7, 2.6, 2.5","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.19.1, 2.18.1, 2.17.1 [#tf-mi350-past-60]_","2.19.1, 2.18.1, 2.17.1 [#tf-mi350-past-60]_","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.6.0,0.6.0,0.6.0,0.6.0,0.4.35,0.4.35,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.7.1,0.6.0,0.6.0,0.6.0,0.4.35,0.4.35,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
:doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.3.0.post0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` [#stanford-megatron-lm_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,85f95ae,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat-past-60]_,N/A,N/A,N/A,2.4.0,2.4.0,N/A,N/A,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
ROCm Version,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0


@@ -54,9 +54,9 @@ compatibility and system requirements.
,gfx908 [#mi100-710-os]_,gfx908 [#mi100-710-os]_,gfx908
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.6, 2.5, 2.4, 2.3"
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.9, 2.8, 2.7","2.8, 2.7, 2.6","2.6, 2.5, 2.4, 2.3"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.18.1, 2.17.1, 2.16.2"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.6.0,0.6.0,0.4.35
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.7.1,0.6.0,0.4.35
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat]_,N/A,N/A,2.4.0
:doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat]_,N/A,N/A,b5997
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.22.0,1.22.0,1.20.0
@@ -69,8 +69,8 @@ compatibility and system requirements.
Thrust,2.8.5,2.8.5,2.5.0
CUB,2.8.5,2.8.5,2.5.0
,,,
DRIVER & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, 30.10, 6.4.x","30.20.0 [#mi325x_KVM]_, 30.10.2, 30.10.1 [#driver_patch]_, 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
DRIVER & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","30.20.0 [#mi325x_KVM]_, 30.10.2, |br| 30.10.1 [#driver_patch]_, 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0


@@ -34,7 +34,7 @@ Runtime
```{code-block} shell
:caption: Example to expose the first device and a device based on UUID.
export ROCR_VISIBLE_DEVICES="0,GPU-DEADBEEFDEADBEEF"
export ROCR_VISIBLE_DEVICES="0,GPU-4b2c1a9f-8d3e-6f7a-b5c9-2e4d8a1f6c3b"
```
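For launchers written in Python, the same selection can be passed through the environment of a child process. This is only a sketch: `build_visible_devices` is a hypothetical helper, the UUID is the placeholder value from the shell example, and the child command is a stand-in that echoes the variable it receives.

```python
import os
import subprocess
import sys


def build_visible_devices(selectors):
    """Join device indices and/or UUID strings into a comma-separated list."""
    return ",".join(str(s) for s in selectors)


# Assumption: device index 0 plus the placeholder UUID from the shell example.
value = build_visible_devices([0, "GPU-4b2c1a9f-8d3e-6f7a-b5c9-2e4d8a1f6c3b"])

# Copy the current environment and add the variable for the child process only.
child_env = dict(os.environ, ROCR_VISIBLE_DEVICES=value)

# Stand-in child process that prints the variable it was started with.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['ROCR_VISIBLE_DEVICES'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Setting the variable on a copied environment keeps the parent process unaffected, which may be preferable when several jobs with different device selections are started from one script.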
### `GPU_DEVICE_ORDINAL`


@@ -8,6 +8,7 @@ import os
import shutil
import sys
from pathlib import Path
from subprocess import run
gh_release_path = os.path.join("..", "RELEASE.md")
gh_changelog_path = os.path.join("..", "CHANGELOG.md")
@@ -84,6 +85,9 @@ html_context = {"docs_header_version": "7.1.1"}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
# Check if the branch is a docs/ branch
official_branch = run(["git", "rev-parse", "--abbrev-ref", "HEAD"], capture_output=True, text=True).stdout.find("docs/")
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
project_path = os.path.abspath(".").replace("\\", "/")
@@ -202,7 +206,7 @@ external_toc_path = "./sphinx/_toc.yml"
# Add the _extensions directory to Python's search path
sys.path.append(str(Path(__file__).parent / 'extension'))
extensions = ["rocm_docs", "sphinx_reredirects", "sphinx_sitemap", "sphinxcontrib.datatemplates", "version-ref", "csv-to-list-table"]
extensions = ["rocm_docs", "sphinx_reredirects", "sphinx_sitemap", "sphinxcontrib.datatemplates", "remote-content", "version-ref", "csv-to-list-table"]
compatibility_matrix_file = str(Path(__file__).parent / 'compatibility/compatibility-matrix-historical-6.0.csv')
@@ -216,6 +220,10 @@ html_context = {"docs_header_version": "7.1.0"}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
html_context["official_branch"] = official_branch
html_context["version"] = version
html_context["release"] = release
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}
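The `official_branch` value added above is the result of `str.find("docs/")`, so it is an index rather than a boolean: `0` when the branch name starts with `docs/`, `-1` when the substring is absent, and a positive number when `docs/` appears later in the name. The sketch below illustrates this with hypothetical branch names; `docs_branch_flag` is an illustrative helper, not part of `conf.py`.

```python
def docs_branch_flag(branch_name: str) -> int:
    """Mirror the str.find("docs/") check: 0 means the name starts with 'docs/'."""
    return branch_name.find("docs/")


# Hypothetical branch names, chosen only to show the three possible outcomes.
for name in ("docs/7.1.1", "develop", "update-docs/foo"):
    flag = docs_branch_flag(name)
    print(f"{name!r}: find() -> {flag}, starts with 'docs/': {flag == 0}")
```

Because only `0` signals a `docs/` branch, any consumer of `html_context["official_branch"]` needs to compare against `0` explicitly; a plain truthiness test would give the opposite answer.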


@@ -0,0 +1,141 @@
from docutils import nodes
from docutils.parsers.rst import Directive
from docutils.statemachine import ViewList
from sphinx.util import logging
from sphinx.util.nodes import nested_parse_with_titles
import requests
import re

logger = logging.getLogger(__name__)


class BranchAwareRemoteContent(Directive):
    """
    Directive that downloads and includes content from other repositories,
    matching the branch/tag of the current documentation build.

    Usage:

    .. remote-content::
       :repo: owner/repository
       :path: path/to/file.rst
       :default_branch: docs/develop  # Branch to use when not on a release
       :tag_prefix: Docs/  # Optional
    """

    required_arguments = 0
    optional_arguments = 0
    final_argument_whitespace = True
    has_content = False
    option_spec = {
        'repo': str,
        'path': str,
        'default_branch': str,  # Branch to use when not on a release tag
        'start_line': int,  # Include the file from a specific line
        'tag_prefix': str,  # Prefix for release tags (e.g., 'Docs/')
    }

    def get_current_version(self):
        """Get current version/branch being built"""
        env = self.state.document.settings.env
        html_context = env.config.html_context

        # Check if building from a tag
        if "official_branch" in html_context:
            if html_context["official_branch"] == 0:
                if "version" in html_context:
                    # Remove any 'v' prefix
                    version = html_context["version"]
                    if re.match(r'^\d+\.\d+\.\d+$', version):
                        return version

        # Not a version tag, so we'll use the default branch
        return None

    def get_target_ref(self):
        """Get target reference for the remote repository"""
        current_version = self.get_current_version()

        # If it's a version number, use tag prefix and version
        if current_version:
            tag_prefix = self.options.get('tag_prefix', '')
            return f'{tag_prefix}{current_version}'

        # For any other case, use the specified default branch
        if 'default_branch' not in self.options:
            logger.warning('No default_branch specified and not building from a version tag')
            return None

        return self.options['default_branch']

    def construct_raw_url(self, repo, path, ref):
        """Construct the raw.githubusercontent.com URL"""
        return f'https://raw.githubusercontent.com/{repo}/{ref}/{path}'

    def fetch_and_parse_content(self, url, source_path):
        """Fetch content and parse it as RST"""
        response = requests.get(url)
        response.raise_for_status()
        content = response.text

        start_line = self.options.get('start_line', 0)

        # Create ViewList for parsing
        line_count = 0
        content_list = ViewList()
        for line_no, line in enumerate(content.splitlines()):
            if line_count >= start_line:
                content_list.append(line, source_path, line_no)
            line_count += 1

        # Create a section node and parse content
        node = nodes.section()
        nested_parse_with_titles(self.state, content_list, node)

        return node.children

    def run(self):
        if 'repo' not in self.options or 'path' not in self.options:
            logger.warning('Both repo and path options are required')
            return []

        target_ref = self.get_target_ref()
        if not target_ref:
            return []

        raw_url = self.construct_raw_url(
            self.options['repo'],
            self.options['path'],
            target_ref
        )

        try:
            logger.info(f'Attempting to fetch content from {raw_url}')
            return self.fetch_and_parse_content(raw_url, self.options['path'])
        except requests.exceptions.RequestException as e:
            logger.warning(f'Failed to fetch content from {raw_url}: {str(e)}')

            # If we failed on a tag, try falling back to default_branch
            if re.match(r'^\d+\.\d+\.\d+$', target_ref) or target_ref.startswith('Docs/'):
                if 'default_branch' in self.options:
                    try:
                        fallback_ref = self.options['default_branch']
                        logger.info(f'Attempting fallback to {fallback_ref}...')

                        fallback_url = self.construct_raw_url(
                            self.options['repo'],
                            self.options['path'],
                            fallback_ref
                        )

                        return self.fetch_and_parse_content(fallback_url, self.options['path'])
                    except requests.exceptions.RequestException as e2:
                        logger.warning(f'Fallback also failed: {str(e2)}')

            return []


def setup(app):
    app.add_directive('remote-content', BranchAwareRemoteContent)

    return {
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
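The reference selection and URL construction in the directive can be tried out without Sphinx. The following is a minimal sketch under stated assumptions: `resolve_ref` and `raw_url` are standalone stand-ins written for illustration (they are not functions of the extension), and the repository, path, branch, and prefix values are taken from the ROCR-Runtime entry in the environment variables page of this change.

```python
import re

# Same pattern the directive uses to recognize a plain X.Y.Z version.
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def resolve_ref(official_branch, version, default_branch=None, tag_prefix=""):
    """Return '<tag_prefix><version>' for a release build, else the default branch."""
    if official_branch == 0 and VERSION_RE.match(version or ""):
        return f"{tag_prefix}{version}"
    return default_branch


def raw_url(repo, path, ref):
    """Build the raw.githubusercontent.com URL the same way the directive does."""
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"


repo = "ROCm/ROCR-Runtime"
path = "runtime/docs/data/env_variables.rst"

# Release build: official_branch == 0 and the version looks like X.Y.Z.
release_ref = resolve_ref(0, "7.1.1", default_branch="amd-staging", tag_prefix="docs/")
print(raw_url(repo, path, release_ref))

# Any other build: the default branch is used instead.
dev_ref = resolve_ref(-1, "7.1.1", default_branch="amd-staging", tag_prefix="docs/")
print(raw_url(repo, path, dev_ref))
```

With this construction, a `:path:` value that begins with a slash yields a double slash after the reference in the resulting URL, so paths are probably best written without a leading slash.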


@@ -65,6 +65,8 @@ ROCm documentation is organized into the following categories:
* [ROCm libraries](./reference/api-libraries.md)
* [ROCm tools, compilers, and runtimes](./reference/rocm-tools.md)
* [GPU hardware specifications](./reference/gpu-arch-specs.rst)
* [Hardware atomics operation support](./reference/gpu-atomics-operation.rst)
* [Environment variables](./reference/env-variables.rst)
* [Data types and precision support](./reference/precision-support.rst)
* [Graph safe support](./reference/graph-safe-support.rst)
<!-- markdownlint-enable MD051 -->


@@ -0,0 +1,173 @@
.. meta::
:description: Environment variables reference
:keywords: AMD, ROCm, environment variables, environment, reference, settings
.. role:: cpp(code)
:language: cpp
.. _env-variables-reference:
*************************************************************
ROCm environment variables
*************************************************************
ROCm provides a set of environment variables that allow users to configure and optimize their development
and runtime experience. These variables define key settings such as installation paths, platform selection,
and runtime behavior for applications running on AMD accelerators and GPUs.
This page outlines commonly used environment variables across different components of the ROCm software stack,
including HIP and ROCR-Runtime. Understanding these variables can help streamline software development and
execution in ROCm-based environments.
HIP environment variables
=========================
The following tables list the HIP environment variables.
GPU isolation variables
--------------------------------------------------------------------------------
.. remote-content::
:repo: ROCm/rocm-systems
:path: /projects/hip/docs/reference/env_variables/gpu_isolation_hip_env.rst
:default_branch: develop
:tag_prefix: docs/
Profiling variables
--------------------------------------------------------------------------------
.. remote-content::
:repo: ROCm/rocm-systems
:path: /projects/hip/docs/reference/env_variables/profiling_hip_env.rst
:default_branch: develop
:tag_prefix: docs/
Debug variables
--------------------------------------------------------------------------------
.. remote-content::
:repo: ROCm/rocm-systems
:path: /projects/hip/docs/reference/env_variables/debug_hip_env.rst
:default_branch: develop
:tag_prefix: docs/
Memory management related variables
--------------------------------------------------------------------------------
.. remote-content::
:repo: ROCm/rocm-systems
:path: /projects/hip/docs/reference/env_variables/memory_management_hip_env.rst
:default_branch: develop
:tag_prefix: docs/
Other useful variables
--------------------------------------------------------------------------------
.. remote-content::
:repo: ROCm/rocm-systems
:path: /projects/hip/docs/reference/env_variables/miscellaneous_hip_env.rst
:default_branch: develop
:tag_prefix: docs/
ROCR-Runtime environment variables
==================================
The following table lists the ROCR-Runtime environment variables:
.. remote-content::
:repo: ROCm/ROCR-Runtime
:path: runtime/docs/data/env_variables.rst
:default_branch: amd-staging
:tag_prefix: docs/
HIPCC environment variables
===========================
This topic provides descriptions of the HIPCC environment variables.
.. remote-content::
:repo: ROCm/llvm-project
:path: amd/hipcc/docs/env.rst
:default_branch: amd-staging
:start_line: 14
:tag_prefix: docs/
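As the `remote-content` directive in this change is written, `:start_line:` is compared against a zero-based line index, so `:start_line: 14` drops the first 14 lines of the fetched file and keeps everything from the 15th line onward. The sketch below reproduces only that filtering step on made-up input; `lines_from` is a hypothetical helper, not part of the extension.

```python
def lines_from(content: str, start_line: int = 0):
    """Keep (index, text) pairs whose zero-based index is >= start_line."""
    return [
        (index, text)
        for index, text in enumerate(content.splitlines())
        if index >= start_line
    ]


# Made-up 20-line document standing in for the remote file.
sample = "\n".join(f"line {n}" for n in range(1, 21))

kept = lines_from(sample, start_line=14)
print(kept[0])    # first retained (index, text) pair
print(len(kept))  # number of retained lines
```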
Environment variables in ROCm libraries
=======================================
Many ROCm libraries define environment variables for specific tuning, debugging,
or behavioral control. The table below provides an overview and links to further
documentation.
.. list-table::
:header-rows: 1
:widths: 30, 70
* - Library
- Purpose of environment variables
* - :doc:`hipBLASLt <hipblaslt:reference/env-variables>`
- Manage logging, debugging, offline tuning, and stream-K configuration
for hipBLASLt.
* - :doc:`hipSPARSELt <hipsparselt:reference/env-variables>`
- Control logging, debugging and performance monitoring of hipSPARSELt.
* - :doc:`rocBLAS <rocblas:reference/env-variables>`
- Performance tuning, kernel selection, logging, and debugging for BLAS
operations.
* - :doc:`rocSolver <rocsolver:reference/env_variables>`
- Control logging of rocSolver.
* - :doc:`rocSPARSE <rocsparse:reference/env_variables>`
- Control logging of rocSPARSE.
* - :doc:`MIGraphX <amdmigraphx:reference/MIGraphX-dev-env-vars>`
- Control debugging, testing, and model performance tuning options for
MIGraphX.
* - :doc:`MIOpen <miopen:reference/env_variables>`
- Control MIOpen logging and debugging, find mode, algorithm behavior,
and other settings.
* - :doc:`MIVisionX <mivisionx:reference/MIVisionX-env-variables>`
- Control core OpenVX, GPU/device and debugging/profiling, stitching and
chroma key configurations, file I/O operations, model deployment, and
neural network parameters of MIVisionX.
* - :doc:`RCCL <rccl:api-reference/env-variables>`
- Control the logging, debugging, compiler and assembly behavior, and
cache of RPP.
* - :doc:`RPP <rpp:reference/rpp-env-variables>`
- Logging, debugging, compiler and assembly management, and cache control in RPP.
* - `Tensile <https://rocm.docs.amd.com/projects/Tensile/en/latest/src/reference/environment-variables.html>`_
- Enable testing, debugging, and experimental features for Tensile clients and applications.
Key single-variable details
===========================
This section provides detailed descriptions, in the standard format, for ROCm
libraries that feature a single, key environment variable (or a very minimal set)
which is documented directly on this page for convenience.
.. _rocalution-vars-detail:
rocALUTION
----------
.. list-table::
:header-rows: 1
:widths: 70,30
* - Environment variable
- Value
* - | ``ROCALUTION_LAYER``
| If set to ``1``, enables file logging. Logs each rocALUTION function call, including object constructor/destructor, address of the object, memory allocation, data transfers, and all function calls for matrices, vectors, solvers, and preconditioners. The log file is placed in the working directory.
- | ``1`` (Enable trace file logging)
| Default: Not set.


@@ -216,6 +216,8 @@ subtrees:
title: ROCm tools, compilers, and runtimes
- file: reference/gpu-arch-specs.rst
- file: reference/gpu-atomics-operation.rst
- file: reference/env-variables.rst
title: Environment variables
- file: reference/precision-support.rst
title: Data types and precision support
- file: reference/graph-safe-support.rst