Mirror of https://github.com/ROCm/ROCm.git, synced 2026-01-08 14:23:55 -05:00
Fix PyTorch Compatibility link and remove incomplete rows (#4195)
* fix pytorch-compatibility filename fix links
* remove incomplete rows in pytorch-compatibility
* fix broken refs
@@ -22,7 +22,7 @@ ROCm Version,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,
:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
@@ -47,7 +47,7 @@ compatibility and system requirements.
,gfx908,gfx908,gfx908
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3
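To check an installed build against the matrix above, the framework runtime can be queried directly. The following is a minimal sketch assuming a ROCm wheel of PyTorch; ``torch.version.hip`` and the ``gcnArchName`` property are only populated on ROCm builds, so missing values suggest the wheel is not ROCm-enabled.

.. code-block:: python

   import torch

   # The PyTorch build string and the HIP/ROCm runtime it was compiled against.
   print("PyTorch:", torch.__version__)   # e.g. "2.4.0+rocm6.2" on a ROCm wheel
   print("HIP/ROCm:", torch.version.hip)  # None on a CUDA-only build

   if torch.cuda.is_available():
       props = torch.cuda.get_device_properties(0)
       print("Device:", props.name)
       # On ROCm builds, gcnArchName (e.g. "gfx90a") can be compared against the
       # architecture rows of the matrix above.
       print("Architecture:", getattr(props, "gcnArchName", "unknown"))
   else:
       print("No ROCm-visible accelerator detected.")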
@@ -576,14 +576,6 @@ PyTorch interacts with the CUDA or ROCm environment.
- Globally enables or disables the PyTorch C++ implementation within SDPA.
- 2.1
- ❌
* - ``allow_fp16_bf16_reduction_math_sdp``
- Globally enables FP16 and BF16 precision for reduction operations within
SDPA.
- 2.1
-
..
FIXME:
- Partial?
.. Need to validate and extend.
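As a rough illustration of how these SDPA toggles are used, the sketch below enables the C++ ("math") backend and, where the installed PyTorch exposes it, reduced-precision accumulation, before calling scaled dot-product attention. Flag availability varies by PyTorch release, so the ``hasattr`` guard is deliberate.

.. code-block:: python

   import torch
   import torch.nn.functional as F

   q = k = v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

   # Globally enable the C++ ("math") implementation of scaled dot-product attention.
   torch.backends.cuda.enable_math_sdp(True)

   # Allow FP16/BF16 accumulation in reductions inside the math backend,
   # if this toggle exists in the installed PyTorch build.
   if hasattr(torch.backends.cuda, "allow_fp16_bf16_reduction_math_sdp"):
       torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)

   out = F.scaled_dot_product_attention(q, k, v)
   print(out.shape)  # torch.Size([2, 8, 128, 64])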
@@ -671,15 +663,6 @@ of computational resources and scalability for large-scale tasks.
those on separate machines.
- 1.8
- 5.4
* - RPC Device Map Passing
- RPC Device Map Passing in PyTorch refers to a feature of the Remote
Procedure Call (RPC) framework that enables developers to control and
specify how tensors are transferred between devices during remote
operations. It allows fine-grained management of device placement when
sending tensors across nodes in distributed training or execution
scenarios.
- 1.9
-
* - Gloo
- Gloo is designed for multi-machine and multi-GPU setups, enabling
efficient communication and synchronization between processes. Gloo is
@@ -687,24 +670,6 @@ of computational resources and scalability for large-scale tasks.
(DDP) and RPC frameworks, alongside other backends like NCCL and MPI.
- 1.0
- 2.0
* - MPI
- MPI (Message Passing Interface) in PyTorch refers to the use of the MPI
backend for distributed communication in the ``torch.distributed`` module.
It enables inter-process communication, primarily in distributed
training settings, using the widely adopted MPI standard.
- 1.9
-
* - TorchElastic
- TorchElastic is a PyTorch library that enables fault-tolerant and
elastic training in distributed environments. It is designed to handle
dynamically changing resources, such as adding or removing nodes during
training, which is especially useful in cloud-based or preemptible
environments.
- 1.9
-
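To make the RPC device-map row above concrete, the following sketch shows the general shape of TensorPipe device-map configuration; the worker names, ranks, and GPU mapping are placeholders, and ``MASTER_ADDR``/``MASTER_PORT`` must be set in the environment before ``init_rpc`` is called.

.. code-block:: python

   import torch.distributed.rpc as rpc

   # TensorPipe is the default RPC backend; a device map tells it where GPU
   # tensors sent to a peer should land on that peer's devices.
   options = rpc.TensorPipeRpcBackendOptions()
   options.set_device_map("worker1", {0: 1})  # our GPU 0 -> worker1's GPU 1

   rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=options)

   # ... rpc.rpc_sync() / rpc.rpc_async() calls that move GPU tensors ...

   rpc.shutdown()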
..
FIXME: RPC Device Map Passing "Since ROCm version"
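The Gloo and MPI rows describe backend choices for ``torch.distributed`` process groups, and TorchElastic underlies the ``torchrun`` launcher. A minimal sketch follows, assuming the script is launched with ``torchrun --nproc_per_node=2`` so that the rank and world-size environment variables are populated.

.. code-block:: python

   import torch
   import torch.distributed as dist

   # Backend selection: "gloo" (CPU / multi-node), "mpi" (only if PyTorch was
   # built with MPI support), or "nccl" (served by RCCL on ROCm).
   dist.init_process_group(backend="gloo")

   rank = dist.get_rank()
   tensor = torch.ones(1) * rank

   # Sum the per-rank tensors across all processes.
   dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
   print(f"rank {rank}: reduced value = {tensor.item()}")

   dist.destroy_process_group()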
torch.compiler
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -11,11 +11,14 @@ ROCm provides a comprehensive ecosystem for deep learning development, including
deep learning frameworks and libraries such as PyTorch, TensorFlow, and JAX. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.
The following guides provide information on compatibility and supported features for ROCm-enabled deep learning frameworks.
The following guides provide information on compatibility and supported
features for these ROCm-enabled deep learning frameworks.
* :doc:`PyTorch compatibility <../compatibility/pytorch-compatibility>`
.. * :doc:`TensorFlow compatibility <../compatibility/tensorflow-compatibility>`
.. * :doc:`JAX compatibility <../compatibility/jax-compatibility>`
The following chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
.. image:: ../data/how-to/framework_install_2024_07_04.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
@@ -37,3 +40,4 @@ through the following guides.
* :doc:`rocm-for-ai/index`
* :doc:`llm-fine-tuning-optimization/index`
@@ -399,9 +399,6 @@ Further reading
- To learn how to optimize inference on LLMs, see
:doc:`Fine-tuning LLMs and inference optimization </how-to/llm-fine-tuning-optimization/index>`.
- For a list of other ready-made Docker images for ROCm, see the
:doc:`Docker image support matrix <rocm-install-on-linux:reference/docker-image-support-matrix>`.
- To compare with the previous version of the ROCm vLLM Docker image for performance validation, refer to
`LLM inference performance validation on AMD Instinct MI300X (ROCm 6.2.0) <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_.
@@ -92,7 +92,7 @@ involves configuring tensor parallelism, leveraging advanced features, and
ensuring efficient execution. Here’s how to optimize vLLM performance:
* Tensor parallelism: Configure the
:ref:`tensor-parallel-size parameter <mi300x-vllm-optimize-tp-gemm>` to distribute
:ref:`tensor-parallel-size parameter <mi300x-vllm-multiple-gpus>` to distribute
tensor computations across multiple GPUs. Adjust parameters such as
``batch-size``, ``input-len``, and ``output-len`` based on your workload.
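The same tensor-parallel setting can also be exercised from vLLM's offline Python API rather than the command line. A sketch follows, with a placeholder model name and sizes that should be tuned to your workload and GPU count.

.. code-block:: python

   from vllm import LLM, SamplingParams

   # Shard tensor computations across 8 GPUs; equivalent in spirit to passing
   # --tensor-parallel-size 8 to the vLLM server or benchmark scripts.
   llm = LLM(
       model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
       tensor_parallel_size=8,
   )

   # Prompt count and max_tokens stand in for the batch-size / input-len /
   # output-len knobs mentioned above.
   prompts = ["Explain tensor parallelism in one sentence."] * 4
   params = SamplingParams(max_tokens=128, temperature=0.8)

   for output in llm.generate(prompts, params):
       print(output.outputs[0].text)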