Files
ROCm/docs/compatibility/ml-compatibility/ray-compatibility.rst

115 lines
5.5 KiB
ReStructuredText
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
:orphan:
.. meta::
:description: Ray compatibility
:keywords: GPU, Ray, deep learning, framework compatibility
.. version-set:: rocm_version latest
*******************************************************************************
Ray compatibility
*******************************************************************************
Ray is a unified framework for scaling AI and Python applications from your laptop
to a full cluster, without changing your code. Ray consists of `a core distributed
runtime <https://docs.ray.io/en/latest/ray-core/walkthrough.html>`__ and a set of
`AI libraries <https://docs.ray.io/en/latest/ray-air/getting-started.html>`__ for
simplifying machine learning computations.
Ray is a general-purpose framework that runs many types of workloads efficiently.
Any Python application can be scaled with Ray, without extra infrastructure.
Support overview
================================================================================
- The ROCm-supported version of Ray is maintained in the official `https://github.com/ROCm/ray
<https://github.com/ROCm/ray>`__ repository, which differs from the
`https://github.com/ray-project/ray <https://github.com/ray-project/ray>`__ upstream repository.
- To get started and install Ray on ROCm, use the prebuilt :ref:`Docker image <ray-docker-compat>`,
which includes ROCm, Ray, and all required dependencies.
- See the :doc:`ROCm Ray installation guide <rocm-install-on-linux:install/3rd-party/ray-install>`
for installation and setup instructions.
- You can also consult the upstream `Installation guide <https://docs.ray.io/en/latest/ray-overview/installation.html>`__
for additional context.
.. _ray-docker-compat:
Compatibility matrix
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `ROCm Ray Docker images <https://hub.docker.com/r/rocm/ray/tags>`__
with ROCm backends on Docker Hub. The following Docker image tags and
associated inventories represent the latest Ray version from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.
.. list-table::
:header-rows: 1
:class: docker-image-compatibility
* - Docker image
- ROCm
- Ray
- Pytorch
- Ubuntu
- Python
- GPU
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/ray/ray-2.51.1_rocm7.0.0_ubuntu22.04_py3.12_pytorch2.9.0/images/sha256-a02f6766b4ba406f88fd7e85707ec86c04b569834d869a08043ec9bcbd672168"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
- `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
- `2.51.1 <https://github.com/ROCm/ray/tree/release/2.51.1>`__
- 2.9.0a0+git1c57644
- 22.04
- `3.12.12 <https://www.python.org/downloads/release/python-31212/>`__
- MI300X
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/ray/ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0/images/sha256-0d166fe6bdced38338c78eedfb96eff92655fb797da3478a62dd636365133cc0"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
- `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
- `2.48.0.post0 <https://github.com/ROCm/ray/tree/release/2.48.0.post0>`__
- 2.6.0+git684f6f2
- 24.04
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- MI300X, MI210
Use cases and recommendations
================================================================================
* The `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm
Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`__
blog provides an overview of Volcano Engine Reinforcement Learning (verl)
for large language models (LLMs) and discusses its benefits in large-scale
reinforcement learning from human feedback (RLHF). It uses Ray as part of a
hybrid orchestration engine to schedule and coordinate training and inference
tasks in parallel, enabling optimized resource utilization and potential overlap
between these phases. This dynamic resource allocation strategy significantly
improves overall system efficiency. The blog presents verls performance results,
focusing on throughput and convergence accuracy achieved on AMD Instinct™ MI300X
GPUs. Follow this guide to get started with verl on AMD Instinct GPUs and
accelerate your RLHF training with ROCm-optimized performance.
* The `Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows
<https://rocm.blogs.amd.com/artificial-intelligence/rocm-ray/README.html>`__
blog post describes key use cases such as training and inference for large language models (LLMs),
model serving, hyperparameter tuning, reinforcement learning, and the orchestration of large-scale
workloads using Ray in the ROCm environment.
For more use cases and recommendations, see the AMD GPU tabs in the `Accelerator Support
topic <https://docs.ray.io/en/latest/ray-core/scheduling/accelerators.html#accelerator-support>`__
of the Ray core documentation and refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__,
where you can search for Ray examples and best practices to optimize your workloads on AMD GPUs.
Previous versions
===============================================================================
See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/ray-history` to find documentation for previous releases
of the ``ROCm/ray`` Docker image.