Files
ROCm/docs/how-to/rocm-for-ai/inference/index.rst
anisha-amd a98236a4e3 Main Docs: references of accelerator removal and change to GPU (#5495)
* Docs: references of accelerator removal and change to GPU

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-10-16 11:22:10 -04:00

30 lines
1.9 KiB
ReStructuredText
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
.. meta::
:description: How to use ROCm for AI inference workloads.
:keywords: ROCm, AI, machine learning, LLM, AI inference, NLP, GPUs, usage, tutorial
****************************
Use ROCm for AI inference
****************************
AI inference is a process of deploying a trained machine learning model to make predictions or classifications on new data.This commonly involves using the model with real-time data and making quick decisions based on the predictions made by the model.
Understanding the ROCm™ software platforms architecture and capabilities is vital for running AI inference. By leveraging the ROCm platform's capabilities, you can harness the power of high-performance computing and efficient resource management to run inference workloads, leading to faster predictions and classifications on real-time data.
Throughout the following topics, this section provides a comprehensive guide to setting up and deploying AI inference on AMD GPUs. This includes instructions on how to install ROCm, how to use Hugging Face Transformers to manage pre-trained models for natural language processing (NLP) tasks, how to validate vLLM on AMD Instinct™ MI300X GPUs and illustrate how to deploy trained models in production environments.
The AI Developer Hub contains `AMD ROCm tutorials <https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/>`_ for
training, fine-tuning, and inference. It leverages popular machine learning frameworks on AMD GPUs.
- :doc:`Installing ROCm and machine learning frameworks <../install>`
- :doc:`Running models from Hugging Face <hugging-face-models>`
- :doc:`LLM inference frameworks <llm-inference-frameworks>`
- :doc:`vLLM inference performance testing <benchmark-docker/vllm>`
- :doc:`PyTorch inference performance testing <benchmark-docker/pytorch-inference>`
- :doc:`SGLang inference performance testing <benchmark-docker/sglang>`
- :doc:`Deploying your model <deploy-your-model>`