mirror of https://github.com/ROCm/ROCm.git
synced 2026-01-09 14:48:06 -05:00

add MAD page

docs/how-to/rocm-for-ai/model-automation-and-dashboarding.rst (new file, 159 lines)

@@ -0,0 +1,159 @@
.. meta::
   :description: Discover and run deep learning models with AMD MAD -- the Model Automation and Dashboarding tool.
   :keywords: AI, LLM, machine learning, dashboarding, model zoo, MAD

************************
Running models using MAD
************************

The AMD Model Automation and Dashboarding (MAD) tool brings together an AI model zoo with automated execution across
various GPU architectures. It facilitates performance tracking by including mechanisms for maintaining historical
performance data and generating dashboards for analysis. MAD's source code repository and full documentation are located
at `<https://github.com/ROCm/MAD>`__.

MAD pulls various models from their repositories and tests their performance inside ROCm Docker images. It is an index
of deep learning models that have been trained to achieve the best reproducible accuracy and performance with AMD’s
ROCm software stack running on AMD GPUs.

Use MAD to:

* Try new models.

* Compare performance between patches or architectures.

* Track functionality and performance over time.

Getting started with MAD
========================

To set up your host computer with :doc:`ROCm <rocm:index>`, follow the detailed
:doc:`installation instructions <rocm-install-on-linux:install/detailed-install>` for Linux-based platforms.

ROCm Docker images
------------------

You can find ROCm Docker images for PyTorch and TensorFlow on Docker Hub at
:fab:`docker` `rocm/pytorch <https://hub.docker.com/r/rocm/pytorch>`_ and
:fab:`docker` `rocm/tensorflow <https://hub.docker.com/r/rocm/tensorflow>`_.

AMD publishes a unified Docker image at :fab:`docker` `rocm/vllm <https://hub.docker.com/r/rocm/vllm>`_ that packages
together vLLM and PyTorch for the AMD Instinct™ MI300X accelerator. This enables users to quickly validate the expected
inference performance numbers on the MI300X. This Docker image includes:

- ROCm

- vLLM

- PyTorch

- Tuning files (in ``.csv`` format)

See `<https://github.com/ROCm/MAD/tree/develop/benchmark/vllm>`__ for more information.
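
A hedged example of pulling and launching this image follows. The ``--device`` and ``--group-add`` flags assume a
standard ROCm container setup; adjust them for your system:

.. code-block:: shell

   # Pull the unified vLLM image and start an interactive container with
   # GPU access. The --device and --group-add flags are the usual way to
   # expose AMD GPUs to a container on a standard ROCm host.
   docker pull rocm/vllm
   docker run -it --rm \
       --device=/dev/kfd --device=/dev/dri \
       --group-add video --ipc=host \
       rocm/vllm

Consult the ROCm Docker documentation for your distribution's specifics before running these commands.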

.. _mad-run-locally:

Using MAD to run models locally
===============================

The following describes MAD's basic functionality.

1. Clone the `MAD repository <https://github.com/ROCm/MAD>`_ to a local directory and install the required packages
   on the host machine. For example:

   .. code-block:: shell

      git clone https://github.com/ROCm/MAD
      cd MAD
      pip3 install -r requirements.txt

2. Use the ``tools/run_models.py`` script to run and collect performance results for all models in
   ``models.json`` locally on a Docker host.

   ``run_models.py`` is the main MAD command line interface for running models locally. While the tool has many
   options, running any single model is straightforward. To run a model, look up its name or tag in ``models.json``
   and pass it to ``run_models.py`` in the following form:

   .. code-block:: shell

      tools/run_models.py [-h] [--model_name MODEL_NAME] [--timeout TIMEOUT] [--live_output] [--clean_docker_cache] [--keep_alive] [--keep_model_dir] [-o OUTPUT] [--log_level LOG_LEVEL]

   See :ref:`mad-run-args` for the list of options and their descriptions.

   For each model in ``models.json``, the script:

   * Builds the Docker image associated with the model. Images are named
     ``ci-$(model_name)`` and are not removed after the script completes.

   * Starts the Docker container, named ``container_$(model_name)``. The
     container should automatically be stopped and removed whenever the
     script exits.

   * Clones the model's git ``url`` and runs its ``scripts``.

   * Compiles the final ``perf.csv`` and ``perf.html``.
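
The per-model Docker names follow mechanically from this convention. The sketch below is illustrative only, not
MAD's actual implementation, and the ``models.json`` fields shown are simplified assumptions:

.. code-block:: python

   # Illustrative sketch of the Docker naming convention described above.
   # This is not run_models.py itself; the models.json entry is simplified.
   import json

   def docker_names(model_name):
       """Return the (image, container) names used for one model."""
       return f"ci-{model_name}", f"container_{model_name}"

   models = json.loads('[{"name": "pyt_huggingface_bert"}]')
   for model in models:
       image, container = docker_names(model["name"])
       print(image, container)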

.. _mad-run-args:

Arguments
---------

--help, -h
    Show the help message and exit.

--tags TAGS
    Tags to run (can be specified multiple times). Overrides ``tags.json``. See :ref:`mad-run-tags`.

--model_name MODEL_NAME
    Name of the model to run.

--timeout TIMEOUT
    Timeout, in seconds, for running a model. The default is 7200 (2 hours).

--live_output
    Print output in real time directly to STDOUT.

--clean_docker_cache
    Rebuild the Docker image without using the cache.

--keep_alive
    Keep the container alive after the application finishes running.

--keep_model_dir
    Keep the model directory after the application finishes running.

--output OUTPUT, -o OUTPUT
    Output file for the results.

--log_level LOG_LEVEL
    Log level for the logger.
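
As a minimal sketch, the options above could be defined with ``argparse`` as follows. This is an assumption about
how such a CLI might be structured, not MAD's actual parser, whose defaults and details may differ:

.. code-block:: python

   # Hypothetical argparse mirror of the options listed above; the real
   # parser in tools/run_models.py may differ.
   import argparse

   parser = argparse.ArgumentParser(prog="run_models.py")
   parser.add_argument("--tags", action="append")
   parser.add_argument("--model_name")
   parser.add_argument("--timeout", type=int, default=7200)
   parser.add_argument("--live_output", action="store_true")
   parser.add_argument("--clean_docker_cache", action="store_true")
   parser.add_argument("--keep_alive", action="store_true")
   parser.add_argument("--keep_model_dir", action="store_true")
   parser.add_argument("-o", "--output")
   parser.add_argument("--log_level", default="INFO")

   args = parser.parse_args(
       ["--model_name", "pyt_huggingface_bert", "--timeout", "3600"])
   print(args.model_name, args.timeout)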

.. _mad-run-tags:

Tags
----

Use tags to select a subset of the models to run. Specify tags in ``tags.json`` or with the ``--tags`` argument.
If multiple tags are specified, all models that match any of the specified tags are selected.

.. note::

   Each model name in ``models.json`` is automatically a tag that can be used to run that model. Tags are also
   supported in comma-separated form.
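
This selection rule can be sketched as follows. The sketch is illustrative only, not MAD's actual code, and the
``tags`` field shown in the model entries is a simplified assumption:

.. code-block:: python

   # Illustrative sketch of the tag selection rule described above, not
   # MAD's actual implementation. The "tags" field is a simplification.
   def parse_tags(values):
       """Split possibly comma-separated tag arguments into a flat list."""
       return [t for v in values for t in v.split(",") if t]

   def select_models(models, requested):
       """A model is selected if any requested tag matches its name or tags."""
       wanted = set(parse_tags(requested))
       return [m["name"] for m in models
               if wanted & ({m["name"]} | set(m.get("tags", [])))]

   models = [
       {"name": "pyt_huggingface_bert", "tags": ["pyt"]},
       {"name": "tf_resnet50", "tags": ["tf"]},
   ]
   print(select_models(models, ["pyt"]))
   print(select_models(models, ["pyt,tf"]))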

For example, to run the ``pyt_huggingface_bert`` model, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt_huggingface_bert

Or, to run all PyTorch models, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt

.. note::

   Learn more about MAD's options by visiting `<https://github.com/ROCm/MAD/blob/develop/README.md>`__.
@@ -32,6 +32,7 @@ subtrees:
   - file: how-to/rocm-for-ai/train-a-model.rst
   - file: how-to/rocm-for-ai/hugging-face-models.rst
   - file: how-to/rocm-for-ai/deploy-your-model.rst
 + - file: how-to/rocm-for-ai/model-automation-and-dashboarding.rst
   - file: how-to/rocm-for-hpc/index.rst
     title: Using ROCm for HPC
   - file: how-to/llm-fine-tuning-optimization/index.rst