add MAD page

Peter Park
2024-09-18 14:22:14 -04:00
parent 1e0d3da98c
commit c81d0f3b0a
2 changed files with 160 additions and 0 deletions


@@ -0,0 +1,159 @@
.. meta::
   :description: Discover and run deep learning models with AMD MAD -- the Model Automation and Dashboarding tool.
   :keywords: AI, LLM, machine learning, dashboarding, model zoo

************************
Running models using MAD
************************

The AMD Model Automation and Dashboarding (MAD) tool brings together an AI model zoo with automated execution across
various GPU architectures. It facilitates performance tracking by including mechanisms for maintaining historical
performance data and generating dashboards for analysis. MAD's source code repository and full documentation are located
at `<https://github.com/ROCm/MAD>`__.

MAD pulls various models from their repositories and tests their performance inside ROCm Docker images. It is an index
of deep learning models that have been trained to achieve the best reproducible accuracy and performance with AMD's ROCm
software stack running on AMD GPUs.

Use MAD to:

* Try new models,
* Compare performance between patches or architectures, and
* Track functionality and performance over time.

Getting started with MAD
========================

Set up your host computer with :doc:`ROCm <rocm:index>` by following the detailed
:doc:`installation instructions <rocm-install-on-linux:install/detailed-install>` for Linux-based platforms.

ROCm Docker images
------------------

You can find ROCm Docker images for PyTorch and TensorFlow on Docker Hub at
:fab:`docker` `rocm/pytorch <https://hub.docker.com/r/rocm/pytorch>`_ and
:fab:`docker` `rocm/tensorflow <https://hub.docker.com/r/rocm/tensorflow>`_.

AMD publishes a unified Docker image at :fab:`docker` `rocm/vllm <https://hub.docker.com/r/rocm/vllm>`_ that packages
together vLLM and PyTorch for the AMD Instinct™ MI300X accelerator. This enables users to quickly validate the expected
inference performance numbers on the MI300X. This Docker image includes:

- ROCm
- vLLM
- PyTorch
- Tuning files (in ``.csv`` format)

See `<https://github.com/ROCm/MAD/tree/develop/benchmark/vllm>`__ for more information.
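
As a sketch, pulling and launching this image from a ROCm-capable Linux host could look like the following. The
``--device`` and security flags are the usual ones for exposing AMD GPUs to a container; the bare ``rocm/vllm`` image
reference is an assumption -- check Docker Hub for the specific tag you want.

.. code-block:: shell

   # Pull the unified vLLM + PyTorch image (pick a specific tag from
   # Docker Hub rather than relying on the default).
   docker pull rocm/vllm

   # Launch an interactive container with the AMD GPU devices exposed.
   docker run -it \
       --device=/dev/kfd --device=/dev/dri \
       --group-add video \
       --ipc=host \
       --security-opt seccomp=unconfined \
       rocm/vllm
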

.. _mad-run-locally:

Using MAD to run models locally
===============================

The following steps describe MAD's basic functionality.

1. Clone the `MAD repository <https://github.com/ROCm/MAD>`_ to a local directory and install the required packages
   on the host machine. For example:

   .. code-block:: shell

      git clone https://github.com/ROCm/MAD
      cd MAD
      pip3 install -r requirements.txt

2. Use the ``tools/run_models.py`` script to run and collect performance results for all models in
   ``models.json`` locally on a Docker host.

   ``run_models.py`` is the main MAD command line interface for running models locally. While the tool has many options,
   running any single model is straightforward. To run a model, look for its name or tag in ``models.json`` and pass it
   to ``run_models.py`` in the following form:

   .. code-block:: shell

      tools/run_models.py [-h] [--model_name MODEL_NAME] [--timeout TIMEOUT] [--live_output] [--clean_docker_cache] [--keep_alive] [--keep_model_dir] [-o OUTPUT] [--log_level LOG_LEVEL]

   See :ref:`mad-run-args` for the list of options and their descriptions.

   For each model in ``models.json``, the script:

   * Builds the Docker image associated with the model. The image is named
     ``ci-$(model_name)`` and is not removed after the script completes.
   * Starts the Docker container, named ``container_$(model_name)``. The
     container should automatically be stopped and removed whenever the
     script exits.
   * Clones the git ``url`` and runs the ``scripts``.
   * Compiles the final ``perf.csv`` and ``perf.html``.
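
After a run, the compiled ``perf.csv`` can be post-processed with standard shell tools. The column layout below is a
hypothetical illustration -- the actual columns are defined by MAD, so inspect the header of your own ``perf.csv``
first:

.. code-block:: shell

   # Hypothetical perf.csv layout, for illustration only.
   printf 'model,performance,metric\npyt_huggingface_bert,250.3,samples_per_second\n' > perf.csv

   # Print each model with its reported performance figure.
   awk -F',' 'NR > 1 { print $1 ": " $2 " " $3 }' perf.csv
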

.. _mad-run-args:

Arguments
---------

--help, -h
    Show the help message and exit.

--tags TAGS
    Tags to run (can be specified multiple times). Overrides ``tags.json``. See :ref:`mad-run-tags`.

--model-name MODEL_NAME
    Name of the model to run.

--timeout TIMEOUT
    Timeout in seconds for running the model. The default timeout is 7200 seconds (2 hours).

--live-output
    Print output in real time directly on STDOUT.

--clean-docker-cache
    Rebuild the Docker image without using the cache.

--keep-alive
    Keep the container alive after the application finishes running.

--keep-model-dir
    Keep the model directory after the application finishes running.

--output OUTPUT, -o OUTPUT
    Output file for the results.

--log-level LOG_LEVEL
    Log level for the logger.
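
These options can be combined in a single invocation. In this hypothetical example, the model name must match an entry
in ``models.json``, and the timeout and output file name are illustrative values:

.. code-block:: shell

   # Run one model with a one-hour timeout, streaming output live and
   # writing the results to a custom file (values are illustrative).
   python3 tools/run_models.py --tags pyt_huggingface_bert \
       --timeout 3600 \
       --live-output \
       -o results.csv
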

.. _mad-run-tags:

Tags
----

With the tag functionality, you can select a subset of models to run by their associated tags. Tags can be specified
in ``tags.json`` or with the ``--tags`` argument. If multiple tags are specified, all models that match any specified
tag are selected.

.. note::

   Each model name in ``models.json`` is automatically a tag that can be used to run that model. Tags are also supported
   in comma-separated form.

For example, to run the ``pyt_huggingface_bert`` model, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt_huggingface_bert

Or, to run all PyTorch models, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt

.. note::

   Learn more about MAD's options by visiting `<https://github.com/ROCm/MAD/blob/develop/README.md>`__.


@@ -32,6 +32,7 @@ subtrees:
- file: how-to/rocm-for-ai/train-a-model.rst
- file: how-to/rocm-for-ai/hugging-face-models.rst
- file: how-to/rocm-for-ai/deploy-your-model.rst
- file: how-to/rocm-for-ai/model-automation-and-dashboarding.rst
- file: how-to/rocm-for-hpc/index.rst
title: Using ROCm for HPC
- file: how-to/llm-fine-tuning-optimization/index.rst