add MAD page

Peter Park
2024-09-18 14:22:14 -04:00
parent 1e0d3da98c
commit c81d0f3b0a
2 changed files with 160 additions and 0 deletions


@@ -0,0 +1,159 @@
.. meta::
   :description: Discover and run deep learning models with AMD MAD -- the Model Automation and Dashboarding tool.
   :keywords: AI, LLM, machine learning, dashboarding, model zoo

************************
Running models using MAD
************************

The AMD Model Automation and Dashboarding (MAD) tool brings together an AI model zoo with automated execution across
various GPU architectures. It facilitates performance tracking by including mechanisms for maintaining historical
performance data and generating dashboards for analysis. MAD's source code repository and full documentation are located
at `<https://github.com/ROCm/MAD>`__.

MAD pulls various models from their repositories and tests their performance inside ROCm Docker images. It is an index
of deep learning models that have been trained to achieve the best reproducible accuracy and performance with AMD's ROCm
software stack running on AMD GPUs.

Use MAD to:

* Try new models,
* Compare performance between patches or architectures, and
* Track functionality and performance over time.

Getting started with MAD
========================

Set up your host computer with :doc:`ROCm <rocm:index>` by following the detailed
:doc:`installation instructions <rocm-install-on-linux:install/detailed-install>` for Linux-based platforms.

ROCm Docker images
------------------

You can find ROCm Docker images for PyTorch and TensorFlow on Docker Hub at
:fab:`docker` `rocm/pytorch <https://hub.docker.com/r/rocm/pytorch>`_ and
:fab:`docker` `rocm/tensorflow <https://hub.docker.com/r/rocm/tensorflow>`_.

AMD publishes a unified Docker image at :fab:`docker` `rocm/vllm <https://hub.docker.com/r/rocm/vllm>`_ that packages
together vLLM and PyTorch for the AMD Instinct™ MI300X accelerator. This enables users to quickly validate the expected
inference performance numbers on the MI300X. This Docker image includes:

- ROCm
- vLLM
- PyTorch
- Tuning files (in ``.csv`` format)

See `<https://github.com/ROCm/MAD/tree/develop/benchmark/vllm>`__ for more information.
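
As a sketch, pulling and launching this image from a ROCm-capable Linux host could look like the following. The
``--device`` and security flags are the usual ones for exposing AMD GPUs to a container; the bare ``rocm/vllm`` image
reference is an assumption -- check Docker Hub for the specific tag you want.

.. code-block:: shell

   # Pull the unified vLLM + PyTorch image (pick a specific tag from
   # Docker Hub rather than relying on the default).
   docker pull rocm/vllm

   # Launch an interactive container with the AMD GPU devices exposed.
   docker run -it \
       --device=/dev/kfd --device=/dev/dri \
       --group-add video \
       --ipc=host \
       --security-opt seccomp=unconfined \
       rocm/vllm
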

.. _mad-run-locally:

Using MAD to run models locally
===============================

The following steps describe MAD's basic functionality.

1. Clone the `MAD repository <https://github.com/ROCm/MAD>`_ to a local directory and install the required packages
   on the host machine. For example:

   .. code-block:: shell

      git clone https://github.com/ROCm/MAD
      cd MAD
      pip3 install -r requirements.txt

2. Use the ``tools/run_models.py`` script to run and collect performance results for all models in
   ``models.json`` locally on a Docker host.

   ``run_models.py`` is the main MAD command line interface for running models locally. While the tool has many options,
   running any single model is straightforward. To run a model, look for its name or tag in ``models.json`` and pass it
   to ``run_models.py`` in the following form:

   .. code-block:: shell

      tools/run_models.py [-h] [--model_name MODEL_NAME] [--timeout TIMEOUT] [--live_output] [--clean_docker_cache] [--keep_alive] [--keep_model_dir] [-o OUTPUT] [--log_level LOG_LEVEL]

   See :ref:`mad-run-args` for the list of options and their descriptions.

   For each model in ``models.json``, the script:

   * Builds the Docker image associated with the model. The image is named
     ``ci-$(model_name)`` and is not removed after the script completes.
   * Starts the Docker container, named ``container_$(model_name)``. The
     container should automatically be stopped and removed whenever the
     script exits.
   * Clones the git ``url`` and runs the ``scripts``.
   * Compiles the final ``perf.csv`` and ``perf.html``.
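
After a run, the compiled ``perf.csv`` can be post-processed with standard shell tools. The column layout below is a
hypothetical illustration -- the actual columns are defined by MAD, so inspect the header of your own ``perf.csv``
first:

.. code-block:: shell

   # Hypothetical perf.csv layout, for illustration only.
   printf 'model,performance,metric\npyt_huggingface_bert,250.3,samples_per_second\n' > perf.csv

   # Print each model with its reported performance figure.
   awk -F',' 'NR > 1 { print $1 ": " $2 " " $3 }' perf.csv
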

.. _mad-run-args:

Arguments
---------

--help, -h
    Show the help message and exit.

--tags TAGS
    Tags to run (can be specified multiple times). Overrides ``tags.json``. See :ref:`mad-run-tags`.

--model-name MODEL_NAME
    Name of the model to run.

--timeout TIMEOUT
    Timeout in seconds for running the model. The default timeout is 7200 seconds (2 hours).

--live-output
    Print output in real time directly on STDOUT.

--clean-docker-cache
    Rebuild the Docker image without using the cache.

--keep-alive
    Keep the container alive after the application finishes running.

--keep-model-dir
    Keep the model directory after the application finishes running.

--output OUTPUT, -o OUTPUT
    Output file for the results.

--log-level LOG_LEVEL
    Log level for the logger.
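
These options can be combined in a single invocation. In this hypothetical example, the model name must match an entry
in ``models.json``, and the timeout and output file name are illustrative values:

.. code-block:: shell

   # Run one model with a one-hour timeout, streaming output live and
   # writing the results to a custom file (values are illustrative).
   python3 tools/run_models.py --tags pyt_huggingface_bert \
       --timeout 3600 \
       --live-output \
       -o results.csv
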

.. _mad-run-tags:

Tags
----

With the tag functionality, you can select a subset of models to run by their associated tags. Tags can be specified
in ``tags.json`` or with the ``--tags`` argument. If multiple tags are specified, all models that match any specified
tag are selected.

.. note::

   Each model name in ``models.json`` is automatically a tag that can be used to run that model. Tags are also supported
   in comma-separated form.

For example, to run the ``pyt_huggingface_bert`` model, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt_huggingface_bert

Or, to run all PyTorch models, use:

.. code-block:: shell

   python3 tools/run_models.py --tags pyt

.. note::

   Learn more about MAD's options by visiting `<https://github.com/ROCm/MAD/blob/develop/README.md>`__.


@@ -32,6 +32,7 @@ subtrees:
- file: how-to/rocm-for-ai/train-a-model.rst
- file: how-to/rocm-for-ai/hugging-face-models.rst
- file: how-to/rocm-for-ai/deploy-your-model.rst
- file: how-to/rocm-for-ai/model-automation-and-dashboarding.rst
- file: how-to/rocm-for-hpc/index.rst
title: Using ROCm for HPC
- file: how-to/llm-fine-tuning-optimization/index.rst