Rename fine-tuning and optimization guide directory and fix index.md (#3242)

* Move fine-tuning and optimization files

* Reorder index.md

* Rename images directory

* Fix internal links
Peter Park
2024-06-05 08:11:00 -07:00
committed by GitHub
parent 266f502010
commit 6494885359
32 changed files with 32 additions and 32 deletions

[17 image files renamed: moved from the ``fine-tuning-llms`` data directory to ``llm-fine-tuning-optimization``; image contents and sizes unchanged]

View File

@@ -65,4 +65,4 @@ through the following guides.
* :doc:`rocm-for-ai/index`
-* :doc:`fine-tuning-llms/index`
+* :doc:`llm-fine-tuning-optimization/index`

View File

@@ -77,7 +77,7 @@ Installing vLLM
The following log message displayed in your command line indicates that the server is listening for requests.
-.. image:: ../../data/how-to/fine-tuning-llms/vllm-single-gpu-log.png
+.. image:: ../../data/how-to/llm-fine-tuning-optimization/vllm-single-gpu-log.png
:alt: vLLM API server log message
:align: center

View File

@@ -18,7 +18,7 @@ Attention (GQA), and Multi-Query Attention (MQA). This reduction in memory movem
time-to-first-token (TTFT) latency for large batch sizes and long prompt sequences, thereby enhancing overall
performance.
-.. image:: ../../data/how-to/fine-tuning-llms/attention-module.png
+.. image:: ../../data/how-to/llm-fine-tuning-optimization/attention-module.png
:alt: Attention module of a large language module utilizing tiling
:align: center
@@ -243,7 +243,7 @@ page describes the options.
Validator,ROCBLAS_VERSION,4.1.0-cefa4a9b-dirty
GemmTunableOp_float_TN,tn_200_100_20,Gemm_Rocblas_32323,0.00669595
-.. image:: ../../data/how-to/fine-tuning-llms/tunableop.png
+.. image:: ../../data/how-to/llm-fine-tuning-optimization/tunableop.png
:alt: GEMM and TunableOp
:align: center

View File

@@ -31,7 +31,7 @@ Each accelerator or GPU has multiple Compute Units (CUs) and various CUs do comp
can a compute kernel allocate its task to? For the :doc:`AMD MI300X accelerator <../../reference/gpu-arch-specs>`, the
grid should have at least 1024 thread blocks or workgroups.
-.. figure:: ../../data/how-to/fine-tuning-llms/compute-unit.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/compute-unit.png
Schematic representation of a CU in the CDNA2 or CDNA3 architecture.
@@ -187,7 +187,7 @@ Kernel occupancy
.. _fine-tuning-llms-occupancy-vgpr-table:
-.. figure:: ../../data/how-to/fine-tuning-llms/occupancy-vgpr.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/occupancy-vgpr.png
:alt: Occupancy related to VGPR usage in an Instinct MI300X accelerator.
:align: center

View File

@@ -32,7 +32,7 @@ The template parameters of the instance are grouped into four parameter types:
================
### Figure 2
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-template_parameters.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-template_parameters.jpg
The template parameters of the selected GEMM kernel are classified into four groups. These template parameter groups should be defined properly before running the instance.
```
@@ -126,7 +126,7 @@ The row and column, and stride information of input matrices are also passed to
================
### Figure 3
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-kernel_launch.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-kernel_launch.jpg
Templated kernel launching consists of kernel instantiation, making arguments by passing in actual application parameters, creating an invoker, and running the instance through the invoker.
```
@@ -155,7 +155,7 @@ The first operation in the process is to perform the multiplication of input mat
================
### Figure 4
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-operation_flow.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-operation_flow.jpg
Operation flow.
```
@@ -171,7 +171,7 @@ Here, we use [DeviceBatchedGemmMultiD_Xdl](https://github.com/ROCm/composable_ke
================
### Figure 5
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-root_instance.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-root_instance.jpg
Use the DeviceBatchedGemmMultiD_Xdl instance as a root.
```
@@ -421,7 +421,7 @@ Run `python setup.py install` to build and install the extension. It should look
================
### Figure 6
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-compilation.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-compilation.jpg
Compilation and installation of the INT8 kernels.
```
@@ -433,7 +433,7 @@ The implementation architecture of running SmoothQuant models on MI300X GPUs is
================
### Figure 7
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-inference_flow.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-inference_flow.jpg
The implementation architecture of running SmoothQuant models on AMD MI300X accelerators.
```
@@ -459,7 +459,7 @@ Figure 8 shows the performance comparisons between the original FP16 and the Smo
================
### Figure 8
================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-comparisons.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-comparisons.jpg
Performance comparisons between the original FP16 and the SmoothQuant-quantized INT8 models on a single MI300X accelerator.
```

View File

@@ -41,7 +41,7 @@ The weight update is as follows: :math:`W_{updated} = W + ΔW`.
If the weight matrix :math:`W` contains 7B parameters, then the weight update matrix :math:`ΔW` should also
contain 7B parameters. Therefore, the :math:`ΔW` calculation is computationally and memory intensive.
-.. figure:: ../../data/how-to/fine-tuning-llms/weight-update.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/weight-update.png
:alt: Weight update diagram
(a) Weight update in regular fine-tuning. (b) Weight update in LoRA where the product of matrix A (:math:`M\times K`)

View File

@@ -38,7 +38,7 @@ You can then visualize and view these metrics using an open-source profile visua
shows transactions denoting the CPU activities that launch GPU kernels while the lower section shows the actual GPU
activities where it processes the ``resnet18`` inferences layer by layer.
-.. figure:: ../../data/how-to/fine-tuning-llms/perfetto-trace.svg
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/perfetto-trace.svg
Perfetto trace visualization example.
@@ -100,7 +100,7 @@ analyze bottlenecks and stressors for their computational workloads on AMD Insti
Omniperf collects hardware counters in multiple passes, and will therefore re-run the application during each pass
to collect different sets of metrics.
-.. figure:: ../../data/how-to/fine-tuning-llms/omniperf-analysis.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omniperf-analysis.png
Omniperf memory chart analysis panel.
@@ -130,7 +130,7 @@ hardware counters are also included.
have the greatest impact on the end-to-end execution of the application and to discover what else is happening on the
system during a performance bottleneck.
-.. figure:: ../../data/how-to/fine-tuning-llms/omnitrace-timeline.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omnitrace-timeline.png
Omnitrace timeline trace example.

View File

@@ -110,7 +110,7 @@ Fine-tuning your model
ROCm supports multiple techniques for :ref:`optimizing fine-tuning <fine-tuning-llms-concept-optimizations>`, for
example, LoRA, QLoRA, PEFT, and FSDP.
-Learn more about challenges and solutions for model fine-tuning in :doc:`../fine-tuning-llms/index`.
+Learn more about challenges and solutions for model fine-tuning in :doc:`../llm-fine-tuning-optimization/index`.
The following developer blogs showcase examples of how to fine-tune a model on an AMD accelerator or GPU.

View File

@@ -34,16 +34,16 @@ Our documentation is organized into the following categories:
* {doc}`Quick start guide<rocm-install-on-linux:tutorial/quick-start>`
* {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`
* {doc}`Package manager integration<rocm-install-on-linux:how-to/native-install/package-manager-integration>`
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
* Windows
* {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
* {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
* [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
:::
:::{grid-item-card}
@@ -92,7 +92,7 @@ Our documentation is organized into the following categories:
:padding: 2
* [Using ROCm for AI](./how-to/rocm-for-ai/index.rst)
-* [Fine-tuning LLMs and inference optimization](./how-to/fine-tuning-llms/index.rst)
+* [Fine-tuning LLMs and inference optimization](./how-to/llm-fine-tuning-optimization/index.rst)
* [System tuning for various architectures](./how-to/tuning-guides.md)
* [MI100](./how-to/tuning-guides/mi100.md)
* [MI200](./how-to/tuning-guides/mi200.md)

View File

@@ -58,27 +58,27 @@ subtrees:
- file: how-to/rocm-for-ai/train-a-model.rst
- file: how-to/rocm-for-ai/hugging-face-models.rst
- file: how-to/rocm-for-ai/deploy-your-model.rst
-- file: how-to/fine-tuning-llms/index.rst
+- file: how-to/llm-fine-tuning-optimization/index.rst
title: Fine-tuning LLMs and inference optimization
subtrees:
- entries:
-- file: how-to/fine-tuning-llms/overview.rst
+- file: how-to/llm-fine-tuning-optimization/overview.rst
title: Conceptual overview
-- file: how-to/fine-tuning-llms/fine-tuning-and-inference.rst
+- file: how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.rst
subtrees:
- entries:
-- file: how-to/fine-tuning-llms/single-gpu-fine-tuning-and-inference.rst
+- file: how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.rst
title: Using a single accelerator
-- file: how-to/fine-tuning-llms/multi-gpu-fine-tuning-and-inference.rst
+- file: how-to/llm-fine-tuning-optimization/multi-gpu-fine-tuning-and-inference.rst
title: Using multiple accelerators
-- file: how-to/fine-tuning-llms/model-quantization.rst
-- file: how-to/fine-tuning-llms/model-acceleration-libraries.rst
-- file: how-to/fine-tuning-llms/llm-inference-frameworks.rst
-- file: how-to/fine-tuning-llms/optimizing-with-composable-kernel.md
+- file: how-to/llm-fine-tuning-optimization/model-quantization.rst
+- file: how-to/llm-fine-tuning-optimization/model-acceleration-libraries.rst
+- file: how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+- file: how-to/llm-fine-tuning-optimization/optimizing-with-composable-kernel.md
title: Optimizing with Composable Kernel
-- file: how-to/fine-tuning-llms/optimizing-triton-kernel.rst
+- file: how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.rst
title: Optimizing Triton kernels
-- file: how-to/fine-tuning-llms/profiling-and-debugging.rst
+- file: how-to/llm-fine-tuning-optimization/profiling-and-debugging.rst
- file: how-to/tuning-guides.md
title: System optimization
subtrees: