Rename fine-tuning and optimization guide directory and fix index.md (#3242)

* Mv fine-tuning and optimization files
* Reorder index.md
* Rename images directory
* Fix internal links
2026-04-05 03:01:17 -04:00 · 2024-06-05 08:11:00 -07:00
parent 266f502010
commit 6494885359
32 changed files with 32 additions and 32 deletions
--- a/docs/data/how-to/llm-fine-tuning-optimization/attention-module.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/attention-module.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-comparisons.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-comparisons.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-compilation.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-compilation.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-inference_flow.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-inference_flow.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-kernel_launch.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-kernel_launch.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-operation_flow.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-operation_flow.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-root_instance.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-root_instance.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/ck-template_parameters.jpg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/ck-template_parameters.jpg
--- a/docs/data/how-to/llm-fine-tuning-optimization/compute-unit.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/compute-unit.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/occupancy-vgpr.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/occupancy-vgpr.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/omniperf-analysis.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/omniperf-analysis.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/omnitrace-timeline.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/omnitrace-timeline.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/perfetto-trace.svg
+++ b/docs/data/how-to/llm-fine-tuning-optimization/perfetto-trace.svg
--- a/docs/data/how-to/llm-fine-tuning-optimization/profiling-perfetto-ui.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/profiling-perfetto-ui.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/tunableop.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/tunableop.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/vllm-single-gpu-log.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/vllm-single-gpu-log.png
--- a/docs/data/how-to/llm-fine-tuning-optimization/weight-update.png
+++ b/docs/data/how-to/llm-fine-tuning-optimization/weight-update.png
--- a/docs/how-to/deep-learning-rocm.rst
+++ b/docs/how-to/deep-learning-rocm.rst
@@ -65,4 +65,4 @@ through the following guides.

 * :doc:`rocm-for-ai/index`

-* :doc:`fine-tuning-llms/index`
+* :doc:`llm-fine-tuning-optimization/index`
--- a/docs/how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.rst
--- a/docs/how-to/llm-fine-tuning-optimization/index.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/index.rst
--- a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
@@ -77,7 +77,7 @@ Installing vLLM

         The following log message is displayed in your command line indicates that the server is listening for requests.

-         .. image:: ../../data/how-to/fine-tuning-llms/vllm-single-gpu-log.png
+         .. image:: ../../data/how-to/llm-fine-tuning-optimization/vllm-single-gpu-log.png
            :alt: vLLM API server log message
            :align: center

--- a/docs/how-to/llm-fine-tuning-optimization/model-acceleration-libraries.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/model-acceleration-libraries.rst
@@ -18,7 +18,7 @@ Attention (GQA), and Multi-Query Attention (MQA). This reduction in memory movem
 time-to-first-token (TTFT) latency for large batch sizes and long prompt sequences, thereby enhancing overall
 performance.

-.. image:: ../../data/how-to/fine-tuning-llms/attention-module.png
+.. image:: ../../data/how-to/llm-fine-tuning-optimization/attention-module.png
   :alt: Attention module of a large language module utilizing tiling
   :align: center

@@ -243,7 +243,7 @@ page describes the options.
   Validator,ROCBLAS_VERSION,4.1.0-cefa4a9b-dirty
   GemmTunableOp_float_TN,tn_200_100_20,Gemm_Rocblas_32323,0.00669595

-.. image:: ../../data/how-to/fine-tuning-llms/tunableop.png
+.. image:: ../../data/how-to/llm-fine-tuning-optimization/tunableop.png
   :alt: GEMM and TunableOp
   :align: center

--- a/docs/how-to/llm-fine-tuning-optimization/model-quantization.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/model-quantization.rst
--- a/docs/how-to/llm-fine-tuning-optimization/multi-gpu-fine-tuning-and-inference.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/multi-gpu-fine-tuning-and-inference.rst
--- a/docs/how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.rst
@@ -31,7 +31,7 @@ Each accelerator or GPU has multiple Compute Units (CUs) and various CUs do comp
 can a compute kernel can allocate its task to? For the :doc:`AMD MI300X accelerator <../../reference/gpu-arch-specs>`, the
 grid should have at least 1024 thread blocks or workgroups.

-.. figure:: ../../data/how-to/fine-tuning-llms/compute-unit.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/compute-unit.png

   Schematic representation of a CU in the CDNA2 or CDNA3 architecture.

@@ -187,7 +187,7 @@ Kernel occupancy

 .. _fine-tuning-llms-occupancy-vgpr-table:

-.. figure:: ../../data/how-to/fine-tuning-llms/occupancy-vgpr.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/occupancy-vgpr.png
   :alt: Occupancy related to VGPR usage in an Instinct MI300X accelerator.
   :align: center

--- a/docs/how-to/llm-fine-tuning-optimization/optimizing-with-composable-kernel.md
+++ b/docs/how-to/llm-fine-tuning-optimization/optimizing-with-composable-kernel.md
@@ -32,7 +32,7 @@ The template parameters of the instance are grouped into four parameter types:
 ================
 ### Figure 2
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-template_parameters.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-template_parameters.jpg
 The template parameters of the selected GEMM kernel are classified into four groups. These template parameter groups should be defined properly before running the instance.
 ```

@@ -126,7 +126,7 @@ The row and column, and stride information of input matrices are also passed to
 ================
 ### Figure 3
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-kernel_launch.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-kernel_launch.jpg
 Templated kernel launching consists of kernel instantiation, making arguments by passing in actual application parameters, creating an invoker, and running the instance through the invoker.
 ```

@@ -155,7 +155,7 @@ The first operation in the process is to perform the multiplication of input mat
 ================
 ### Figure 4
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-operation_flow.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-operation_flow.jpg
 Operation flow.
 ```

@@ -171,7 +171,7 @@ Here, we use [DeviceBatchedGemmMultiD_Xdl](https://github.com/ROCm/composable_ke
 ================
 ### Figure 5
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-root_instance.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-root_instance.jpg
 Use the ‘DeviceBatchedGemmMultiD_Xdl’ instance as a root.
 ```

@@ -421,7 +421,7 @@ Run `python setup.py install` to build and install the extension. It should look
 ================
 ### Figure 6
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-compilation.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-compilation.jpg
 Compilation and installation of the INT8 kernels.
 ```

@@ -433,7 +433,7 @@ The implementation architecture of running SmoothQuant models on MI300X GPUs is
 ================
 ### Figure 7
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-inference_flow.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-inference_flow.jpg
 The implementation architecture of running SmoothQuant models on AMD MI300X accelerators.
 ```

@@ -459,7 +459,7 @@ Figure 8 shows the performance comparisons between the original FP16 and the Smo
 ================
 ### Figure 8
 ================ -->
-```{figure} ../../data/how-to/fine-tuning-llms/ck-comparisons.jpg
+```{figure} ../../data/how-to/llm-fine-tuning-optimization/ck-comparisons.jpg
 Performance comparisons between the original FP16 and the SmoothQuant-quantized INT8 models on a single MI300X accelerator.
 ```

--- a/docs/how-to/llm-fine-tuning-optimization/overview.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/overview.rst
@@ -41,7 +41,7 @@ The weight update is as follows: :math:`W_{updated} = W + ΔW`.
 If the weight matrix :math:`W` contains 7B parameters, then the weight update matrix :math:`ΔW` should also
 contain 7B parameters. Therefore, the :math:`ΔW` calculation is computationally and memory intensive.

-.. figure:: ../../data/how-to/fine-tuning-llms/weight-update.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/weight-update.png
   :alt: Weight update diagram

   (a) Weight update in regular fine-tuning. (b) Weight update in LoRA where the product of matrix A (:math:`M\times K`)
--- a/docs/how-to/llm-fine-tuning-optimization/profiling-and-debugging.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/profiling-and-debugging.rst
@@ -38,7 +38,7 @@ You can then visualize and view these metrics using an open-source profile visua
   shows transactions denoting the CPU activities that launch GPU kernels while the lower section shows the actual GPU
   activities where it processes the ``resnet18`` inferences layer by layer. 

-   .. figure:: ../../data/how-to/fine-tuning-llms/perfetto-trace.svg
+   .. figure:: ../../data/how-to/llm-fine-tuning-optimization/perfetto-trace.svg
      
      Perfetto trace visualization example.

@@ -100,7 +100,7 @@ analyze bottlenecks and stressors for their computational workloads on AMD Insti
   Omniperf collects hardware counters in multiple passes, and will therefore re-run the application during each pass
   to collect different sets of metrics.

-.. figure:: ../../data/how-to/fine-tuning-llms/omniperf-analysis.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omniperf-analysis.png

   Omniperf memory chat analysis panel.

@@ -130,7 +130,7 @@ hardware counters are also included.
   have the greatest impact on the end-to-end execution of the application and to discover what else is happening on the
   system during a performance bottleneck.

-.. figure:: ../../data/how-to/fine-tuning-llms/omnitrace-timeline.png
+.. figure:: ../../data/how-to/llm-fine-tuning-optimization/omnitrace-timeline.png

   Omnitrace timeline trace example.

--- a/docs/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.rst
--- a/docs/how-to/rocm-for-ai/train-a-model.rst
+++ b/docs/how-to/rocm-for-ai/train-a-model.rst
@@ -110,7 +110,7 @@ Fine-tuning your model
 ROCm supports multiple techniques for :ref:`optimizing fine-tuning <fine-tuning-llms-concept-optimizations>`, for
 example, LoRA, QLoRA, PEFT, and FSDP.

-Learn more about challenges and solutions for model fine-tuning in :doc:`../fine-tuning-llms/index`.
+Learn more about challenges and solutions for model fine-tuning in :doc:`../llm-fine-tuning-optimization/index`.

 The following developer blogs showcase examples of how to fine-tune a model on an AMD accelerator or GPU.

--- a/docs/index.md
+++ b/docs/index.md
@@ -34,16 +34,16 @@ Our documentation is organized into the following categories:
  * {doc}`Quick start guide<rocm-install-on-linux:tutorial/quick-start>`
  * {doc}`Linux install guide<rocm-install-on-linux:how-to/native-install/index>`
  * {doc}`Package manager integration<rocm-install-on-linux:how-to/native-install/package-manager-integration>`
+  * {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
+  * {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
 * Windows
  * {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
  * {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
 * [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
-  * {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
  * {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
  * {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
  * {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
  * {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
-  * {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
 :::

 :::{grid-item-card}
@@ -92,7 +92,7 @@ Our documentation is organized into the following categories:
 :padding: 2

 * [Using ROCm for AI](./how-to/rocm-for-ai/index.rst)
-* [Fine-tuning LLMs and inference optimization](./how-to/fine-tuning-llms/index.rst)
+* [Fine-tuning LLMs and inference optimization](./how-to/llm-fine-tuning-optimization/index.rst)
 * [System tuning for various architectures](./how-to/tuning-guides.md)
  * [MI100](./how-to/tuning-guides/mi100.md)
  * [MI200](./how-to/tuning-guides/mi200.md)
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -58,27 +58,27 @@ subtrees:
      - file: how-to/rocm-for-ai/train-a-model.rst
      - file: how-to/rocm-for-ai/hugging-face-models.rst
      - file: how-to/rocm-for-ai/deploy-your-model.rst
-  - file: how-to/fine-tuning-llms/index.rst
+  - file: how-to/llm-fine-tuning-optimization/index.rst
    title: Fine-tuning LLMs and inference optimization
    subtrees:
    - entries:
-      - file: how-to/fine-tuning-llms/overview.rst
+      - file: how-to/llm-fine-tuning-optimization/overview.rst
        title: Conceptual overview
-      - file: how-to/fine-tuning-llms/fine-tuning-and-inference.rst
+      - file: how-to/llm-fine-tuning-optimization/fine-tuning-and-inference.rst
        subtrees:
        - entries:
-          - file: how-to/fine-tuning-llms/single-gpu-fine-tuning-and-inference.rst
+          - file: how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.rst
            title: Using a single accelerator
-          - file: how-to/fine-tuning-llms/multi-gpu-fine-tuning-and-inference.rst
+          - file: how-to/llm-fine-tuning-optimization/multi-gpu-fine-tuning-and-inference.rst
            title: Using multiple accelerators
-      - file: how-to/fine-tuning-llms/model-quantization.rst
-      - file: how-to/fine-tuning-llms/model-acceleration-libraries.rst
-      - file: how-to/fine-tuning-llms/llm-inference-frameworks.rst
-      - file: how-to/fine-tuning-llms/optimizing-with-composable-kernel.md
+      - file: how-to/llm-fine-tuning-optimization/model-quantization.rst
+      - file: how-to/llm-fine-tuning-optimization/model-acceleration-libraries.rst
+      - file: how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+      - file: how-to/llm-fine-tuning-optimization/optimizing-with-composable-kernel.md
        title: Optimizing with Composable Kernel
-      - file: how-to/fine-tuning-llms/optimizing-triton-kernel.rst
+      - file: how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.rst
        title: Optimizing Triton kernels
-      - file: how-to/fine-tuning-llms/profiling-and-debugging.rst
+      - file: how-to/llm-fine-tuning-optimization/profiling-and-debugging.rst
  - file: how-to/tuning-guides.md
    title: System optimization
    subtrees: