Merge pull request #4129 from peterjunpark/docs/6.3.0

[6.3] Add @hongxiayang updates to MI300X workload tuning guide (#4123)
Peter Park
2024-12-06 12:30:08 -05:00
committed by GitHub
5 changed files with 618 additions and 299 deletions


@@ -159,6 +159,7 @@ HWS
Haswell
Higgs
Hyperparameters
+Huggingface
ICD
ICV
IDE
@@ -381,6 +382,7 @@ TCR
TF
TFLOPS
TP
+TPS
TPU
TPUs
TSME
@@ -457,10 +459,12 @@ api
atmi
atomics
autogenerated
+autotune
avx
awk
backend
backends
+benchmarked
benchmarking
bfloat
bilinear
@@ -530,6 +534,7 @@ disambiguates
distro
distros
dkms
+dtype
el
embeddings
enablement
@@ -562,6 +567,7 @@ heterogenous
hipBLAS
hipBLASLt
hipBLASLt's
+hipblaslt
hipCUB
hipFFT
hipLIB
@@ -605,7 +611,9 @@ ipo
jax
kdb
kfd
+kv
latencies
+len
libfabric
libjpeg
libs
@@ -631,6 +639,7 @@ mutex
mvffr
namespace
namespaces
+num
numref
ocl
opencl
@@ -726,7 +735,9 @@ runtimes
sL
scalability
scalable
+seealso
sendmsg
+seqs
serializers
shader
sharding
@@ -767,6 +778,7 @@ txt
uarch
uncached
uncorrectable
+underoptimized
unhandled
uninstallation
unmapped

Binary file not shown (image added: 30 KiB).

Binary file not shown (image added: 129 KiB).


@@ -135,11 +135,13 @@ Installing vLLM
{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
-Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
.. seealso::
-   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
-   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
-   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
+   See :ref:`mi300x-vllm-optimization` for performance optimization tips.
+
+   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
+   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
.. _fine-tuning-llms-tgi:
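The sample response in this hunk ({"text": ["What is AMD Instinct?\n..."]}) is the output of prompting a running vLLM server. As a minimal sketch of how such a response is produced, assuming a server is already listening on localhost:8000 and exposes vLLM's native /generate endpoint (whose {"text": [...]} response shape matches the sample above), a request might look like:

```python
# Minimal sketch: query a running vLLM server and print the generated text.
# Assumption: the server was started with vLLM's native API server, e.g.
#   python -m vllm.entrypoints.api_server --model <model> --port 8000
# The port, prompt, and sampling parameters here are illustrative only.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "What is AMD Instinct?",
        "max_tokens": 96,    # cap the completion length
        "temperature": 0.0,  # greedy decoding, for repeatable runs
    },
    timeout=60,
)
response.raise_for_status()

# The native API returns {"text": ["<prompt + completion>", ...]}.
for completion in response.json()["text"]:
    print(completion)
```

Temperature 0 is used here only so that repeated validation runs return comparable text; the prebuilt Docker image mentioned in the hunk bundles ROCm, vLLM, and PyTorch, so no separate installation is needed inside it.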

File diff suppressed because it is too large.