Merge pull request #4129 from peterjunpark/docs/6.3.0

[6.3] Add @hongxiayang updates to MI300X workload tuning guide (#4123)
Peter Park
2024-12-06 12:30:08 -05:00
committed by GitHub
5 changed files with 618 additions and 299 deletions


@@ -159,6 +159,7 @@ HWS
Haswell
Higgs
Hyperparameters
+Huggingface
ICD
ICV
IDE
@@ -381,6 +382,7 @@ TCR
TF
TFLOPS
TP
+TPS
TPU
TPUs
TSME
@@ -457,10 +459,12 @@ api
atmi
atomics
autogenerated
+autotune
avx
awk
backend
backends
+benchmarked
benchmarking
bfloat
bilinear
@@ -530,6 +534,7 @@ disambiguates
distro
distros
dkms
+dtype
el
embeddings
enablement
@@ -562,6 +567,7 @@ heterogenous
hipBLAS
hipBLASLt
hipBLASLt's
+hipblaslt
hipCUB
hipFFT
hipLIB
@@ -605,7 +611,9 @@ ipo
jax
kdb
kfd
+kv
latencies
+len
libfabric
libjpeg
libs
@@ -631,6 +639,7 @@ mutex
mvffr
namespace
namespaces
+num
numref
ocl
opencl
@@ -726,7 +735,9 @@ runtimes
sL
scalability
scalable
+seealso
sendmsg
+seqs
serializers
shader
sharding
@@ -767,6 +778,7 @@ txt
uarch
uncached
uncorrectable
+underoptimized
unhandled
uninstallation
unmapped

Binary file not shown (image added: 30 KiB).

Binary file not shown (image added: 129 KiB).


@@ -135,11 +135,13 @@ Installing vLLM
{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}
-Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
.. seealso::
-   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
-   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
-   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
+   See :ref:`mi300x-vllm-optimization` for performance optimization tips.
+
+   ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+   on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
+   format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
.. _fine-tuning-llms-tgi:
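The sample response in this hunk ({"text": ["What is AMD Instinct?\n..."]}) is the output of prompting a running vLLM server. As a minimal sketch of how such a response is produced, assuming a server is already listening on localhost:8000 and exposes vLLM's native /generate endpoint (whose {"text": [...]} response shape matches the sample above), a request might look like:

```python
# Minimal sketch: query a running vLLM server and print the generated text.
# Assumption: the server was started with vLLM's native API server, e.g.
#   python -m vllm.entrypoints.api_server --model <model> --port 8000
# The port, prompt, and sampling parameters here are illustrative only.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "What is AMD Instinct?",
        "max_tokens": 96,    # cap the completion length
        "temperature": 0.0,  # greedy decoding, for repeatable runs
    },
    timeout=60,
)
response.raise_for_status()

# The native API returns {"text": ["<prompt + completion>", ...]}.
for completion in response.json()["text"]:
    print(completion)
```

Temperature 0 is used here only so that repeated validation runs return comparable text; the prebuilt Docker image mentioned in the hunk bundles ROCm, vLLM, and PyTorch, so no separate installation is needed inside it.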

File diff suppressed because it is too large.