Mirror of https://github.com/ROCm/ROCm.git (synced 2026-01-10 23:28:03 -05:00)
Merge pull request #4129 from peterjunpark/docs/6.3.0
[6.3] Add @hongxiayang updates to MI300X workload tuning guide (#4123)
@@ -159,6 +159,7 @@ HWS
Haswell
Higgs
Hyperparameters
Huggingface
ICD
ICV
IDE
@@ -381,6 +382,7 @@ TCR
TF
TFLOPS
TP
TPS
TPU
TPUs
TSME
@@ -457,10 +459,12 @@ api
atmi
atomics
autogenerated
autotune
avx
awk
backend
backends
benchmarked
benchmarking
bfloat
bilinear
@@ -530,6 +534,7 @@ disambiguates
distro
distros
dkms
dtype
el
embeddings
enablement
@@ -562,6 +567,7 @@ heterogenous
hipBLAS
hipBLASLt
hipBLASLt's
hipblaslt
hipCUB
hipFFT
hipLIB
@@ -605,7 +611,9 @@ ipo
jax
kdb
kfd
kv
latencies
len
libfabric
libjpeg
libs
@@ -631,6 +639,7 @@ mutex
mvffr
namespace
namespaces
num
numref
ocl
opencl
@@ -726,7 +735,9 @@ runtimes
sL
scalability
scalable
seealso
sendmsg
seqs
serializers
shader
sharding
@@ -767,6 +778,7 @@ txt
uarch
uncached
uncorrectable
underoptimized
unhandled
uninstallation
unmapped
Binary file not shown. Size: 30 KiB
BIN docs/data/how-to/tuning-guides/hipblaslt_yaml_template.png (Normal file)
Binary file not shown. Size: 129 KiB
@@ -135,11 +135,13 @@ Installing vLLM

{"text":["What is AMD Instinct?\nAmd Instinct is a brand new line of high-performance computing (HPC) processors from Advanced Micro Devices (AMD). These processors are designed to deliver unparalleled performance for HPC workloads, including scientific simulations, data analytics, and machine learning.\nThe Instinct lineup includes a range of processors, from the entry-level Inst"]}

Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
.. seealso::

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.
See :ref:`mi300x-vllm-optimization` for performance optimization tips.

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
format. For more information, see :doc:`/how-to/performance-validation/mi300x/vllm-benchmark`.

.. _fine-tuning-llms-tgi:

File diff suppressed because it is too large