Mirror of https://github.com/ROCm/ROCm.git, synced 2026-04-05 03:01:17 -04:00
Add QwQ 32B to vllm-benchmark.rst (#4685)
* Add Qwen2 MoE 2.7B to vllm-benchmark-models.yaml
* Add QwQ-32B-Preview to vllm-benchmark-models.yaml
* add links to performance results words
* change "performance validation" to "performance testing"
* remove "-Preview" from QwQ-32B
* move qwen2 MoE after qwen2
* add TunableOp section
* fix formatting
* add link to TunableOp doc
* add tunableop note
* fix vllm-benchmark template
* remove cmdline option for --tunableop on
* update docker details
* remove "training"
* remove qwen2
@@ -102,6 +102,12 @@ vllm_benchmark:
           model_repo: Qwen/Qwen2-72B-Instruct
           url: https://huggingface.co/Qwen/Qwen2-72B-Instruct
           precision: float16
+        - model: QwQ-32B
+          mad_tag: pyt_vllm_qwq-32b
+          model_repo: Qwen/QwQ-32B
+          url: https://huggingface.co/Qwen/QwQ-32B
+          precision: float16
+          tunableop: true
     - group: DBRX
       tag: dbrx
       models:
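
In this hunk the `tunableop: true` key appears only on the new QwQ-32B entry, so a consumer of the file presumably has to treat it as optional. The following is a minimal Python sketch of that reading, assuming a missing key means false. The `tunableop_tags` helper and the in-memory `models` list are hypothetical illustrations and are not part of this commit or of any ROCm tooling.

```python
# Hypothetical in-memory copy of two entries from the hunk above.
# A real consumer would load them from vllm-benchmark-models.yaml instead.
models = [
    {
        "model_repo": "Qwen/Qwen2-72B-Instruct",
        "url": "https://huggingface.co/Qwen/Qwen2-72B-Instruct",
        "precision": "float16",
    },
    {
        "model": "QwQ-32B",
        "mad_tag": "pyt_vllm_qwq-32b",
        "model_repo": "Qwen/QwQ-32B",
        "url": "https://huggingface.co/Qwen/QwQ-32B",
        "precision": "float16",
        "tunableop": True,
    },
]


def tunableop_tags(entries):
    """Return the mad_tag of every entry whose optional tunableop flag is true."""
    # Assumption: an entry without the key behaves as if tunableop were false.
    return [e["mad_tag"] for e in entries if e.get("tunableop", False)]


print(tunableop_tags(models))
```

Defaulting the flag with `dict.get` means existing entries without the key need no changes when one model opts in.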