Add ArcticInference doc (#9492)

Graham Neubig
2025-07-01 14:15:13 -04:00
committed by GitHub
parent 6da7e051be
commit e05e627957

@@ -175,6 +175,10 @@ vllm serve mistralai/Devstral-Small-2505 \
--enable-prefix-caching
```
If you want to improve inference speed further, you can also try Snowflake's version
of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/),
which can achieve up to a 2x speedup in some cases.
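As a minimal sketch of what trying it might look like, assuming ArcticInference ships as a pip package that plugs into vLLM (the package name is an assumption taken from the ArcticInference project, not from this doc; check its repository for current install instructions):
```
# Assumed install command -- verify against the ArcticInference repo.
pip install "arctic-inference[vllm]"

# ArcticInference hooks into vLLM as a plugin, so the usual serve
# command should keep working unchanged:
vllm serve mistralai/Devstral-Small-2505 \
    --enable-prefix-caching
```
Because the speedup comes from speculative decoding (per the linked blog post), the actual gain depends on the model and workload, which is why the doc hedges with "in some cases".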
### Run OpenHands (Alternative Backends)
#### Using Docker