Mirror of https://github.com/SYSTRAN/faster-whisper.git, synced 2026-01-08 13:14:00 -05:00

Bump version to 1.1.0 and update benchmarks (#1161)

* update version
* Update CPU benchmarks
* Updated GPU benchmarks
* more gpu benchmarks

README.md
```diff
@@ -12,63 +12,53 @@ This implementation is up to 4 times faster than [openai/whisper](https://github
 For reference, here's the time and memory usage that are required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:
 
-* [openai/whisper](https://github.com/openai/whisper)@[6dea21fd](https://github.com/openai/whisper/commit/6dea21fd7f7253bfe450f1e2512a0fe47ee2d258)
-* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[3b010f9](https://github.com/ggerganov/whisper.cpp/commit/3b010f9bed9a6068609e9faf52383aea792b0362)
-* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[cce6b53e](https://github.com/SYSTRAN/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)
+* [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/tree/v20240930)
+* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[v1.7.2](https://github.com/ggerganov/whisper.cpp/tree/v1.7.2)
+* [transformers](https://github.com/huggingface/transformers)@[v4.46.3](https://github.com/huggingface/transformers/tree/v4.46.3)
+* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0)
 
 ### Large-v2 model on GPU
 
-| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
-| --- | --- | --- | --- | --- | --- |
-| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
-| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
-| faster-whisper | int8 | 5 | 59s | 3091MB | 3117MB |
+| Implementation | Precision | Beam size | Time | VRAM Usage |
+| --- | --- | --- | --- | --- |
+| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
+| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
+| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |
+| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
+| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |
+| faster-whisper | int8 | 5 | 59s | 2926MB |
+| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |
 
-*Executed with CUDA 11.7.1 on an NVIDIA Tesla V100S.*
+### distil-whisper-large-v3 model on GPU
+
+| Implementation | Precision | Beam size | Time | YT Commons WER |
+| --- | --- | --- | --- | --- |
+| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |
+| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |
+
+*GPU benchmarks were executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.*
+
+[^1]: transformers OOM for any batch size > 1
```
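The `batch_size` rows above come from the batched inference pipeline that ships with 1.1.0. As a rough sketch of how such a run is set up (the model size, audio file, and printing loop are illustrative placeholders, not the benchmark harness itself):

```python
from faster_whisper import BatchedInferencePipeline, WhisperModel

# Load the large-v2 model on GPU in fp16 (the int8 rows use
# compute_type="int8" instead).
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# Wrap the model in the batched pipeline and transcribe with the
# batch size used in the `batch_size=8` rows of the GPU table.
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.mp3", batch_size=8)

# Iterate to run the transcription and collect segments.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```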
```diff
 ### Small model on CPU
 
-| Implementation | Precision | Beam size | Time | Max. memory |
+| Implementation | Precision | Beam size | Time | RAM Usage |
 | --- | --- | --- | --- | --- |
-| openai/whisper | fp32 | 5 | 10m31s | 3101MB |
-| whisper.cpp | fp32 | 5 | 17m42s | 1581MB |
-| whisper.cpp | fp16 | 5 | 12m39s | 873MB |
-| faster-whisper | fp32 | 5 | 2m44s | 1675MB |
-| faster-whisper | int8 | 5 | 2m04s | 995MB |
+| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
+| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
+| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
+| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
+| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |
+| faster-whisper | int8 | 5 | 1m42s | 1477MB |
+| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |
 
-*Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.*
+*Executed with 8 threads on an Intel Core i7-12700K.*
```
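The CPU numbers depend on the thread count, which faster-whisper pins at model construction time. A minimal sketch matching the 8-thread int8 row (model size and audio path are placeholders):

```python
from faster_whisper import WhisperModel

# Small model, int8 on CPU, with the thread count pinned to 8 to match
# the "Executed with 8 threads" note above.
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=8)

segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print(segment.text)
```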
```diff
-### Distil-whisper
-
-| Implementation | Precision | Beam size | Time | Gigaspeech WER |
-| --- | --- | --- | --- | --- |
-| distil-whisper/distil-large-v2 | fp16 | 4 | - | 10.36 |
-| [faster-distil-large-v2](https://huggingface.co/Systran/faster-distil-whisper-large-v2) | fp16 | 5 | - | 10.28 |
-| distil-whisper/distil-medium.en | fp16 | 4 | - | 11.21 |
-| [faster-distil-medium.en](https://huggingface.co/Systran/faster-distil-whisper-medium.en) | fp16 | 5 | - | 11.21 |
-
-*Executed with CUDA 11.4 on an NVIDIA 3090.*
-
-<details>
-<summary>testing details (click to expand)</summary>
-
-For `distil-whisper/distil-large-v2`, the WER is tested with the code sample from [this link](https://huggingface.co/distil-whisper/distil-large-v2#evaluation). For `faster-distil-whisper`, the WER is tested with the following settings:
-
-```python
-from faster_whisper import WhisperModel
-
-model_size = "distil-large-v2"
-# model_size = "distil-medium.en"
-# Run on GPU with FP16
-model = WhisperModel(model_size, device="cuda", compute_type="float16")
-segments, info = model.transcribe("audio.mp3", beam_size=5, language="en")
-```
-
-</details>
 
 ## Requirements
 
 * Python 3.8 or greater
 
 Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV) which bundles the FFmpeg libraries in its package.
 
 ### GPU
```
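Because PyAV bundles FFmpeg, any FFmpeg-readable file can be handed directly to `transcribe`; it can also be decoded up front with the library's `decode_audio` helper. A short sketch (the file name and model size are placeholders):

```python
from faster_whisper import WhisperModel, decode_audio

# Decode to the 16 kHz mono waveform the model expects, with no system
# FFmpeg required: PyAV ships the FFmpeg libraries inside its wheel.
audio = decode_audio("audio.mp3", sampling_rate=16000)

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe(audio)
```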
faster_whisper/version.py

```diff
@@ -1,3 +1,3 @@
 """Version information."""
 
-__version__ = "1.1.0rc0"
+__version__ = "1.1.0"
```