Bump version to 1.1.0 and update benchmarks (#1161)

* Update version

* Update CPU benchmarks

* Update GPU benchmarks

* ..

* More GPU benchmarks
Mahmoud Ashraf authored on 2024-11-21 18:22:01 +02:00 (committed by GitHub)
parent 08f6900217, commit 97a4785fa1
2 changed files with 33 additions and 43 deletions

README.md

@@ -12,63 +12,53 @@ This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper)
For reference, here's the time and memory usage that are required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:

* [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/tree/v20240930)
* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[v1.7.2](https://github.com/ggerganov/whisper.cpp/tree/v1.7.2)
* [transformers](https://github.com/huggingface/transformers)@[v4.46.3](https://github.com/huggingface/transformers/tree/v4.46.3)
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0)
### Large-v2 model on GPU

| Implementation | Precision | Beam size | Time | VRAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |
| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |
| faster-whisper | int8 | 5 | 59s | 2926MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |
### distil-whisper-large-v3 model on GPU

| Implementation | Precision | Beam size | Time | YT Commons WER |
| --- | --- | --- | --- | --- |
| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |
| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |

*GPU benchmarks are executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.*

[^1]: transformers OOMs for any batch size > 1.
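
For reproduction, here is a minimal sketch of the unbatched faster-whisper configuration behind the GPU rows above (not the exact benchmark script; `"audio.mp3"` stands in for the 13-minute file):

```python
from faster_whisper import WhisperModel

# fp16 on GPU; use compute_type="int8" for the int8 rows,
# or "distil-large-v3" as the model size for the distil table.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```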
### Small model on CPU

| Implementation | Precision | Beam size | Time | RAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |
| faster-whisper | int8 | 5 | 1m42s | 1477MB |
| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |

*Executed with 8 threads on an Intel Core i7-12700K.*
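
The `batch_size=8` rows use the batched inference API added in 1.1.0. A minimal sketch, assuming the same placeholder audio file (swap in `device="cuda"` for the GPU tables):

```python
from faster_whisper import BatchedInferencePipeline, WhisperModel

# Matches the CPU rows: small model, fp32, 8 threads.
model = WhisperModel("small", device="cpu", compute_type="float32", cpu_threads=8)
batched_model = BatchedInferencePipeline(model=model)

# Batching trades memory for throughput; see the RAM/VRAM columns above.
segments, info = batched_model.transcribe("audio.mp3", batch_size=8)
for segment in segments:
    print(segment.text)
```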
## Requirements

* Python 3.8 or greater

Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV) which bundles the FFmpeg libraries in its package.
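
Since decoding goes through PyAV, audio can also be loaded up front and passed to the model as an array; a small sketch (16 kHz is the sampling rate the model expects):

```python
from faster_whisper import WhisperModel, decode_audio

# decode_audio returns a float32 waveform resampled to 16 kHz.
audio = decode_audio("audio.mp3", sampling_rate=16000)

model = WhisperModel("small")
segments, info = model.transcribe(audio, beam_size=5)
```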
### GPU

faster_whisper/version.py

@@ -1,3 +1,3 @@
"""Version information.""" """Version information."""
__version__ = "1.1.0rc0" __version__ = "1.1.0"
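
After upgrading, the bump can be sanity-checked from Python (assuming the package re-exports `__version__` at the top level, as in previous releases):

```python
import faster_whisper

print(faster_whisper.__version__)  # expected: 1.1.0
```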