Mirror of https://github.com/openai/whisper.git, synced 2026-01-08 21:07:58 -05:00
large-v3-turbo model (#2361)
Changed file: README.md (20 lines changed)
@@ -57,17 +57,21 @@ pip install setuptools-rust

 ## Available models and languages

-There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
+There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs.
+Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model.
+The relative speeds below are measured by transcribing English speech on a A100, and the real-world speed may vary significantly depending on many factors including the language, the speaking speed, and the available hardware.

 | Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
 |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
-| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
-| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
-| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
+| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~10x |
+| base | 74 M | `base.en` | `base` | ~1 GB | ~7x |
+| small | 244 M | `small.en` | `small` | ~2 GB | ~4x |
 | medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
 | large | 1550 M | N/A | `large` | ~10 GB | 1x |
+| turbo | 809 M | N/A | `turbo` | ~6 GB | ~8x |

 The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
+Additionally, the `turbo` model is an optimized version of `large-v3` that offers faster transcription speed with a minimal degradation in accuracy.

 Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of `large-v3` and `large-v2` models by language, using WERs (word error rates) or CER (character error rates, shown in *Italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.
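The context line that closes this hunk reports results as WER and CER. As a rough illustration of what those two metrics measure, both are an edit distance normalized by the length of the reference: WER counts word-level edits, CER counts character-level edits. This is a minimal sketch, not the scoring code behind the paper's numbers (the paper's evaluation normalizes text before scoring, which is omitted here); `edit_distance`, `wer` and `cer` are hypothetical helper names.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 reference words
print(cer("kitten", "sitting"))  # 0.5
```

Because both rates divide by the reference length, insertions can push either one above 1.0; that is expected behavior for these metrics, not a bug.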
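The updated model table can be read as a lookup: given a VRAM budget, which multilingual models fit, and roughly how long would each take relative to `large`? A minimal sketch under stated assumptions: the figures are copied from the updated table and are approximate, the relative speeds were measured on an A100 with English speech, and using parameter count as a stand-in for accuracy is a heuristic of this example, not something the table states. `ModelInfo`, `MODELS`, `pick_model` and `estimated_seconds` are hypothetical names, not part of the whisper package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str              # multilingual model name, as passed to --model / load_model()
    params_m: int          # parameters, in millions
    vram_gb: float         # approximate required VRAM, in GB
    relative_speed: float  # approximate speed relative to `large` (= 1x)


# Approximate figures copied from the updated README table (multilingual column).
MODELS: List[ModelInfo] = [
    ModelInfo("tiny", 39, 1, 10),
    ModelInfo("base", 74, 1, 7),
    ModelInfo("small", 244, 2, 4),
    ModelInfo("medium", 769, 5, 2),
    ModelInfo("large", 1550, 10, 1),
    ModelInfo("turbo", 809, 6, 8),
]


def pick_model(vram_budget_gb: float) -> Optional[ModelInfo]:
    """Return the model with the most parameters that fits the budget, or None.

    Parameter count is only a rough stand-in for accuracy (an assumption).
    """
    fitting = [m for m in MODELS if m.vram_gb <= vram_budget_gb]
    return max(fitting, key=lambda m: m.params_m, default=None)


def estimated_seconds(large_seconds: float, model: ModelInfo) -> float:
    """Scale a measured `large` runtime by the table's relative speed."""
    return large_seconds / model.relative_speed


choice = pick_model(6)
print(choice.name, estimated_seconds(80, choice))  # turbo 10.0
```

Since real-world speed depends on language, speaking speed and hardware, the time estimate is best treated as an order-of-magnitude guide rather than a prediction.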
@@ -77,9 +81,9 @@ Whisper's performance varies widely depending on the language. The figure below

 ## Command-line usage

-The following command will transcribe speech in audio files, using the `medium` model:
+The following command will transcribe speech in audio files, using the `turbo` model:

-whisper audio.flac audio.mp3 audio.wav --model medium
+whisper audio.flac audio.mp3 audio.wav --model turbo

 The default setting (which selects the `small` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
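The updated command selects the model with `--model turbo`, and the context line names `--language` for non-English speech. When the CLI is driven from a script, building the argument list explicitly avoids shell-quoting problems with unusual file names. A minimal sketch that uses only those two options; `build_whisper_command` and `run_whisper` are hypothetical helpers, and the actual run is left commented out because it needs the `whisper` executable on `PATH`.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_whisper_command(
    files: Sequence[str], model: str = "turbo", language: Optional[str] = None
) -> List[str]:
    """Build an argument list for the `whisper` CLI (no shell involved)."""
    if not files:
        raise ValueError("at least one audio file is required")
    cmd = ["whisper", *files, "--model", model]
    if language is not None:
        # e.g. "Japanese" for non-English speech
        cmd += ["--language", language]
    return cmd


def run_whisper(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


cmd = build_whisper_command(["audio.flac", "audio.mp3", "audio.wav"])
print(shlex.join(cmd))  # whisper audio.flac audio.mp3 audio.wav --model turbo
# run_whisper(cmd)  # uncomment once whisper is installed
```

Passing a list to `subprocess.run` (rather than a single string with `shell=True`) keeps each file name as one argument even if it contains spaces.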
@@ -103,7 +107,7 @@ Transcription can also be performed within Python:
 ```python
 import whisper

-model = whisper.load_model("base")
+model = whisper.load_model("turbo")
 result = model.transcribe("audio.mp3")
 print(result["text"])
 ```
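Besides `result["text"]`, the dict returned by `model.transcribe()` carries a `"segments"` list whose entries include `"start"` and `"end"` (in seconds) and `"text"`. A minimal sketch that turns those segments into timestamped lines; to stay runnable without downloading a model, it works on a hand-written stand-in dict with that shape (the sample sentences are made up), and `format_timestamp` / `format_segments` are hypothetical helpers.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: Dict) -> List[str]:
    """One '[start --> end] text' line per transcribed segment."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}]{seg['text']}"
        for seg in result["segments"]
    ]


# Stand-in for the dict returned by model.transcribe("audio.mp3").
sample_result = {
    "text": " Hello there. This is a test.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 65.25, "text": " This is a test."},
    ],
}

for line in format_segments(sample_result):
    print(line)
# [00:00:00.000 --> 00:00:02.500] Hello there.
# [00:00:02.500 --> 00:01:05.250] This is a test.
```

With a real model, `format_segments(model.transcribe("audio.mp3"))` would be the drop-in usage, assuming the segment keys shown above.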
@@ -115,7 +119,7 @@ Below is an example usage of `whisper.detect_language()` and `whisper.decode()`
 ```python
 import whisper

-model = whisper.load_model("base")
+model = whisper.load_model("turbo")

 # load audio and pad/trim it to fit 30 seconds
 audio = whisper.load_audio("audio.mp3")