Refactor of language detection functions (#1146)

* Supported new options for batched transcriptions (see the usage sketch after this list):
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio inputs
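
A minimal sketch of how these additions might fit together, assuming the two new options are keyword arguments of `transcribe` and that the updated `detect_language` returns the language, its probability, and the full probability list; the threshold and segment values below are illustrative, not documented defaults:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline, decode_audio

model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Pass the new language detection options to a batched transcription.
segments, info = batched_model.transcribe(
    "audio.mp3",
    language_detection_threshold=0.5,  # accept a language once its probability exceeds this
    language_detection_segments=4,     # how many segments to sample for detection
)
print(info.language, info.language_probability)

# Call the updated detect_language directly; per this change it is the same
# routine that transcribe now uses internally.
audio = decode_audio("audio.mp3")
language, probability, all_probs = model.detect_language(
    audio,
    language_detection_segments=4,
    language_detection_threshold=0.5,
)
print(language, probability)
```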
Author: Mahmoud Ashraf
Date: 2024-11-16 12:53:07 +02:00 (committed by GitHub)
Parent: 53bbe54016
Commit: a6f8fbae00
3 changed files with 153 additions and 345 deletions

@@ -164,17 +164,6 @@ segments, _ = model.transcribe("audio.mp3")
 segments = list(segments) # The transcription will actually run here.
 ```
-### Multi-Segment Language Detection
-To directly use the model for improved language detection, the following code snippet can be used:
-```python
-from faster_whisper import WhisperModel
-model = WhisperModel("turbo", device="cuda", compute_type="float16")
-language_info = model.detect_language_multi_segment("audio.mp3")
-```
 ### Batched Transcription
 The following code snippet illustrates how to run batched transcription on an example audio file. `BatchedInferencePipeline.transcribe` is a drop-in replacement for `WhisperModel.transcribe`.
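
The snippet itself falls outside this excerpt; as a sketch, typical `BatchedInferencePipeline` usage looks like the following (the `batch_size` value is illustrative):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size controls how many segments are decoded in parallel.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```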