Use Silero VAD in Batched Mode (#936)

Replace Pyannote VAD with Silero to reduce code duplication and requirements
Mahmoud Ashraf
2024-10-24 12:05:25 +03:00
committed by GitHub
parent 574e2563e7
commit 2dbca5e559
12 changed files with 278 additions and 509 deletions

@@ -178,9 +178,6 @@ language_info = model.detect_language_multi_segment("audio.mp3")
### Batched faster-whisper
The batched version of faster-whisper is inspired by [whisper-x](https://github.com/m-bain/whisperX), licensed under the BSD-2-Clause license, and integrates its VAD model into this library. We modified this implementation and also replaced the feature extraction with a faster torch-based implementation. The batched version improves speed by up to 10-12x compared to the OpenAI implementation and 3-4x compared to the sequential faster-whisper version. It works by transcribing semantically meaningful audio chunks in batches, leading to faster inference.
The following code snippet illustrates how to run inference with the batched version on an example audio file. Please also refer to the test scripts of batched faster-whisper.
```python