Use Silero VAD in Batched Mode (#936)

Replace Pyannote VAD with Silero to reduce code duplication and requirements
Mahmoud Ashraf
2024-10-24 12:05:25 +03:00
committed by GitHub
parent 574e2563e7
commit 2dbca5e559
12 changed files with 278 additions and 509 deletions

@@ -178,9 +178,6 @@ language_info = model.detect_language_multi_segment("audio.mp3")
### Batched faster-whisper
The batched version of faster-whisper is inspired by [whisper-x](https://github.com/m-bain/whisperX), licensed under the BSD-2-Clause license, and integrates its VAD model into this library. We modified this implementation and also replaced the feature extraction with a faster torch-based implementation. The batched version improves speed by up to 10-12x compared to the OpenAI implementation and 3-4x compared to the sequential faster-whisper version. It works by transcribing semantically meaningful audio chunks in batches, leading to faster inference.
The following code snippet illustrates how to run inference with the batched version on an example audio file. Please also refer to the test scripts of batched faster-whisper.
```python