Support VAD filter (#95)

* Support VAD filter
* Generalize function collect_samples
* Define AudioSegment class
* Only pass prompt and prefix to the first chunk
* Add dict argument vad_parameters
* Fix isort format
* Rename method
* Update README
* Add shortcut when the chunk offset is 0
* Reword readme
* Fix end property
* Concatenate the speech chunks
* Cleanup diff
* Increase default speech pad
* Update README
* Increase default speech pad
2026-01-09 13:38:01 -05:00 · 2023-04-03 17:22:48 +02:00
parent b4c1c57781
commit 19698c95f8
9 changed files with 370 additions and 0 deletions
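
Note: the README text in this commit says the filter only removes silence longer than a threshold (2 seconds by default, tunable through `min_silence_duration_ms`), and the commit message mentions concatenating the speech chunks. The following is a minimal sketch of that idea, not the code from `faster_whisper/vad.py`. The helper names `merge_close_segments` and `collect_speech`, the millisecond tuples, and the plain-list audio are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms), hypothetical representation


def merge_close_segments(segments: Sequence[Segment], min_silence_ms: int) -> List[Segment]:
    """Merge sorted speech segments separated by less than min_silence_ms of silence."""
    merged: List[Segment] = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_silence_ms:
            # The gap is too short to count as silence: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def collect_speech(samples: Sequence[float], segments: Sequence[Segment], sampling_rate: int) -> List[float]:
    """Concatenate the samples that fall inside the given speech segments."""
    speech: List[float] = []
    for start_ms, end_ms in segments:
        first = start_ms * sampling_rate // 1000
        last = end_ms * sampling_rate // 1000
        speech.extend(samples[first:last])
    return speech


# Hypothetical detections: speech at 0-1 s, 1.5-3 s and 6-7 s.
detected = [(0, 1000), (1500, 3000), (6000, 7000)]
# With a 2 s threshold the 0.5 s gap is kept and only the 3 s gap is dropped.
conservative = merge_close_segments(detected, min_silence_ms=2000)
# With a 500 ms threshold every gap of at least 500 ms is dropped.
aggressive = merge_close_segments(detected, min_silence_ms=500)
```

A lower threshold removes more of the audio, which is why the README presents the 2-second default as the conservative choice.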
--- a/README.md
+++ b/README.md
@@ -97,6 +97,22 @@ for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
 ```

+#### VAD filter
+
+The library integrates the [Silero VAD](https://github.com/snakers4/silero-vad) model to filter out parts of the audio without speech:
+
+```python
+segments, _ = model.transcribe("audio.mp3", vad_filter=True)
+```
+
+The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the function [`get_speech_timestamps`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:
+
+```python
+segments, _ = model.transcribe("audio.mp3", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))
+```
+
+#### Going further
+
 See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.

 ### CLI