Support VAD filter (#95)

* Support VAD filter

* Generalize function collect_samples

* Define AudioSegment class

* Only pass prompt and prefix to the first chunk

* Add dict argument vad_parameters

* Fix isort format

* Rename method

* Update README

* Add shortcut when the chunk offset is 0

* Reword readme

* Fix end property

* Concatenate the speech chunks

* Cleanup diff

* Increase default speech pad

* Update README

* Increase default speech pad
This commit is contained in:
Guillaume Klein
2023-04-03 17:22:48 +02:00
committed by GitHub
parent b4c1c57781
commit 19698c95f8
9 changed files with 370 additions and 0 deletions

View File

@@ -97,6 +97,22 @@ for segment in segments:
print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```
#### VAD filter
The library integrates the [Silero VAD](https://github.com/snakers4/silero-vad) model to filter out parts of the audio without speech:
```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
```
The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the function [`get_speech_timestamps`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:
```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))
```
#### Going further
See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.
### CLI