Support VAD filter (#95)
* Support VAD filter
* Generalize function collect_samples
* Define AudioSegment class
* Only pass prompt and prefix to the first chunk
* Add dict argument vad_parameters
* Fix isort format
* Rename method
* Update README
* Add shortcut when the chunk offset is 0
* Reword readme
* Fix end property
* Concatenate the speech chunks
* Cleanup diff
* Increase default speech pad
* Update README
* Increase default speech pad
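
Several of these changes describe how the VAD filter is wired in: speech chunks detected by the VAD are padded and concatenated before transcription, and the prompt/prefix is only passed to the first chunk. As a rough illustration of the chunk-concatenation idea only (not the library's actual code), here is a minimal NumPy sketch; the function name, the dict-based chunk format, and the padding value are all assumptions:

```python
import numpy as np

def concatenate_speech_chunks(audio, chunks, sampling_rate=16000, speech_pad_ms=400):
    """Hypothetical helper: pad each VAD speech chunk and concatenate the kept audio."""
    pad = int(sampling_rate * speech_pad_ms / 1000)  # padding in samples (value is an assumption)
    kept = []
    for chunk in chunks:
        start = max(chunk["start"] - pad, 0)   # chunk bounds are sample indices
        end = min(chunk["end"] + pad, len(audio))
        kept.append(audio[start:end])
    return np.concatenate(kept) if kept else np.array([], dtype=audio.dtype)

# Example: two speech regions detected in a 10-second, 16 kHz signal.
audio = np.zeros(10 * 16000, dtype=np.float32)
chunks = [{"start": 16000, "end": 48000}, {"start": 96000, "end": 128000}]
speech_only = concatenate_speech_chunks(audio, chunks)
```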
README.md (+16 lines)
@@ -97,6 +97,22 @@ for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
```

#### VAD filter

The library integrates the [Silero VAD](https://github.com/snakers4/silero-vad) model to filter out parts of the audio without speech:

```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True)
```
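
As with a regular transcription, `segments` can be consumed the same way as in the examples above. A small usage sketch, assuming the `model` object created earlier in the README (the print format is just an illustration):

```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True)

for segment in segments:
    # Only the parts of the audio detected as speech are transcribed.
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```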

The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the function [`get_speech_timestamps`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). They can be customized with the dictionary argument `vad_parameters`:

```python
segments, _ = model.transcribe("audio.mp3", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))
```
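
Multiple parameters can be combined in the same dictionary. The exact set of accepted keys is defined by `get_speech_timestamps` linked above; the keys below mirror Silero VAD's options and are given as an assumption, so check the linked source for the authoritative names and defaults:

```python
segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(
        threshold=0.5,                # speech probability threshold
        min_silence_duration_ms=500,  # split on silences longer than 500 ms
        speech_pad_ms=400,            # keep some padding around detected speech
    ),
)
```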

#### Going further

See more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.
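
For instance, the model size or path, device, and compute type are chosen when constructing `WhisperModel`, and decoding options such as the beam size are passed to `transcribe`. A brief sketch with illustrative values (the model identifier and option values are assumptions, not defaults):

```python
from faster_whisper import WhisperModel

# Illustrative configuration; see the WhisperModel implementation for the
# full list of constructor and transcribe() options and their defaults.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,      # decoding beam size
    vad_filter=True,  # can be combined with the VAD filter described above
)
print("Detected language:", info.language)
```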

### CLI