* Adds new VAD parameters:
`min_silence_at_max_speech`: Minimum silence duration in ms, used to avoid abrupt cuts when `max_speech_duration_s` is reached.
`use_max_poss_sil_at_max_speech`: Whether to use the maximum possible silence when `max_speech_duration_s` is reached; if not, the last silence is used.
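A minimal sketch of how these options might be passed via `vad_parameters` in `WhisperModel.transcribe`, assuming they are exposed alongside the existing `VadOptions` fields; the values shown are illustrative, not the defaults:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

# Illustrative values only; the actual defaults live in VadOptions.
segments, info = model.transcribe(
    "audio.wav",
    vad_filter=True,
    vad_parameters={
        "max_speech_duration_s": 30,
        # New option: require at least this much silence (ms) before cutting
        # a segment that has hit max_speech_duration_s.
        "min_silence_at_max_speech": 100,
        # New option: prefer the longest possible silence (instead of the
        # last one) when splitting at max_speech_duration_s.
        "use_max_poss_sil_at_max_speech": True,
    },
)
for segment in segments:
    print(segment.start, segment.end, segment.text)
```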
* Style
* Update doc
* Change `min_speech_duration_ms` (0 -> 250)
* Change `min_speech_duration_ms` to zero
Set minimum speech duration to zero for flexibility.
---------
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
* Remove "local_dir_use_symlinks" from download_model()
It has been deprecated since huggingface_hub v0.23.0 and produces this warning:
> /opt/hostedtoolcache/Python/3.9.24/x64/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `snapshot_download`. Downloading to a local directory does not use symlinks anymore.
* Bump huggingface_hub requirement to v0.23
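For reference, a sketch of what the downstream call looks like once the deprecated keyword is dropped; the repo id and patterns here are illustrative, not the exact ones used by `download_model()`:

```python
from huggingface_hub import snapshot_download

# With huggingface_hub >= 0.23, downloading into local_dir no longer uses
# symlinks, so local_dir_use_symlinks simply must not be passed anymore.
model_path = snapshot_download(
    repo_id="Systran/faster-whisper-small",        # illustrative repo id
    local_dir="./faster-whisper-small",
    allow_patterns=["*.json", "*.bin", "*.txt"],   # illustrative patterns
)
print(model_path)
```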
* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline
- Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list
- Add segment filtering to remove any remaining <|nocaptions|> segments
- Resolves an issue where BatchedInferencePipeline would generate malformed
special tokens during periods of silence or low-confidence transcription
- Includes comprehensive tests to verify the fix
The issue occurred because while bracket tokens ('<', '|', '>') were
already suppressed, the content tokens ('no', 'ca', 'ptions') were not,
leading to partial token generation that formed complete <|nocaptions|>
tags in the output.
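A hedged sketch of the suppression idea described above; the helper name is hypothetical, the token ids are the ones quoted in this log, and a later commit switches to suppressing the single `<|nocaptions|>` token instead:

```python
# Illustrative sketch only: merge the component token ids into an existing
# suppress_tokens list so the decoder cannot assemble the <|nocaptions|> tag.
NOCAPTIONS_COMPONENT_IDS = [1771, 496, 9799]  # "no", "ca", "ptions" per this log

def add_nocaptions_suppression(suppress_tokens):
    """Return suppress_tokens with the component ids appended, without duplicates."""
    merged = list(suppress_tokens)
    for token_id in NOCAPTIONS_COMPONENT_IDS:
        if token_id not in merged:
            merged.append(token_id)
    return merged

# Example: suppress_tokens as passed to transcribe(), where -1 means the defaults.
print(add_nocaptions_suppression([-1]))  # -> [-1, 1771, 496, 9799]
```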
Files changed:
- faster_whisper/transcribe.py: Core fix implementation
- test_nocaptions_comprehensive.py: Comprehensive test suite
- tests/test_nocaptions_fix.py: Unit tests
* removed
* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline
* Fix: Implement proper <|nocaptions|> token suppression using single token approach
* ci: trigger tests
* fix: remove trailing whitespace from blank lines
* Update faster_whisper/transcribe.py
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
* Update faster_whisper/tokenizer.py
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
* Update faster_whisper/tokenizer.py
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
* Rename no_speech to no_captions in tokenizer
* `nocaptions` has been renamed to `nospeech`
* break line
* line break
* Refactor no_speech method for improved readability by adjusting line breaks
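A sketch of what the resulting `no_speech` lookup might look like, assuming the wrapped `tokenizers.Tokenizer` is reachable as `self.tokenizer`; the class and the fallback between the two spellings are illustrative, not the actual faster_whisper implementation:

```python
from functools import cached_property

class TokenizerSketch:
    """Illustrative fragment only, not faster_whisper.tokenizer.Tokenizer."""

    def __init__(self, hf_tokenizer):
        self.tokenizer = hf_tokenizer  # a tokenizers.Tokenizer instance

    @cached_property
    def no_speech(self):
        # Newer Whisper vocabularies use <|nospeech|>; older ones used
        # <|nocaptions|>, so fall back to whichever id exists.
        token_id = self.tokenizer.token_to_id("<|nospeech|>")
        if token_id is None:
            token_id = self.tokenizer.token_to_id("<|nocaptions|>")
        return token_id
```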
---------
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
Previously named faster-whisper-server; the project has been renamed to speaches, as it has evolved to support more than just ASR.
* Added a test for the `multilingual` option with English-German audio
* Removed the `output_language` argument as it is redundant; the same functionality is available with `task="translate"`
* Use the correct `encoder_output` for language detection in sequential transcription
* Enabled `multilingual` functionality for batched inference
* Added support for new options in batched transcription (see the sketch after this list):
* `language_detection_threshold`
* `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
* `WhisperModel.detect_language_multi_segment` and its test
* `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio inputs
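A sketch of how the new batched options might be used together, assuming they mirror the sequential `transcribe` keywords; the audio path and values are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3")
pipeline = BatchedInferencePipeline(model)

# multilingual=True keeps per-segment language handling for code-switched
# audio (e.g. English-German); the two detection options follow the new
# batched parameters listed above.
segments, info = pipeline.transcribe(
    "english_german.wav",
    batch_size=8,
    multilingual=True,
    language_detection_threshold=0.5,
    language_detection_segments=1,
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")

# The improved detector is also exposed directly (per the changelog entry on
# WhisperModel.detect_language); the exact signature may differ:
# language, probability, all_probs = model.detect_language("english_german.wav")
```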