263 Commits

Author SHA1 Message Date
Purfview
ed9a06cd89 Adds new VAD parameters (#1386)
* Adds new VAD parameters

Adds new VAD parameters: 

min_silence_at_max_speech: Minimum silence duration in ms which is used to avoid abrupt cuts when max_speech_duration_s is reached.

use_max_poss_sil_at_max_speech: Whether to use the maximum possible silence at max_speech_duration_s or not. If not, the last silence is used.

* Style

* Update doc

* change min_speech_duration_ms (0 -> 250)

* Change min_speech_duration_ms to zero

Set minimum speech duration to zero for flexibility.

---------

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2025-11-19 17:40:46 +03:00
Purfview
2eeafe05de Update Silero-VAD weights to v6.2 (#1390)
* Update Silero-VAD weights to v6.2

Overall slight quality improvement (no metrics update);
Higher stability on OOD / rare / strange / unique data;
Significant quality improvements on various known edge cases:

    Unusual voices
    Child voices
    Cartoon voices
    Muted voices
    Muted speech
    Lower quality phone calls

https://github.com/snakers4/silero-vad/releases/tag/v6.2

* Changes: tiny -> base in test_monotonic_timestamps()
2025-11-19 17:14:42 +03:00
Purfview
cf42429f96 Remove "local_dir_use_symlinks" from download_model() (#1389)
* Remove "local_dir_use_symlinks" from download_model()

It's deprecated since huggingface_hub v0.23.0 and produce this warning:

>   /opt/hostedtoolcache/Python/3.9.24/x64/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `snapshot_download`. Downloading to a local directory does not use symlinks anymore.

* Bump huggingface_hub requirement to v0.23
2025-11-18 21:59:01 +03:00
Mahmoud Ashraf
65882eee9f Bump version to 1.2.1 v1.2.1 2025-10-31 14:31:14 +03:00
Mahmoud Ashraf
409a6919f9 Prevent timestamps restoration when clip timestamps are provided in batched inference (#1376) 2025-10-31 14:26:17 +03:00
Mahmoud Ashraf
00a5b26b1f Offload retry logic to hf hub (#1382)
* remove requirement for requests
2025-10-30 22:11:01 +03:00
Purfview
9090997d25 Fix a typo (#1377) 2025-10-22 15:51:56 +03:00
Mahmoud Ashraf
dea24cbcc6 Upgrade to Silero-VAD V6 (#1373)
Co-authored-by: sssshhhhhh 193317444+sssshhhhhh@users.noreply.github.com
2025-10-14 15:29:56 +03:00
Mario
14ba1051f3 Fix: add <|nocaptions|> to suppressed tokens (#1338)
* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

- Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list
- Add segment filtering to remove any remaining <|nocaptions|> segments
- Resolves issue where BatchedInferencePipeline would generate malformed
  special tokens during periods of silence or low-confidence transcription
- Includes comprehensive tests to verify the fix

The issue occurred because while bracket tokens ('<', '|', '>') were
already suppressed, the content tokens ('no', 'ca', 'ptions') were not,
leading to partial token generation that formed complete <|nocaptions|>
tags in the output.

Files changed:
- faster_whisper/transcribe.py: Core fix implementation
- test_nocaptions_comprehensive.py: Comprehensive test suite
- tests/test_nocaptions_fix.py: Unit tests

* removed

* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

* Fix: Implement proper <|nocaptions|> token suppression using single token approach

* ci: trigger tests

* fix: remove trailing whitespace from blank lines

* Update faster_whisper/transcribe.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Rename no_speech to no_captions in tokenizer

* nocaptions has been renamed to nospeech

* break line

* line break

* Refactor no_speech method for improved readability by adjusting line breaks

---------

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2025-10-10 21:56:54 +03:00
Mahmoud Ashraf
c26d609974 only merge when clip_timestamps are not provided (#1345)
fixes #1340 and allows for batching multiple audio files less than 30s each
2025-08-16 14:30:50 +03:00
黑墨水鱼
4bd98d5c5b Update README.md to include whisper-fastapi (#1325) 2025-08-11 13:44:48 +03:00
Mahmoud Ashraf
93001a9438 bump version to 1.2.0 v1.2.0 2025-08-06 03:31:36 +03:00
Mahmoud Ashraf
a0c3cb9802 Remove Silence in Batched transcription (#1297) 2025-08-06 03:30:59 +03:00
Mahmoud Ashraf
fbeb1ba731 get correct index for samples (#1336) 2025-08-06 03:17:45 +03:00
Rishil
d3bfd0a305 feat: Allow loading of private HF models (#1309)
* feat: add HuggingFace auth token support to model download

* Format
2025-06-02 14:12:34 +03:00
Mahmoud Ashraf
43d4163fe0 Support distil-large-v3.5 (#1311) 2025-06-02 14:09:20 +03:00
Felix Mosheev
700584b2e6 feat: allow passing specific revision to download (#1292) 2025-04-30 00:55:48 +03:00
David Jiménez
1383fd4d37 Update README.md with speaches instead of faster-whisper-server (#1267)
Was previously named faster-whisper-server. They've decided to change the name from faster-whisper-server to speaches, as the project has evolved to support more than just ASR.
2025-03-20 17:20:26 +03:00
Mahmoud Ashraf
9e657b47cb Bump version to 1.1.1 v1.1.1 2025-01-01 17:44:54 +03:00
Purfview
11fd8ab301 Fix neg_threshold (#1191) 2024-12-29 14:38:58 +03:00
Dragoș Bălan
95164297ff Add duration of audio and VAD removed duration to BatchedInferencePipeline (#1186)
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-12-23 17:23:40 +02:00
Purfview
1b24f284c9 Reduce VAD memory usage (#1198)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-12-12 15:23:30 +03:00
Jordi Mas
b568faec40 Add Open-dubbing into community projects (#1034)
* Add Open-dubbing into community projects

* Update URL
2024-12-12 13:36:04 +03:00
Purfview
f32c0e8af3 Make batched suppress_tokens behaviour same as in sequential (#1194) 2024-12-11 14:51:38 +03:00
Purfview
8327d8cc64 Brings back original VAD parameters naming (#1181) 2024-12-01 20:41:53 +03:00
Mahmoud Ashraf
22a5238b56 Upgrade CI to 3.9 and drop Python 3.8 support(#1184) 2024-12-01 20:38:27 +03:00
Mahmoud Ashraf
97a4785fa1 Bump version to 1.1.0 and update benchmarks (#1161)
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks
v1.1.0
2024-11-21 19:22:01 +03:00
Mahmoud Ashraf
08f6900217 remove log_prob_low_threshold (#1160) 2024-11-21 00:03:21 +03:00
Mahmoud Ashraf
9c8ef76c98 use jiwer instead of evaluate in benchmarks (#1159) 2024-11-20 23:51:55 +03:00
Mahmoud Ashraf
491852e1b9 Add new tests (#1158) 2024-11-20 14:50:57 +03:00
Mahmoud Ashraf
f830c6f241 Fix list index out of range in word timestamps (#1157) 2024-11-20 13:36:58 +03:00
Mahmoud Ashraf
bcd8ce0fc7 refactor multilingual option (#1148)
* Added test for `multilingual` option with english-german audio
* removed `output_language` argument as it is redundant, you can get the same functionality with `task="translate"`
* use the correct `encoder_output` for language detection in sequential transcription
* enabled `multilingual` functionality for batched inference
2024-11-20 00:14:59 +03:00
Mahmoud Ashraf
be9fb36ed3 Cleanup of BatchedInferencePipeline (#1135) 2024-11-17 16:45:32 +03:00
Mahmoud Ashraf
a6f8fbae00 Refactor of language detection functions (#1146)
* Supported new options for batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated `WhisperModel.detect_language` function to include the improved language detection from #732  and added docstrings, it's now used inside `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audios
2024-11-16 13:53:07 +03:00
黑墨水鱼
53bbe54016 fix: Use correct seek value in output, fix word timestamps when the initial timestamp is not zero (#1141)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-11-15 14:57:38 +03:00
Mahmoud Ashraf
85e61ea111 Add progress bar to WhisperModel.transcribe (#1138) 2024-11-14 17:12:39 +03:00
Mahmoud Ashraf
3e0ba86571 Remove torch dependency, Faster numpy Feature extraction (#1106) 2024-11-14 12:57:10 +03:00
Mahmoud Ashraf
8f01aee36b Update WhisperModel documentation to list all available models (#1137) 2024-11-13 19:26:01 +03:00
Mahmoud Ashraf
c2bf036234 change language_detection_threshold default value (#1134) 2024-11-13 17:07:46 +03:00
Mahmoud Ashraf
fb65cd387f Update cuda instructions in readme (#1125)
* Update README.md

* Update README.md

* Update version.py

* Update README.md

* Update README.md

* Update README.md
2024-11-12 15:51:26 +03:00
Mahmoud Ashraf
203dddb047 replace NamedTuple with dataclass (#1105)
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
2024-11-05 12:32:20 +03:00
Mahmoud Ashraf
814472fdbf Revert CPU default threads to 0
https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010
2024-10-30 23:00:36 +03:00
Ozan Caglayan
f978fa2979 Revert CPU default threads to 4 (#965)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-10-30 16:50:49 +03:00
Mahmoud Ashraf
2386843fd7 Use correct features padding for encoder input (#1101)
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
2024-10-29 17:58:05 +03:00
黑墨水鱼
c2a1da1bd9 typo: trubo -> turbo (#1092) 2024-10-26 00:28:16 +03:00
Mahmoud Ashraf
b2da05582c Add support for turbo model (#1090) 2024-10-25 15:50:23 +03:00
Mahmoud Ashraf
2dbca5e559 Use Silero VAD in Batched Mode (#936)
Replace Pyannote VAD with Silero to reduce code duplication and requirements
2024-10-24 12:05:25 +03:00
Mahmoud Ashraf
574e2563e7 Update Dockerfile to ensure compatibility with CT2==4.5.0 2024-10-23 18:28:27 +03:00
Mahmoud Ashraf
42b8681edb revert back to using PyAV instead of torchaudio (#961)
* revert back to using PyAV instead of torch audio

* Update audio.py
2024-10-23 15:26:18 +03:00
Mahmoud Ashraf
d57c5b40b0 Remove the usage of transformers.pipeline from BatchedInferencePipeline and fix word timestamps for batched inference (#921)
* fix word timestamps for batched inference

* remove hf pipeline
2024-07-27 09:02:58 +07:00