261 Commits

Author SHA1 Message Date
Mahmoud Ashraf
ba812f55a2 Fix quotes for Python version in CI workflow 2025-10-30 21:14:30 +03:00
Mahmoud Ashraf
44466c7535 Upgrade Python version from 3.9 to 3.10 in CI 2025-10-30 21:12:36 +03:00
Mahmoud Ashraf
e3e46675b2 Update Python version requirements to 3.10 and 3.12 2025-10-30 21:11:50 +03:00
Mahmoud Ashraf
14ad587c98 Update Python version requirement to 3.10 or greater 2025-10-30 21:11:07 +03:00
Purfview
9090997d25 Fix a typo (#1377) 2025-10-22 15:51:56 +03:00
Mahmoud Ashraf
dea24cbcc6 Upgrade to Silero-VAD V6 (#1373)
Co-authored-by: sssshhhhhh <193317444+sssshhhhhh@users.noreply.github.com>
2025-10-14 15:29:56 +03:00
Mario
14ba1051f3 Fix: add <|nocaptions|> to suppressed tokens (#1338)
* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

- Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list
- Add segment filtering to remove any remaining <|nocaptions|> segments
- Resolves issue where BatchedInferencePipeline would generate malformed
  special tokens during periods of silence or low-confidence transcription
- Includes comprehensive tests to verify the fix

The issue occurred because while bracket tokens ('<', '|', '>') were
already suppressed, the content tokens ('no', 'ca', 'ptions') were not,
leading to partial token generation that formed complete <|nocaptions|>
tags in the output.

Files changed:
- faster_whisper/transcribe.py: Core fix implementation
- test_nocaptions_comprehensive.py: Comprehensive test suite
- tests/test_nocaptions_fix.py: Unit tests

* removed

* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

* Fix: Implement proper <|nocaptions|> token suppression using single token approach

* ci: trigger tests

* fix: remove trailing whitespace from blank lines

* Update faster_whisper/transcribe.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Rename no_speech to no_captions in tokenizer

* nocaptions has been renamed to nospeech

* break line

* line break

* Refactor no_speech method for improved readability by adjusting line breaks

---------

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2025-10-10 21:56:54 +03:00
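
The suppression idea described in the #1338 commit message can be illustrated with a minimal sketch. This is not the code from `faster_whisper/transcribe.py`: the token ID, the helper names `suppress` and `greedy_pick`, and the toy logits are all assumptions made for illustration. It only shows why masking the one special-token ID is enough to stop a greedy decoder from emitting that token.

```python
import math

# Hypothetical ID for the single <|nocaptions|> special token. The real value
# depends on the tokenizer and is not stated in the commit message.
NO_CAPTIONS_ID = 7


def suppress(logits, suppress_tokens):
    """Return a copy of `logits` with every suppressed token ID set to -inf."""
    masked = list(logits)
    for token_id in suppress_tokens:
        masked[token_id] = -math.inf
    return masked


def greedy_pick(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy scores over an 8-token vocabulary; the special token scores highest,
# as it might during silence.
logits = [0.1, 0.3, 0.2, 0.0, 0.5, 0.4, 0.2, 2.0]

picked_before = greedy_pick(logits)
picked_after = greedy_pick(suppress(logits, [NO_CAPTIONS_ID]))
```

With the special token masked, the decoder falls back to the next-best ordinary token, which is why a single-ID entry in the suppression list can replace the earlier approach of suppressing the component tokens.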
Mahmoud Ashraf
c26d609974 only merge when clip_timestamps are not provided (#1345)
fixes #1340 and allows for batching multiple audio files less than 30s each
2025-08-16 14:30:50 +03:00
黑墨水鱼
4bd98d5c5b Update README.md to include whisper-fastapi (#1325) 2025-08-11 13:44:48 +03:00
Mahmoud Ashraf
93001a9438 bump version to 1.2.0 (tag: v1.2.0) 2025-08-06 03:31:36 +03:00
Mahmoud Ashraf
a0c3cb9802 Remove Silence in Batched transcription (#1297) 2025-08-06 03:30:59 +03:00
Mahmoud Ashraf
fbeb1ba731 get correct index for samples (#1336) 2025-08-06 03:17:45 +03:00
Rishil
d3bfd0a305 feat: Allow loading of private HF models (#1309)
* feat: add HuggingFace auth token support to model download

* Format
2025-06-02 14:12:34 +03:00
Mahmoud Ashraf
43d4163fe0 Support distil-large-v3.5 (#1311) 2025-06-02 14:09:20 +03:00
Felix Mosheev
700584b2e6 feat: allow passing specific revision to download (#1292) 2025-04-30 00:55:48 +03:00
David Jiménez
1383fd4d37 Update README.md with speaches instead of faster-whisper-server (#1267)
The project was previously named faster-whisper-server. Its maintainers renamed it to speaches because it has evolved to support more than just ASR.
2025-03-20 17:20:26 +03:00
Mahmoud Ashraf
9e657b47cb Bump version to 1.1.1 (tag: v1.1.1) 2025-01-01 17:44:54 +03:00
Purfview
11fd8ab301 Fix neg_threshold (#1191) 2024-12-29 14:38:58 +03:00
Dragoș Bălan
95164297ff Add duration of audio and VAD removed duration to BatchedInferencePipeline (#1186)
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-12-23 17:23:40 +02:00
Purfview
1b24f284c9 Reduce VAD memory usage (#1198)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-12-12 15:23:30 +03:00
Jordi Mas
b568faec40 Add Open-dubbing into community projects (#1034)
* Add Open-dubbing into community projects

* Update URL
2024-12-12 13:36:04 +03:00
Purfview
f32c0e8af3 Make batched suppress_tokens behaviour same as in sequential (#1194) 2024-12-11 14:51:38 +03:00
Purfview
8327d8cc64 Brings back original VAD parameters naming (#1181) 2024-12-01 20:41:53 +03:00
Mahmoud Ashraf
22a5238b56 Upgrade CI to 3.9 and drop Python 3.8 support (#1184) 2024-12-01 20:38:27 +03:00
Mahmoud Ashraf
97a4785fa1 Bump version to 1.1.0 and update benchmarks (#1161)
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks
(tag: v1.1.0)
2024-11-21 19:22:01 +03:00
Mahmoud Ashraf
08f6900217 remove log_prob_low_threshold (#1160) 2024-11-21 00:03:21 +03:00
Mahmoud Ashraf
9c8ef76c98 use jiwer instead of evaluate in benchmarks (#1159) 2024-11-20 23:51:55 +03:00
Mahmoud Ashraf
491852e1b9 Add new tests (#1158) 2024-11-20 14:50:57 +03:00
Mahmoud Ashraf
f830c6f241 Fix list index out of range in word timestamps (#1157) 2024-11-20 13:36:58 +03:00
Mahmoud Ashraf
bcd8ce0fc7 refactor multilingual option (#1148)
* Added test for `multilingual` option with english-german audio
* removed the `output_language` argument as it is redundant; the same functionality is available with `task="translate"`
* use the correct `encoder_output` for language detection in sequential transcription
* enabled `multilingual` functionality for batched inference
2024-11-20 00:14:59 +03:00
Mahmoud Ashraf
be9fb36ed3 Cleanup of BatchedInferencePipeline (#1135) 2024-11-17 16:45:32 +03:00
Mahmoud Ashraf
a6f8fbae00 Refactor of language detection functions (#1146)
* Supported new options for batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audios
2024-11-16 13:53:07 +03:00
黑墨水鱼
53bbe54016 fix: Use correct seek value in output, fix word timestamps when the initial timestamp is not zero (#1141)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-11-15 14:57:38 +03:00
Mahmoud Ashraf
85e61ea111 Add progress bar to WhisperModel.transcribe (#1138) 2024-11-14 17:12:39 +03:00
Mahmoud Ashraf
3e0ba86571 Remove torch dependency, Faster numpy Feature extraction (#1106) 2024-11-14 12:57:10 +03:00
Mahmoud Ashraf
8f01aee36b Update WhisperModel documentation to list all available models (#1137) 2024-11-13 19:26:01 +03:00
Mahmoud Ashraf
c2bf036234 change language_detection_threshold default value (#1134) 2024-11-13 17:07:46 +03:00
Mahmoud Ashraf
fb65cd387f Update cuda instructions in readme (#1125)
* Update README.md

* Update README.md

* Update version.py

* Update README.md

* Update README.md

* Update README.md
2024-11-12 15:51:26 +03:00
Mahmoud Ashraf
203dddb047 replace NamedTuple with dataclass (#1105)
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
2024-11-05 12:32:20 +03:00
Mahmoud Ashraf
814472fdbf Revert CPU default threads to 0
https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010
2024-10-30 23:00:36 +03:00
Ozan Caglayan
f978fa2979 Revert CPU default threads to 4 (#965)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-10-30 16:50:49 +03:00
Mahmoud Ashraf
2386843fd7 Use correct features padding for encoder input (#1101)
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
2024-10-29 17:58:05 +03:00
黑墨水鱼
c2a1da1bd9 typo: trubo -> turbo (#1092) 2024-10-26 00:28:16 +03:00
Mahmoud Ashraf
b2da05582c Add support for turbo model (#1090) 2024-10-25 15:50:23 +03:00
Mahmoud Ashraf
2dbca5e559 Use Silero VAD in Batched Mode (#936)
Replace Pyannote VAD with Silero to reduce code duplication and requirements
2024-10-24 12:05:25 +03:00
Mahmoud Ashraf
574e2563e7 Update Dockerfile to ensure compatibility with CT2==4.5.0 2024-10-23 18:28:27 +03:00
Mahmoud Ashraf
42b8681edb revert back to using PyAV instead of torchaudio (#961)
* revert back to using PyAV instead of torchaudio

* Update audio.py
2024-10-23 15:26:18 +03:00
Mahmoud Ashraf
d57c5b40b0 Remove the usage of transformers.pipeline from BatchedInferencePipeline and fix word timestamps for batched inference (#921)
* fix word timestamps for batched inference

* remove hf pipeline
2024-07-27 09:02:58 +07:00
zh-plus
83a368e98a Make vad-related parameters configurable for batched inference. (#923) 2024-07-24 09:00:32 +07:00
Jilt Sebastian
eb8390233c New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856)
Batching Support, Speed Boosts, and Quality Enhancements

---------

Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-07-18 16:48:52 +07:00