faster-whisper

mirror of https://github.com/SYSTRAN/faster-whisper.git synced 2026-01-08 13:14:00 -05:00

Author	SHA1	Message	Date
Purfview	2eeafe05de	Update Silero-VAD weights to v6.2 (#1390) * Update Silero-VAD weights to v6.2 Overall slight quality improvement (no metrics update); higher stability on OOD / rare / strange / unique data; significant quality improvements on various known edge cases: unusual voices, child voices, cartoon voices, muted voices, muted speech, lower quality phone calls. https://github.com/snakers4/silero-vad/releases/tag/v6.2 * Changes: tiny -> base in test_monotonic_timestamps()	2025-11-19 17:14:42 +03:00
Mahmoud Ashraf	409a6919f9	Prevent timestamps restoration when clip timestamps are provided in batched inference (#1376)	2025-10-31 14:26:17 +03:00
Mario	14ba1051f3	Fix: add `<|nocaptions|>` to suppressed tokens (#1338) * Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline - Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list - Add segment filtering to remove any remaining <|nocaptions|> segments - Resolves issue where BatchedInferencePipeline would generate malformed special tokens during periods of silence or low-confidence transcription - Includes comprehensive tests to verify the fix The issue occurred because while bracket tokens ('<', '|', '>') were already suppressed, the content tokens ('no', 'ca', 'ptions') were not, leading to partial token generation that formed complete <|nocaptions|> tags in the output. Files changed: - faster_whisper/transcribe.py: Core fix implementation - test_nocaptions_comprehensive.py: Comprehensive test suite - tests/test_nocaptions_fix.py: Unit tests * removed * Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline * Fix: Implement proper <|nocaptions|> token suppression using single token approach * ci: trigger tests * fix: remove trailing whitespace from blank lines * Update faster_whisper/transcribe.py Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com> * Update faster_whisper/tokenizer.py Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com> * Update faster_whisper/tokenizer.py Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com> * Rename no_speech to no_captions in tokenizer * nocaptions has been renamed to nospeech * break line * line break * Refactor no_speech method for improved readability by adjusting line breaks --------- Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>	2025-10-10 21:56:54 +03:00
Mahmoud Ashraf	c26d609974	only merge when `clip_timestamps` are not provided (#1345) fixes #1340 and allows for batching multiple audio files less than 30s each	2025-08-16 14:30:50 +03:00
Mahmoud Ashraf	a0c3cb9802	Remove Silence in Batched transcription (#1297)	2025-08-06 03:30:59 +03:00
Mahmoud Ashraf	491852e1b9	Add new tests (#1158)	2024-11-20 14:50:57 +03:00
Mahmoud Ashraf	bcd8ce0fc7	refactor `multilingual` option (#1148) * Added test for `multilingual` option with english-german audio * removed `output_language` argument as it is redundant, you can get the same functionality with `task="translate"` * use the correct `encoder_output` for language detection in sequential transcription * enabled `multilingual` functionality for batched inference	2024-11-20 00:14:59 +03:00
Mahmoud Ashraf	a6f8fbae00	Refactor of language detection functions (#1146) * Supported new options for batched transcriptions: * `language_detection_threshold` * `language_detection_segments` * Updated `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings, it's now used inside `transcribe` function. * Removed the following functions as they are no longer needed: * `WhisperModel.detect_language_multi_segment` and its test * `BatchedInferencePipeline.get_language_and_tokenizer` * Added tests for empty audios	2024-11-16 13:53:07 +03:00
Mahmoud Ashraf	3e0ba86571	Remove `torch` dependency, Faster numpy Feature extraction (#1106)	2024-11-14 12:57:10 +03:00
Mahmoud Ashraf	2dbca5e559	Use Silero VAD in Batched Mode (#936) Replace Pyannote VAD with Silero to reduce code duplication and requirements	2024-10-24 12:05:25 +03:00
Jilt Sebastian	eb8390233c	New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856) Batching Support, Speed Boosts, and Quality Enhancements --------- Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com> Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>	2024-07-18 16:48:52 +07:00
Jordi Mas	1195359984	Filter out non_speech_tokens in suppressed tokens (#898) * Filter out non_speech_tokens in suppressed tokens	2024-07-05 14:43:11 +07:00
Guillaume Klein	e94711bb5c	Add property WhisperModel.supported_languages (#476) * Expose function supported_languages * Make it a method	2023-09-14 17:42:02 +02:00
Guillaume Klein	0048844f54	Expose function available_models (#475) * Expose function available_models * Add test case	2023-09-14 17:17:01 +02:00
Guillaume Klein	0e051a5b77	Prepend prefix tokens with the initial timestamp token (#358)	2023-07-18 15:22:39 +02:00
Ozan Caglayan	91f948b0d6	transcribe: return all language probabilities if requested (#210) * transcribe: return all language probabilities if requested If return_all_language_probs is True, TranscriptionInfo structure will have a list of tuples reflecting all language probabilities as returned by the model. * transcribe: fix docstring * transcribe: remove return_all_lang_probs parameter	2023-05-09 14:53:47 +02:00
FlippFuzz	5d8f3e2d90	Implement VadOptions (#198) * Implement VadOptions * Fix line too long ./faster_whisper/transcribe.py:226:101: E501 line too long (111 > 100 characters) * Reformatted files with black * black .\faster_whisper\vad.py * black .\faster_whisper\transcribe.py * Fix import order with isort * isort .\faster_whisper\vad.py * isort .\faster_whisper\transcribe.py * Made recommended changes Recommended in https://github.com/guillaumekln/faster-whisper/pull/198 * Fix typing of vad_options argument --------- Co-authored-by: Guillaume Klein <guillaumekln@users.noreply.github.com>	2023-05-09 12:47:02 +02:00
Jordi Mas	68df3214ba	Use cache_dir instead of local_dir (#182) * Use cache_dir instead of local_dir * Fix unit test * Use cache_dir and preserve local_dir parameter * Remove blank line at the end * Disable ut * Implement download_root suggestion * Use cache_dir=download_root	2023-04-26 16:35:18 +02:00
Guillaume Klein	8cf5d5a4b3	Increase the default value of speech_pad_ms to 400 ms (#179)	2023-04-25 15:54:22 +02:00
Guillaume Klein	19698c95f8	Support VAD filter (#95) * Support VAD filter * Generalize function collect_samples * Define AudioSegment class * Only pass prompt and prefix to the first chunk * Add dict argument vad_parameters * Fix isort format * Rename method * Update README * Add shortcut when the chunk offset is 0 * Reword readme * Fix end property * Concatenate the speech chunks * Cleanup diff * Increase default speech pad * Update README * Increase default speech pad	2023-04-03 17:22:48 +02:00
Guillaume Klein	f20bb258de	Support separating the left and right audio channels (#97)	2023-04-03 11:22:43 +02:00
Guillaume Klein	de7682a2f0	Automatically download converted models from the Hugging Face Hub (#70) * Automatically download converted models from the Hugging Face Hub * Remove unused import * Remove non needed requirements in dev mode * Remove extra index URL when pip install in CI * Allow downloading to a specific directory * Update docstring * Add argument to disable the progess bars * Fix typo in docstring	2023-03-24 10:55:55 +01:00
Guillaume Klein	66efd02bd0	Run some automatic tests with GitHub Actions (#68)	2023-03-22 20:50:03 +01:00

23 Commits