131 Commits

Author SHA1 Message Date
Mahmoud Ashraf
409a6919f9 Prevent timestamps restoration when clip timestamps are provided in batched inference (#1376) 2025-10-31 14:26:17 +03:00
Purfview
9090997d25 Fix a typo (#1377) 2025-10-22 15:51:56 +03:00
Mario
14ba1051f3 Fix: add <|nocaptions|> to suppressed tokens (#1338)
* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

- Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list
- Add segment filtering to remove any remaining <|nocaptions|> segments
- Resolves an issue where BatchedInferencePipeline would generate malformed
  special tokens during periods of silence or low-confidence transcription
- Includes comprehensive tests to verify the fix

The issue occurred because while bracket tokens ('<', '|', '>') were
already suppressed, the content tokens ('no', 'ca', 'ptions') were not,
leading to partial token generation that formed complete <|nocaptions|>
tags in the output.

Files changed:
- faster_whisper/transcribe.py: Core fix implementation
- test_nocaptions_comprehensive.py: Comprehensive test suite
- tests/test_nocaptions_fix.py: Unit tests

* removed

* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

* Fix: Implement proper <|nocaptions|> token suppression using single token approach

* ci: trigger tests

* fix: remove trailing whitespace from blank lines

* Update faster_whisper/transcribe.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Rename no_speech to no_captions in tokenizer

* nocaptions has been renamed to nospeech

* break line

* line break

* Refactor no_speech method for improved readability by adjusting line breaks

---------

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2025-10-10 21:56:54 +03:00
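
For context on the final approach in #1338, here is a minimal sketch of the single-token idea using the public suppress_tokens argument: rather than suppressing the bracket pieces ('<', '|', '>'), the id of the special token itself (named <|nospeech|> in newer vocabularies, <|nocaptions|> in older ones) is added to the suppression list. The hf_tokenizer attribute and the [-1, ...] suppress_tokens convention below are assumptions for illustration, not the exact code that was merged.

    from faster_whisper import WhisperModel, BatchedInferencePipeline

    model = WhisperModel("small", device="cpu", compute_type="int8")
    pipeline = BatchedInferencePipeline(model=model)

    # Assumed attribute: the Hugging Face tokenizer kept by the model.
    # token_to_id returns None if the token string is absent from the vocabulary.
    nospeech_id = model.hf_tokenizer.token_to_id("<|nospeech|>")

    suppress = [-1]  # -1 stands for the default non-speech suppression list
    if nospeech_id is not None:
        suppress.append(nospeech_id)

    segments, info = pipeline.transcribe(
        "audio.wav",
        batch_size=8,
        suppress_tokens=suppress,
    )
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
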
Mahmoud Ashraf
c26d609974 only merge when clip_timestamps are not provided (#1345)
Fixes #1340 and allows batching multiple audio files shorter than 30s each
2025-08-16 14:30:50 +03:00
Mahmoud Ashraf
a0c3cb9802 Remove Silence in Batched transcription (#1297) 2025-08-06 03:30:59 +03:00
Mahmoud Ashraf
fbeb1ba731 get correct index for samples (#1336) 2025-08-06 03:17:45 +03:00
Rishil
d3bfd0a305 feat: Allow loading of private HF models (#1309)
* feat: add HuggingFace auth token support to model download

* Format
2025-06-02 14:12:34 +03:00
Felix Mosheev
700584b2e6 feat: allow passing specific revision to download (#1292) 2025-04-30 00:55:48 +03:00
Dragoș Bălan
95164297ff Add duration of audio and VAD removed duration to BatchedInferencePipeline (#1186)
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-12-23 17:23:40 +02:00
Purfview
f32c0e8af3 Make batched suppress_tokens behaviour same as in sequential (#1194) 2024-12-11 14:51:38 +03:00
Mahmoud Ashraf
08f6900217 remove log_prob_low_threshold (#1160) 2024-11-21 00:03:21 +03:00
Mahmoud Ashraf
f830c6f241 Fix list index out of range in word timestamps (#1157) 2024-11-20 13:36:58 +03:00
Mahmoud Ashraf
bcd8ce0fc7 refactor multilingual option (#1148)
* Added test for `multilingual` option with english-german audio
* removed the `output_language` argument as it is redundant; the same functionality is available via `task="translate"`
* use the correct `encoder_output` for language detection in sequential transcription
* enabled `multilingual` functionality for batched inference
2024-11-20 00:14:59 +03:00
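
A brief usage sketch of the options touched by this commit (model size and file name are placeholders; the multilingual and task arguments are taken from the commit text):

    from faster_whisper import WhisperModel

    model = WhisperModel("medium")

    # Code-switching audio: let the decoder change language between segments.
    segments, info = model.transcribe("english_german.wav", multilingual=True)

    # The removed output_language="en" use case is covered by task="translate",
    # which renders all speech as English text.
    translated, info = model.transcribe("english_german.wav", task="translate")
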
Mahmoud Ashraf
be9fb36ed3 Cleanup of BatchedInferencePipeline (#1135) 2024-11-17 16:45:32 +03:00
Mahmoud Ashraf
a6f8fbae00 Refactor of language detection functions (#1146)
* Added support for new options in batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio inputs
2024-11-16 13:53:07 +03:00
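
A minimal sketch of how these options surface in the public API, assuming the current signatures (the audio file and threshold values are placeholders, and the return value of detect_language is left unused here):

    from faster_whisper import WhisperModel, decode_audio

    model = WhisperModel("small")
    audio = decode_audio("audio.wav", sampling_rate=16000)

    # Standalone detection; per this refactor the same code path now backs
    # language detection inside transcribe().
    detection = model.detect_language(audio)

    # The same knobs exposed through transcribe():
    segments, info = model.transcribe(
        audio,
        language_detection_segments=4,     # probe up to four 30-second segments
        language_detection_threshold=0.5,  # stop once a language clears this probability
    )
    print(info.language, info.language_probability)
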
黑墨水鱼
53bbe54016 fix: Use correct seek value in output, fix word timestamps when the initial timestamp is not zero (#1141)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-11-15 14:57:38 +03:00
Mahmoud Ashraf
85e61ea111 Add progress bar to WhisperModel.transcribe (#1138) 2024-11-14 17:12:39 +03:00
Mahmoud Ashraf
3e0ba86571 Remove torch dependency, Faster numpy Feature extraction (#1106) 2024-11-14 12:57:10 +03:00
Mahmoud Ashraf
8f01aee36b Update WhisperModel documentation to list all available models (#1137) 2024-11-13 19:26:01 +03:00
Mahmoud Ashraf
c2bf036234 change language_detection_threshold default value (#1134) 2024-11-13 17:07:46 +03:00
Mahmoud Ashraf
203dddb047 replace NamedTuple with dataclass (#1105)
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
2024-11-05 12:32:20 +03:00
Mahmoud Ashraf
814472fdbf Revert CPU default threads to 0
https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010
2024-10-30 23:00:36 +03:00
Ozan Caglayan
f978fa2979 Revert CPU default threads to 4 (#965)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-10-30 16:50:49 +03:00
Mahmoud Ashraf
2386843fd7 Use correct features padding for encoder input (#1101)
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
2024-10-29 17:58:05 +03:00
Mahmoud Ashraf
2dbca5e559 Use Silero VAD in Batched Mode (#936)
Replace Pyannote VAD with Silero to reduce code duplication and requirements
2024-10-24 12:05:25 +03:00
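
With Silero as the batched VAD, the usual VAD knobs pass straight through; a hedged sketch (the min_silence_duration_ms value is arbitrary, and vad_parameters also accepts a VadOptions instance in current releases):

    from faster_whisper import WhisperModel, BatchedInferencePipeline

    model = WhisperModel("small")
    batched = BatchedInferencePipeline(model=model)

    segments, info = batched.transcribe(
        "audio.wav",
        batch_size=16,
        vad_filter=True,  # VAD splits the audio into chunks that can be batched
        vad_parameters=dict(min_silence_duration_ms=500),
    )
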
Mahmoud Ashraf
d57c5b40b0 Remove the usage of transformers.pipeline from BatchedInferencePipeline and fix word timestamps for batched inference (#921)
* fix word timestamps for batched inference

* remove hf pipeline
2024-07-27 09:02:58 +07:00
zh-plus
83a368e98a Make vad-related parameters configurable for batched inference. (#923) 2024-07-24 09:00:32 +07:00
Jilt Sebastian
eb8390233c New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856)
Batching Support, Speed Boosts, and Quality Enhancements

---------

Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-07-18 16:48:52 +07:00
trungkienbkhn
fbcf58bf98 Fix language detection with non-speech audio (#895) 2024-07-05 14:43:45 +07:00
Jordi Mas
1195359984 Filter out non_speech_tokens in suppressed tokens (#898)
* Filter out non_speech_tokens in suppressed tokens
2024-07-05 14:43:11 +07:00
ABen
8862bee1f8 Improve language detection when using clip_timestamps (#867) 2024-07-01 16:12:45 +07:00
Napuh
f53be1e811 Add distil models to WhisperModel init and download_model docstrings (#847)
* chore: add distil models to WhisperModel init docstring and download_model docstring
2024-05-20 08:51:22 +07:00
Natanael Tan
4acdb5c619 Fix #839 incorrect clip_timestamps being used in model (#842)
* Fix #839

Changed the code to update the options object instead of the TranscriptionOptions class, which was likely the cause of the unexpected behaviour
2024-05-17 16:35:07 +07:00
Keating Reid
49a80eb8a8 Clarify documentation for hotwords (#817)
* Clarify documentation for hotwords

* Remove redundant type specifications
2024-05-06 08:52:59 +07:00
trungkienbkhn
8d5e6d56d9 Support initializing more whisper model args (#807) 2024-05-04 15:12:59 +07:00
jax
847fec4492 Feature/add hotwords (#731)
* add hotword params

---------

Co-authored-by: jax <jax_builder@gamil.com>
2024-05-04 15:11:52 +07:00
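
Hotwords are passed as a single string at transcription time; a short example (the phrase below is a placeholder):

    from faster_whisper import WhisperModel

    model = WhisperModel("small")

    # Bias decoding toward domain-specific terms.
    segments, info = model.transcribe(
        "meeting.wav",
        hotwords="CTranslate2 SYSTRAN faster-whisper",
    )
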
Purfview
b024972a56 Foolproof: Disable VAD if clip_timestamps is in use (#769)
* Foolproof: Disable VAD if clip_timestamps is in use

Prevents silly things from happening.
2024-04-02 18:20:34 +02:00
Purfview
8ae82c8372 Bugfix: code breaks if audio is empty (#768)
* Bugfix: code breaks if audio is empty

Regression since PR #732: https://github.com/SYSTRAN/faster-whisper/pull/732
2024-04-02 18:18:12 +02:00
trungkienbkhn
1eb9a8004c Improve language detection (#732) 2024-03-12 15:44:49 +01:00
Purfview
5090cc9d0d Fix window end heuristic for hallucination_silence_threshold (#706)
Removes the wishful heuristic that caused more issues than it fixed.

Same as https://github.com/openai/whisper/pull/2043

Example of the issue: https://github.com/openai/whisper/pull/1838#issuecomment-1960041500
2024-02-29 17:59:32 +01:00
trungkienbkhn
16141e65d9 Add pad_or_trim function to handle segment before encoding (#705) 2024-02-29 17:08:28 +01:00
Purfview
30d6043e90 Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697)
Same as https://github.com/openai/whisper/pull/2005
2024-02-22 09:48:35 +01:00
trungkienbkhn
092067208b Add clip_timestamps and hallucination_silence_threshold options (#646) 2024-02-20 17:34:54 +01:00
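
A hedged sketch of the two options added here (values are placeholders; per the docstring, hallucination_silence_threshold only takes effect when word_timestamps is enabled):

    from faster_whisper import WhisperModel

    model = WhisperModel("small")

    segments, info = model.transcribe(
        "audio.wav",
        # start,end pairs in seconds; a list of floats is also accepted
        clip_timestamps="0,30,45,60",
        word_timestamps=True,
        # skip silent gaps longer than 2 s when a hallucination is suspected
        hallucination_silence_threshold=2.0,
    )
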
Purfview
3aec421849 Add: More clarity of what "max_new_tokens" does (#658)
* Add: More clarity of what "max_new_tokens" does
2024-01-28 21:40:33 +01:00
Purfview
00efce1696 Bugfix: Illogical "Avoid computing higher temperatures on no_speech" (#652) 2024-01-24 11:54:43 +01:00
metame
ad3c83045b support distil-whisper (#557) 2024-01-24 10:17:12 +01:00
Purfview
ebcfd6b964 Fix broken prompt_reset_on_temperature (#604)
* Fix broken prompt_reset_on_temperature

Fixing: https://github.com/SYSTRAN/faster-whisper/issues/603

Broken because `generate_with_fallback()` doesn't return the final temperature.

Regression since PR #356: https://github.com/SYSTRAN/faster-whisper/pull/356
2023-12-13 13:14:39 +01:00
trungkienbkhn
19329a3611 Word timing tweaks (#616) 2023-12-13 12:38:44 +01:00
Oscaarjs
3084409633 Add V3 Support (#578)
* Add V3 Support

* update conversion example

---------

Co-authored-by: oscaarjs <oscar.johansson@conversy.se>
2023-11-24 23:16:12 +01:00
Guillaume Klein
e94711bb5c Add property WhisperModel.supported_languages (#476)
* Expose function supported_languages

* Make it a method
2023-09-14 17:42:02 +02:00
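
The property can be queried directly, e.g.:

    from faster_whisper import WhisperModel

    model = WhisperModel("small")
    print(model.supported_languages)  # e.g. ["en"] for an English-only model
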