faster-whisper

mirror of https://github.com/SYSTRAN/faster-whisper.git synced 2026-01-09 21:48:08 -05:00

Author	SHA1	Message	Date
Dragoș Bălan	95164297ff	Add duration of audio and VAD removed duration to BatchedInferencePipeline (#1186) Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>	2024-12-23 17:23:40 +02:00
Purfview	1b24f284c9	Reduce VAD memory usage (#1198) Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>	2024-12-12 15:23:30 +03:00
Purfview	f32c0e8af3	Make batched suppress_tokens behaviour same as in sequential (#1194)	2024-12-11 14:51:38 +03:00
Purfview	8327d8cc64	Brings back original VAD parameters naming (#1181)	2024-12-01 20:41:53 +03:00
Mahmoud Ashraf	97a4785fa1	Bump version to 1.1.0 and update benchmarks (#1161) * update version * Update CPU benchmarks * Updated GPU benchmarks * .. * more gpu benchmarks	2024-11-21 19:22:01 +03:00
Mahmoud Ashraf	08f6900217	remove `log_prob_low_threshold` (#1160)	2024-11-21 00:03:21 +03:00
Mahmoud Ashraf	f830c6f241	Fix `list index out of range` in word timestamps (#1157)	2024-11-20 13:36:58 +03:00
Mahmoud Ashraf	bcd8ce0fc7	refactor `multilingual` option (#1148) * Added test for `multilingual` option with english-german audio * removed `output_language` argument as it is redundant, you can get the same functionality with `task="translate"` * use the correct `encoder_output` for language detection in sequential transcription * enabled `multilingual` functionality for batched inference	2024-11-20 00:14:59 +03:00
Mahmoud Ashraf	be9fb36ed3	Cleanup of `BatchedInferencePipeline` (#1135)	2024-11-17 16:45:32 +03:00
Mahmoud Ashraf	a6f8fbae00	Refactor of language detection functions (#1146) * Supported new options for batched transcriptions: * `language_detection_threshold` * `language_detection_segments` * Updated `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings, it's now used inside `transcribe` function. * Removed the following functions as they are no longer needed: * `WhisperModel.detect_language_multi_segment` and its test * `BatchedInferencePipeline.get_language_and_tokenizer` * Added tests for empty audios	2024-11-16 13:53:07 +03:00
黑墨水鱼	53bbe54016	fix: Use correct `seek` value in output, fix word timestamps when the initial timestamp is not zero (#1141) Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>	2024-11-15 14:57:38 +03:00
Mahmoud Ashraf	85e61ea111	Add progress bar to `WhisperModel.transcribe` (#1138)	2024-11-14 17:12:39 +03:00
Mahmoud Ashraf	3e0ba86571	Remove `torch` dependency, Faster numpy Feature extraction (#1106)	2024-11-14 12:57:10 +03:00
Mahmoud Ashraf	8f01aee36b	Update WhisperModel documentation to list all available models (#1137)	2024-11-13 19:26:01 +03:00
Mahmoud Ashraf	c2bf036234	change `language_detection_threshold` default value (#1134)	2024-11-13 17:07:46 +03:00
Mahmoud Ashraf	fb65cd387f	Update cuda instructions in readme (#1125) * Update README.md * Update README.md * Update version.py * Update README.md * Update README.md * Update README.md	2024-11-12 15:51:26 +03:00
Mahmoud Ashraf	203dddb047	replace `NamedTuple` with `dataclass` (#1105) * replace `NamedTuple` with `dataclass` * add deprecation warnings	2024-11-05 12:32:20 +03:00
Mahmoud Ashraf	814472fdbf	Revert CPU default threads to 0 https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010	2024-10-30 23:00:36 +03:00
Ozan Caglayan	f978fa2979	Revert CPU default threads to 4 (#965) Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>	2024-10-30 16:50:49 +03:00
Mahmoud Ashraf	2386843fd7	Use correct features padding for encoder input (#1101) * pad to 3000 instead of `feature_extractor.nb_max_frames` * correct trimming for batched features	2024-10-29 17:58:05 +03:00
黑墨水鱼	c2a1da1bd9	typo: trubo -> turbo (#1092)	2024-10-26 00:28:16 +03:00
Mahmoud Ashraf	b2da05582c	Add support for `turbo` model (#1090)	2024-10-25 15:50:23 +03:00
Mahmoud Ashraf	2dbca5e559	Use Silero VAD in Batched Mode (#936) Replace Pyannote VAD with Silero to reduce code duplication and requirements	2024-10-24 12:05:25 +03:00
Mahmoud Ashraf	42b8681edb	revert back to using PyAV instead of `torchaudio` (#961) * revert back to using PyAV instead of torch audio * Update audio.py	2024-10-23 15:26:18 +03:00
Mahmoud Ashraf	d57c5b40b0	Remove the usage of `transformers.pipeline` from `BatchedInferencePipeline` and fix word timestamps for batched inference (#921) * fix word timestamps for batched inference * remove hf pipeline	2024-07-27 09:02:58 +07:00
zh-plus	83a368e98a	Make vad-related parameters configurable for batched inference. (#923)	2024-07-24 09:00:32 +07:00
Jilt Sebastian	eb8390233c	New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856) Batching Support, Speed Boosts, and Quality Enhancements --------- Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com> Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>	2024-07-18 16:48:52 +07:00
trungkienbkhn	fbcf58bf98	Fix language detection with non-speech audio (#895)	2024-07-05 14:43:45 +07:00
Jordi Mas	1195359984	Filter out non_speech_tokens in suppressed tokens (#898) * Filter out non_speech_tokens in suppressed tokens	2024-07-05 14:43:11 +07:00
trungkienbkhn	c22db5125d	Bump version to 1.0.3 (#887)	2024-07-01 16:36:12 +07:00
ABen	8862bee1f8	Improve language detection when using clip_timestamps (#867)	2024-07-01 16:12:45 +07:00
Ki Hoon Kim	8d400e9870	Upgrade to Silero-Vad V5 (#884) * Fix window_size_samples to 512 * Update SileroVADModel * Replace ONNX file with V5 version	2024-07-01 15:40:37 +07:00
Napuh	f53be1e811	Add distil models to WhisperModel init and download_model docstrings (#847) * chore: add distil models to WhisperModel init docstring and download_model docstring	2024-05-20 08:51:22 +07:00
Natanael Tan	4acdb5c619	Fix #839 incorrect clip_timestamps being used in model (#842) * Fix #839 Changed the code from updating the TranscriptionOptions class instead of the options object which likely was the cause of unexpected behaviour	2024-05-17 16:35:07 +07:00
trungkienbkhn	2f6913efc8	Bump version to 1.0.2 (#816)	2024-05-06 09:02:54 +07:00
Keating Reid	49a80eb8a8	Clarify documentation for hotwords (#817) * Clarify documentation for hotwords * Remove redundant type specifications	2024-05-06 08:52:59 +07:00
trungkienbkhn	8d5e6d56d9	Support initializing more whisper model args (#807)	2024-05-04 15:12:59 +07:00
jax	847fec4492	Feature/add hotwords (#731) * add hotword params --------- Co-authored-by: jax <jax_builder@gamil.com>	2024-05-04 15:11:52 +07:00
otakutyrant	91c8307aa6	make faster_whisper.assets as a valid python package to distribute (#772) (#774)	2024-04-02 18:22:22 +02:00
Purfview	b024972a56	Foolproof: Disable VAD if clip_timestamps is in use (#769) * Foolproof: Disable VAD if clip_timestamps is in use Prevent silly things to happen.	2024-04-02 18:20:34 +02:00
Purfview	8ae82c8372	Bugfix: code breaks if audio is empty (#768) * Bugfix: code breaks if audio is empty Regression since https://github.com/SYSTRAN/faster-whisper/pull/732 PR	2024-04-02 18:18:12 +02:00
trungkienbkhn	e0c3a9ed34	Update project github link to SYSTRAN (#746)	2024-03-27 08:31:17 +01:00
Sanchit Gandhi	a67e0e47ae	Add support for distil-large-v3 (#755) * add distil-large-v3 * Update README.md * use fp16 weights from Systran	2024-03-26 14:58:39 +01:00
trungkienbkhn	1eb9a8004c	Improve language detection (#732)	2024-03-12 15:44:49 +01:00
trungkienbkhn	a342b028b7	Bump version to 1.0.1 (#725)	2024-03-01 11:32:12 +01:00
Purfview	5090cc9d0d	Fix window end heuristic for hallucination_silence_threshold (#706) Removes the wishful heuristic causing more issues than it's fixing. Same as https://github.com/openai/whisper/pull/2043 Example of the issue: https://github.com/openai/whisper/pull/1838#issuecomment-1960041500	2024-02-29 17:59:32 +01:00
trungkienbkhn	16141e65d9	Add pad_or_trim function to handle segment before encoding (#705)	2024-02-29 17:08:28 +01:00
trungkienbkhn	06d32bf0c1	Bump version to 1.0.0 (#696)	2024-02-22 09:49:01 +01:00
Purfview	30d6043e90	Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697) Same as https://github.com/openai/whisper/pull/2005	2024-02-22 09:48:35 +01:00
trungkienbkhn	092067208b	Add clip_timestamps and hallucination_silence_threshold options (#646)	2024-02-20 17:34:54 +01:00

169 Commits