Commit Graph

169 Commits

Author SHA1 Message Date
Dragoș Bălan
95164297ff Add duration of audio and VAD removed duration to BatchedInferencePipeline (#1186)
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-12-23 17:23:40 +02:00
Purfview
1b24f284c9 Reduce VAD memory usage (#1198)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-12-12 15:23:30 +03:00
Purfview
f32c0e8af3 Make batched suppress_tokens behaviour same as in sequential (#1194) 2024-12-11 14:51:38 +03:00
Purfview
8327d8cc64 Brings back original VAD parameters naming (#1181) 2024-12-01 20:41:53 +03:00
Mahmoud Ashraf
97a4785fa1 Bump version to 1.1.0 and update benchmarks (#1161)
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks
2024-11-21 19:22:01 +03:00
Mahmoud Ashraf
08f6900217 remove log_prob_low_threshold (#1160) 2024-11-21 00:03:21 +03:00
Mahmoud Ashraf
f830c6f241 Fix list index out of range in word timestamps (#1157) 2024-11-20 13:36:58 +03:00
Mahmoud Ashraf
bcd8ce0fc7 refactor multilingual option (#1148)
* Added test for `multilingual` option with english-german audio
* removed `output_language` argument as it is redundant; the same functionality is available with `task="translate"`
* use the correct `encoder_output` for language detection in sequential transcription
* enabled `multilingual` functionality for batched inference
2024-11-20 00:14:59 +03:00
Mahmoud Ashraf
be9fb36ed3 Cleanup of BatchedInferencePipeline (#1135) 2024-11-17 16:45:32 +03:00
Mahmoud Ashraf
a6f8fbae00 Refactor of language detection functions (#1146)
* Supported new options for batched transcriptions:
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it's now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audios
2024-11-16 13:53:07 +03:00
黑墨水鱼
53bbe54016 fix: Use correct seek value in output, fix word timestamps when the initial timestamp is not zero (#1141)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-11-15 14:57:38 +03:00
Mahmoud Ashraf
85e61ea111 Add progress bar to WhisperModel.transcribe (#1138) 2024-11-14 17:12:39 +03:00
Mahmoud Ashraf
3e0ba86571 Remove torch dependency, Faster numpy Feature extraction (#1106) 2024-11-14 12:57:10 +03:00
Mahmoud Ashraf
8f01aee36b Update WhisperModel documentation to list all available models (#1137) 2024-11-13 19:26:01 +03:00
Mahmoud Ashraf
c2bf036234 change language_detection_threshold default value (#1134) 2024-11-13 17:07:46 +03:00
Mahmoud Ashraf
fb65cd387f Update cuda instructions in readme (#1125)
* Update README.md

* Update README.md

* Update version.py

* Update README.md

* Update README.md

* Update README.md
2024-11-12 15:51:26 +03:00
Mahmoud Ashraf
203dddb047 replace NamedTuple with dataclass (#1105)
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
2024-11-05 12:32:20 +03:00
Mahmoud Ashraf
814472fdbf Revert CPU default threads to 0
https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010
2024-10-30 23:00:36 +03:00
Ozan Caglayan
f978fa2979 Revert CPU default threads to 4 (#965)
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>
2024-10-30 16:50:49 +03:00
Mahmoud Ashraf
2386843fd7 Use correct features padding for encoder input (#1101)
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
2024-10-29 17:58:05 +03:00
黑墨水鱼
c2a1da1bd9 typo: trubo -> turbo (#1092) 2024-10-26 00:28:16 +03:00
Mahmoud Ashraf
b2da05582c Add support for turbo model (#1090) 2024-10-25 15:50:23 +03:00
Mahmoud Ashraf
2dbca5e559 Use Silero VAD in Batched Mode (#936)
Replace Pyannote VAD with Silero to reduce code duplication and requirements
2024-10-24 12:05:25 +03:00
Mahmoud Ashraf
42b8681edb revert back to using PyAV instead of torchaudio (#961)
* revert back to using PyAV instead of torchaudio

* Update audio.py
2024-10-23 15:26:18 +03:00
Mahmoud Ashraf
d57c5b40b0 Remove the usage of transformers.pipeline from BatchedInferencePipeline and fix word timestamps for batched inference (#921)
* fix word timestamps for batched inference

* remove hf pipeline
2024-07-27 09:02:58 +07:00
zh-plus
83a368e98a Make vad-related parameters configurable for batched inference. (#923) 2024-07-24 09:00:32 +07:00
Jilt Sebastian
eb8390233c New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856)
Batching Support, Speed Boosts, and Quality Enhancements

---------

Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-07-18 16:48:52 +07:00
trungkienbkhn
fbcf58bf98 Fix language detection with non-speech audio (#895) 2024-07-05 14:43:45 +07:00
Jordi Mas
1195359984 Filter out non_speech_tokens in suppressed tokens (#898)
* Filter out non_speech_tokens in suppressed tokens
2024-07-05 14:43:11 +07:00
trungkienbkhn
c22db5125d Bump version to 1.0.3 (#887) 2024-07-01 16:36:12 +07:00
ABen
8862bee1f8 Improve language detection when using clip_timestamps (#867) 2024-07-01 16:12:45 +07:00
Ki Hoon Kim
8d400e9870 Upgrade to Silero-Vad V5 (#884)
* Fix window_size_samples to 512

* Update SileroVADModel

* Replace ONNX file with V5 version
2024-07-01 15:40:37 +07:00
Napuh
f53be1e811 Add distil models to WhisperModel init and download_model docstrings (#847)
* chore: add distil models to WhisperModel init docstring and download_model docstring
2024-05-20 08:51:22 +07:00
Natanael Tan
4acdb5c619 Fix #839 incorrect clip_timestamps being used in model (#842)
* Fix #839

Changed the code to update the options object instead of the TranscriptionOptions class, which was likely the cause of the unexpected behaviour
2024-05-17 16:35:07 +07:00
trungkienbkhn
2f6913efc8 Bump version to 1.0.2 (#816) 2024-05-06 09:02:54 +07:00
Keating Reid
49a80eb8a8 Clarify documentation for hotwords (#817)
* Clarify documentation for hotwords

* Remove redundant type specifications
2024-05-06 08:52:59 +07:00
trungkienbkhn
8d5e6d56d9 Support initializing more whisper model args (#807) 2024-05-04 15:12:59 +07:00
jax
847fec4492 Feature/add hotwords (#731)
* add hotword params

---------

Co-authored-by: jax <jax_builder@gamil.com>
2024-05-04 15:11:52 +07:00
otakutyrant
91c8307aa6 make faster_whisper.assets as a valid python package to distribute (#772) (#774) 2024-04-02 18:22:22 +02:00
Purfview
b024972a56 Foolproof: Disable VAD if clip_timestamps is in use (#769)
* Foolproof: Disable VAD if clip_timestamps is in use

Prevent silly things from happening.
2024-04-02 18:20:34 +02:00
Purfview
8ae82c8372 Bugfix: code breaks if audio is empty (#768)
* Bugfix: code breaks if audio is empty

Regression since https://github.com/SYSTRAN/faster-whisper/pull/732 PR
2024-04-02 18:18:12 +02:00
trungkienbkhn
e0c3a9ed34 Update project github link to SYSTRAN (#746) 2024-03-27 08:31:17 +01:00
Sanchit Gandhi
a67e0e47ae Add support for distil-large-v3 (#755)
* add distil-large-v3

* Update README.md

* use fp16 weights from Systran
2024-03-26 14:58:39 +01:00
trungkienbkhn
1eb9a8004c Improve language detection (#732) 2024-03-12 15:44:49 +01:00
trungkienbkhn
a342b028b7 Bump version to 1.0.1 (#725) 2024-03-01 11:32:12 +01:00
Purfview
5090cc9d0d Fix window end heuristic for hallucination_silence_threshold (#706)
Removes the wishful heuristic, which causes more issues than it fixes.

Same as https://github.com/openai/whisper/pull/2043

Example of the issue: https://github.com/openai/whisper/pull/1838#issuecomment-1960041500
2024-02-29 17:59:32 +01:00
trungkienbkhn
16141e65d9 Add pad_or_trim function to handle segment before encoding (#705) 2024-02-29 17:08:28 +01:00
trungkienbkhn
06d32bf0c1 Bump version to 1.0.0 (#696) 2024-02-22 09:49:01 +01:00
Purfview
30d6043e90 Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697)
Same as https://github.com/openai/whisper/pull/2005
2024-02-22 09:48:35 +01:00
trungkienbkhn
092067208b Add clip_timestamps and hallucination_silence_threshold options (#646) 2024-02-20 17:34:54 +01:00