Mahmoud Ashraf
85e61ea111
Add progress bar to WhisperModel.transcribe ( #1138 )
2024-11-14 17:12:39 +03:00
Mahmoud Ashraf
3e0ba86571
Remove torch dependency, faster numpy feature extraction ( #1106 )
2024-11-14 12:57:10 +03:00
Mahmoud Ashraf
8f01aee36b
Update WhisperModel documentation to list all available models ( #1137 )
2024-11-13 19:26:01 +03:00
Mahmoud Ashraf
c2bf036234
change language_detection_threshold default value ( #1134 )
2024-11-13 17:07:46 +03:00
Mahmoud Ashraf
fb65cd387f
Update cuda instructions in readme ( #1125 )
...
* Update README.md
* Update README.md
* Update version.py
* Update README.md
* Update README.md
* Update README.md
2024-11-12 15:51:26 +03:00
Mahmoud Ashraf
203dddb047
replace NamedTuple with dataclass ( #1105 )
...
* replace `NamedTuple` with `dataclass`
* add deprecation warnings
2024-11-05 12:32:20 +03:00
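A minimal illustration of the migration pattern named in the commit above (#1105): a `NamedTuple` replaced with a `dataclass` that keeps old tuple-style indexing working behind a deprecation warning. The class name and fields here are illustrative placeholders, not the project's actual code.

```python
# Hypothetical sketch of a NamedTuple -> dataclass migration with a deprecation
# warning for legacy tuple-style access; names and fields are assumptions.
import warnings
from dataclasses import dataclass, astuple


@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

    def __getitem__(self, index):
        # Keep old tuple-style access working, but warn that it is deprecated.
        warnings.warn(
            "Indexing Word like a tuple is deprecated; use attribute access instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return astuple(self)[index]


w = Word(start=0.0, end=0.5, word="hello", probability=0.98)
print(w.word)   # new attribute access
print(w[2])     # old tuple-style access still works, with a DeprecationWarning
```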
Mahmoud Ashraf
814472fdbf
Revert CPU default threads to 0
...
https://github.com/SYSTRAN/faster-whisper/pull/965#issuecomment-2448208010
2024-10-30 23:00:36 +03:00
Ozan Caglayan
f978fa2979
Revert CPU default threads to 4 ( #965 )
...
Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com >
2024-10-30 16:50:49 +03:00
Mahmoud Ashraf
2386843fd7
Use correct features padding for encoder input ( #1101 )
...
* pad to 3000 instead of `feature_extractor.nb_max_frames`
* correct trimming for batched features
2024-10-29 17:58:05 +03:00
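A hedged sketch of the padding idea described in the commit above (#1101): the encoder expects a fixed number of mel frames (3000 for 30 s of audio), so shorter feature arrays are right-padded and longer ones trimmed. The function and constant names are assumptions for illustration, not faster-whisper's implementation.

```python
# Illustrative only: pad or trim a mel-spectrogram array to a fixed frame count.
import numpy as np

ENCODER_INPUT_FRAMES = 3000  # 30 s of audio at Whisper's hop length


def pad_features(features: np.ndarray) -> np.ndarray:
    """features has shape (n_mels, n_frames); returns shape (n_mels, 3000)."""
    n_frames = features.shape[-1]
    if n_frames >= ENCODER_INPUT_FRAMES:
        return features[:, :ENCODER_INPUT_FRAMES]
    pad_width = ENCODER_INPUT_FRAMES - n_frames
    return np.pad(features, ((0, 0), (0, pad_width)))


padded = pad_features(np.zeros((80, 1200), dtype=np.float32))
print(padded.shape)  # (80, 3000)
```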
黑墨水鱼
c2a1da1bd9
typo: trubo -> turbo ( #1092 )
2024-10-26 00:28:16 +03:00
Mahmoud Ashraf
b2da05582c
Add support for turbo model ( #1090 )
2024-10-25 15:50:23 +03:00
Mahmoud Ashraf
2dbca5e559
Use Silero VAD in Batched Mode ( #936 )
...
Replace Pyannote VAD with Silero to reduce code duplication and requirements
2024-10-24 12:05:25 +03:00
Mahmoud Ashraf
42b8681edb
revert back to using PyAV instead of torchaudio ( #961 )
...
* revert back to using PyAV instead of torchaudio
* Update audio.py
2024-10-23 15:26:18 +03:00
Mahmoud Ashraf
d57c5b40b0
Remove the usage of transformers.pipeline from BatchedInferencePipeline and fix word timestamps for batched inference ( #921 )
...
* fix word timestamps for batched inference
* remove hf pipeline
2024-07-27 09:02:58 +07:00
zh-plus
83a368e98a
Make vad-related parameters configurable for batched inference. ( #923 )
2024-07-24 09:00:32 +07:00
Jilt Sebastian
eb8390233c
New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements ( #856 )
...
Batching Support, Speed Boosts, and Quality Enhancements
---------
Co-authored-by: Hargun Mujral <83234565+hargunmujral@users.noreply.github.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
2024-07-18 16:48:52 +07:00
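A hedged usage sketch of the batched pipeline introduced in the PR above (#856); the model size, audio file name, and batch size are placeholders.

```python
# Sketch: wrap a WhisperModel in the batched pipeline for faster transcription.
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("medium", device="cpu", compute_type="int8")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("audio.mp3", batch_size=8)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")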
trungkienbkhn
fbcf58bf98
Fix language detection with non-speech audio ( #895 )
2024-07-05 14:43:45 +07:00
Jordi Mas
1195359984
Filter out non_speech_tokens in suppressed tokens ( #898 )
...
* Filter out non_speech_tokens in suppressed tokens
2024-07-05 14:43:11 +07:00
trungkienbkhn
c22db5125d
Bump version to 1.0.3 ( #887 )
2024-07-01 16:36:12 +07:00
ABen
8862bee1f8
Improve language detection when using clip_timestamps ( #867 )
2024-07-01 16:12:45 +07:00
Ki Hoon Kim
8d400e9870
Upgrade to Silero-Vad V5 ( #884 )
...
* Fix window_size_samples to 512
* Update SileroVADModel
* Replace ONNX file with V5 version
2024-07-01 15:40:37 +07:00
Napuh
f53be1e811
Add distil models to WhisperModel init and download_model docstrings ( #847 )
...
* chore: add distil models to WhisperModel init docstring and download_model docstring
2024-05-20 08:51:22 +07:00
Natanael Tan
4acdb5c619
Fix #839 incorrect clip_timestamps being used in model ( #842 )
...
* Fix #839
Changed the code to update the options object rather than the TranscriptionOptions class, which was likely the cause of the unexpected behaviour
2024-05-17 16:35:07 +07:00
trungkienbkhn
2f6913efc8
Bump version to 1.0.2 ( #816 )
2024-05-06 09:02:54 +07:00
Keating Reid
49a80eb8a8
Clarify documentation for hotwords ( #817 )
...
* Clarify documentation for hotwords
* Remove redundant type specifications
2024-05-06 08:52:59 +07:00
trungkienbkhn
8d5e6d56d9
Support initializing more whisper model args ( #807 )
2024-05-04 15:12:59 +07:00
jax
847fec4492
Feature/add hotwords ( #731 )
...
* add hotword params
---------
Co-authored-by: jax <jax_builder@gamil.com>
2024-05-04 15:11:52 +07:00
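A hedged usage sketch of the hotwords argument added in the PR above (#731); the model size, audio file name, and hotword strings are placeholders.

```python
# Sketch: bias recognition toward domain-specific terms via the hotwords argument.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe(
    "meeting.wav",
    hotwords="Kubernetes PostgreSQL SYSTRAN",  # terms to prime the decoder with
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```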
otakutyrant
91c8307aa6
make faster_whisper.assets a valid python package to distribute ( #772 ) ( #774 )
2024-04-02 18:22:22 +02:00
Purfview
b024972a56
Foolproof: Disable VAD if clip_timestamps is in use ( #769 )
...
* Foolproof: Disable VAD if clip_timestamps is in use
Prevent silly things from happening.
2024-04-02 18:20:34 +02:00
Purfview
8ae82c8372
Bugfix: code breaks if audio is empty ( #768 )
...
* Bugfix: code breaks if audio is empty
Regression since PR https://github.com/SYSTRAN/faster-whisper/pull/732
2024-04-02 18:18:12 +02:00
trungkienbkhn
e0c3a9ed34
Update project github link to SYSTRAN ( #746 )
2024-03-27 08:31:17 +01:00
Sanchit Gandhi
a67e0e47ae
Add support for distil-large-v3 ( #755 )
...
* add distil-large-v3
* Update README.md
* use fp16 weights from Systran
2024-03-26 14:58:39 +01:00
trungkienbkhn
1eb9a8004c
Improve language detection ( #732 )
2024-03-12 15:44:49 +01:00
trungkienbkhn
a342b028b7
Bump version to 1.0.1 ( #725 )
2024-03-01 11:32:12 +01:00
Purfview
5090cc9d0d
Fix window end heuristic for hallucination_silence_threshold ( #706 )
...
Removes the wishful heuristic causing more issues than it's fixing.
Same as https://github.com/openai/whisper/pull/2043
Example of the issue: https://github.com/openai/whisper/pull/1838#issuecomment-1960041500
2024-02-29 17:59:32 +01:00
trungkienbkhn
16141e65d9
Add pad_or_trim function to handle segment before encoding ( #705 )
2024-02-29 17:08:28 +01:00
trungkienbkhn
06d32bf0c1
Bump version to 1.0.0 ( #696 )
2024-02-22 09:49:01 +01:00
Purfview
30d6043e90
Prevent infinite loop for out-of-bound timestamps in clip_timestamps ( #697 )
...
Same as https://github.com/openai/whisper/pull/2005
2024-02-22 09:48:35 +01:00
trungkienbkhn
092067208b
Add clip_timestamps and hallucination_silence_threshold options ( #646 )
2024-02-20 17:34:54 +01:00
Purfview
3aec421849
Add: More clarity of what "max_new_tokens" does ( #658 )
...
* Add: More clarity of what "max_new_tokens" does
2024-01-28 21:40:33 +01:00
Purfview
00efce1696
Bugfix: Illogical "Avoid computing higher temperatures on no_speech" ( #652 )
2024-01-24 11:54:43 +01:00
metame
ad3c83045b
support distil-whisper ( #557 )
2024-01-24 10:17:12 +01:00
Purfview
ebcfd6b964
Fix broken prompt_reset_on_temperature ( #604 )
...
* Fix broken prompt_reset_on_temperature
Fixing: https://github.com/SYSTRAN/faster-whisper/issues/603
Broken because `generate_with_fallback()` doesn't return final temperature.
Regression since PR356 -> https://github.com/SYSTRAN/faster-whisper/pull/356
2023-12-13 13:14:39 +01:00
trungkienbkhn
19329a3611
Word timing tweaks ( #616 )
2023-12-13 12:38:44 +01:00
Clayton Yochum
9641d5f56a
Force read-mode in av.open ( #566 )
...
The `av.open` function checks input metadata to determine the mode to open with ("r" or "w"). If an input passed to `decode_audio` is in write mode, it cannot be read without this change. Forcing read mode solves this.
2023-11-27 10:43:35 +01:00
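A minimal sketch of the fix described in the commit above (#566): always open the input for reading so PyAV does not infer write mode from the file object's metadata. The helper name is an assumption for illustration.

```python
# Sketch: force read mode when opening an input container with PyAV.
import av


def open_for_decoding(input_file):
    # Forcing mode="r" makes decoding work even if the file object was created
    # in write mode; without it, av.open may infer "w" and fail to read.
    return av.open(input_file, mode="r")
```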
Dang Chuan Nguyen
e1a218fab1
Bump version to 0.10.0
2023-11-24 23:19:47 +01:00
Oscaarjs
3084409633
Add V3 Support ( #578 )
...
* Add V3 Support
* update conversion example
---------
Co-authored-by: oscaarjs <oscar.johansson@conversy.se>
2023-11-24 23:16:12 +01:00
Guillaume Klein
5a0541ea7d
Bump version to 0.9.0
2023-09-18 16:21:37 +02:00
Guillaume Klein
e94711bb5c
Add property WhisperModel.supported_languages ( #476 )
...
* Expose function supported_languages
* Make it a method
2023-09-14 17:42:02 +02:00
Guillaume Klein
0048844f54
Expose function available_models ( #475 )
...
* Expose function available_models
* Add test case
2023-09-14 17:17:01 +02:00
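A hedged usage sketch of the two helpers exposed in the last two commits above (#475 and #476); the model size and device are placeholders.

```python
# Sketch: list the downloadable model names and the languages a loaded model supports.
from faster_whisper import WhisperModel, available_models

print(available_models())            # names accepted by WhisperModel / download_model

model = WhisperModel("tiny", device="cpu", compute_type="int8")
print(model.supported_languages)     # language codes the loaded model can transcribe
```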