Commit Graph

231 Commits

Author SHA1 Message Date
Tabby Cromarty
510ecbc236 Fix build with CMake 4 2025-10-18 19:29:05 +01:00
Tabby Cromarty
4b6b2d4fb0 Update README for non-Ubuntu Linux builds
Add system ICU parameter and note about CMake 4
2025-10-18 12:16:34 +01:00
Tabitha Cromarty
529d2d0314 Add option to use system ICU libs (#220)
If ICU is already available on the system there's no point downloading and
building it again. To save time and space, add an option to find and use the
system version of ICU
2025-10-18 10:29:54 +01:00
Tabby Cromarty
a1b3e248a2 Skip notarising macOS package on release
Apple Developer ID has expired and renewing it is expensive
0.4.0
2025-10-18 09:36:55 +01:00
Tabitha Cromarty
6a90c27e6b Fix compatibility with OBS v32 (#231)
Stop using deprecated API functions that were removed in OBS v32 (See: https://github.com/obsproject/obs-studio/pull/12488)
* Use the return value from `obs_scene_add` instead of looking it up after creating it using the deprecated `obs_scene_sceneitem_from_source` function
* Use v2 sceneitem transform API instead of deprecated v1 API
* Change all uses of `circlebuf` to `deque` as it's been renamed in OBS

Additional bug fixes:
* Wait until scenes are loaded before creating the text source if it's missing. Previously, creating the text source could fail because there were no scenes yet to create it in
* Fix build on macOS and some Linux distributions
* Catch exceptions thrown by ONNX runtime and log an error instead of crashing

Other changes:
* Update OBS source dependencies
  * OBS: 30.1.2 -> 31.1.1
  * OBS deps: 2024-03-19 -> 2025-07-11
  * Qt6 deps: 2024-03-19 -> 2025-02-04
* Update build files to more closely match the current plugin template repository

Fixes: #230
Fixes: #227
2025-10-17 22:03:16 +01:00
RodriMora
491109d7cc Fixed typo (#186) 2025-02-03 09:09:11 -05:00
Ruwen Hahn
b3be219915 Add WebVTT-in-video-stream support (#196)
* Fix `whisper_buffer` and `resampled_buffer` data race

`media_unpause` was causing `whisper_buffer` to be freed while
`vad_based_segmentation`/`hybrid_vad_segmentation` need that buffer
to not be modified for the duration of those calls

* Slightly improve handling for weird subtitle output filenames

* Squashed 'deps/c-webvtt-in-video-stream/' content from commit 5579ca6

git-subtree-dir: deps/c-webvtt-in-video-stream
git-subtree-split: 5579ca6dc9dcf94e3c14631c6c01b2ee4dfcf005

* Add WIP WebVTT SEI functionality

* Add WebVTT recording/streaming settings

* Make latency_to_video_in_msecs and send_frequency_hz configurable

* Make WebVTT languages configurable

* Add translation and main language separately

* Add Rust CI integration
2025-02-03 09:06:52 -05:00
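The race described in the first bullet happens when two code paths touch the same buffer without coordination. A common fix is to make both paths take one mutex. The sketch below shows only that pattern, not the commit's actual change. `FilterState`, `segment_audio` and `on_media_unpause` are hypothetical names, and `std::vector<float>` stands in for the plugin's OBS deque.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the filter data; the plugin itself stores audio
// in OBS deques, not std::vector.
struct FilterState {
	std::mutex whisper_buf_mutex;
	std::vector<float> whisper_buffer;
};

// Segmentation holds the lock for the whole call, so the buffer cannot be
// freed or resized underneath it. Returns the number of samples consumed.
std::size_t segment_audio(FilterState &st)
{
	std::lock_guard<std::mutex> lock(st.whisper_buf_mutex);
	std::size_t consumed = st.whisper_buffer.size();
	// ... VAD / whisper work on st.whisper_buffer would happen here ...
	st.whisper_buffer.clear();
	return consumed;
}

// The unpause handler takes the same lock before releasing the buffer, so it
// waits for any in-progress segmentation to finish.
void on_media_unpause(FilterState &st)
{
	std::lock_guard<std::mutex> lock(st.whisper_buf_mutex);
	st.whisper_buffer.clear();
	st.whisper_buffer.shrink_to_fit();
}
```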
Roy Shilkrot
fe9a52157e Update README.md 2025-01-12 12:40:51 -05:00
Roy Shilkrot
bb1db72052 Update version to 0.3.9 in README.md and buildspec.json; add availabl… (#184)
* Update version to 0.3.9 in README.md and buildspec.json; add available versions section for clarity

* Add support for custom API translation provider and update configuration options

* Enable ccache support in CMake build scripts for improved build performance

* Add status message for detected ccache in CMake build scripts
0.3.9
2024-11-25 12:57:08 -05:00
Roy Shilkrot
a32e327477 No VAD segmentation option (#182)
* Add support for disabled VAD mode and enhance CMake configuration

* Enhance VAD processing and transcription filter data structure with additional comments and logic adjustments

* Enhance VAD processing with improved logging, adjust single segment default, and update inference handling

* Refactor whisper parameter handling and enhance utility functions for better clarity and maintainability

* Add whisper parameters group properties and clean up related code

* Add whisper parameters handling and update related files for improved functionality

* Refactor whisper parameter type casting for improved clarity and consistency

* trigger build

* Fix logging message to use the correct variable for saved sentence
2024-11-24 23:28:46 -05:00
Roy Shilkrot
04a6f6a2a1 Add cloud translation support with multiple providers and configurati… (#183)
* Add cloud translation support with multiple providers and configuration options

* Refactor CMakeLists.txt for cloud translation sources formatting

* Add support for translating only full sentences in cloud translation

* Update ICU build configuration and fix header include case sensitivity

* Fix CURL helper function signatures and improve URL encoding

* Fix character type casting in DeepLTranslator for language conversion

* Refactor file saving logic in transcription filter to streamline sentence handling and add support for saving translated sentences

* Add support for DeepL Free API endpoint and enhance cloud translation configuration

* Add ccache detection to ICU build configuration for improved compilation speed

* Enhance ICU build configuration to use ccache as a compiler wrapper for improved performance
2024-11-24 22:11:49 -05:00
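The CURL helper bullet mentions improved URL encoding. The sketch below shows what percent-encoding involves, assuming RFC 3986 rules. `url_encode` is a hypothetical standalone function: it passes unreserved characters through and encodes every other byte as `%XX`. The commit's helper may instead rely on libcurl's own `curl_easy_escape`.

```cpp
#include <string>

// Percent-encode a string for use in a URL query component. Bytes in the
// RFC 3986 "unreserved" set pass through; every other byte (including each
// byte of a multi-byte UTF-8 character) becomes %XX in uppercase hex.
std::string url_encode(const std::string &in)
{
	static const char hex[] = "0123456789ABCDEF";
	std::string out;
	out.reserve(in.size() * 3);
	for (unsigned char c : in) {
		bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
				  (c >= '0' && c <= '9') || c == '-' || c == '_' ||
				  c == '.' || c == '~';
		if (unreserved) {
			out += static_cast<char>(c);
		} else {
			out += '%';
			out += hex[c >> 4];
			out += hex[c & 0x0F];
		}
	}
	return out;
}
```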
Roy Shilkrot
b7ab6a9ed4 Update version to 0.3.8 in README.md and buildspec.json 0.3.8 2024-11-06 10:50:46 -05:00
Roy Shilkrot
b13988d4a2 translations 2024-11-06 09:09:45 -05:00
Roy Shilkrot
f478809f79 Add new Whisper models to models_directory.json and adjust transcription filter properties 2024-10-30 15:48:32 -04:00
Roy Shilkrot
3668195652 Update README.md 2024-10-28 22:08:46 -04:00
Roy Shilkrot
548cb293ee Update prebuilt Whispercpp version to 0.0.7 and change download URL (#180)
* Update prebuilt Whispercpp version to 0.0.7 and change download URL

* Update acceleration options in build scripts
2024-10-25 12:28:50 -04:00
Roy Shilkrot
073dd2a837 Update README.md 2024-10-23 12:24:08 -04:00
Roy Shilkrot
58005da34a enable transcript rename for non-srt files (#179)
* enable transcript rename for non-srt files

* Update transcription-filter-callbacks.cpp

fix edit mistake

* tweak formatting

* Refactor file renaming logic in transcription-filter-callbacks.cpp

* Refactor file renaming logic and handle recording state changes in transcription-filter-callbacks.cpp

* Refactor file renaming logic and handle recording state changes in transcription-filter-callbacks.cpp

---------

Co-authored-by: Stephen <stephen@lenovo-thinkpad.lan>
Co-authored-by: Stephen Schrauger <6665521+schrauger@users.noreply.github.com>
Co-authored-by: schrauger <schrauger@users.noreply.github.com>
2024-10-11 10:48:02 -04:00
Roy Shilkrot
e26819cf9a Update download links and version number in README.md and buildspec.json 0.3.7 2024-10-09 14:44:42 -04:00
Roy Shilkrot
6936293dce Update README.md 2024-10-09 13:00:36 -04:00
Roy Shilkrot
498d0d6f5a refactor: Add download links for different platforms in README.md 2024-10-09 12:57:05 -04:00
Roy Shilkrot
2834ba1bf5 refactor: Enable downloading models directory from GitHub
This commit modifies the `load_models_info` function in `model-infos.cpp` to enable downloading the models directory from GitHub. The download call was previously commented out; it is now restored so that the directory can be downloaded.

Ref: #<issue_number>
2024-10-09 10:47:07 -04:00
Roy Shilkrot
41bd57fd5a refactor: Update translation options in transcription-filter-properties.cpp
Add a new option, "translate_only_full_sentences", to the translation options in the transcription-filter-properties.cpp file. This option is visible only when the "translate_enabled" flag is true and the "is_advanced" flag is set.

Remove unnecessary code in model-infos.cpp

Remove the code that logs a warning message when the "sha256" field is missing or invalid in the model JSON file. The warning is no longer needed, as it does not affect the program's behavior.

Comment out download_json_from_github in model-infos.cpp

Comment out the call to the "download_json_from_github" function in the load_models_info() function in model-infos.cpp. This function is currently not working as intended and needs further investigation.
2024-10-09 10:46:46 -04:00
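The "translate_only_full_sentences" option implies holding text back until a sentence is complete. Below is a minimal sketch of that buffering. It assumes ASCII `.`, `!` and `?` are the only sentence terminators. `take_full_sentences` is a hypothetical name, not the plugin's function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: move every complete sentence out of `pending` and
// leave the unfinished tail in place for the next update.
// Only ASCII '.', '!' and '?' count as terminators here.
std::vector<std::string> take_full_sentences(std::string &pending)
{
	auto is_terminator = [](char c) { return c == '.' || c == '!' || c == '?'; };
	std::vector<std::string> sentences;
	std::size_t start = 0;
	for (std::size_t i = 0; i < pending.size(); ++i) {
		if (!is_terminator(pending[i]))
			continue;
		// Treat runs such as "..." or "?!" as a single sentence ending.
		while (i + 1 < pending.size() && is_terminator(pending[i + 1]))
			++i;
		std::string sentence = pending.substr(start, i + 1 - start);
		// Drop the space left over from the previous split.
		std::size_t first = sentence.find_first_not_of(' ');
		if (first != std::string::npos)
			sentences.push_back(sentence.substr(first));
		start = i + 1;
	}
	pending.erase(0, start);
	return sentences;
}
```

Text after the last terminator stays in `pending`. It is joined with the next transcription update before being checked again.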
Roy Shilkrot
5670ac94b2 Model directory (#172)
* refactor: Handle file exceptions when writing raw sentence and translations

This commit modifies transcription-filter-callbacks.cpp to handle file exceptions when writing raw sentences and translations to files. Wrapping the file operations in try-catch blocks prevents crashes or unexpected behavior when a write fails.

* refactor: Update models_info function to use cached models information

The models_info function in model-downloader.cpp has been updated to use a cached version of the models information, avoiding repeated file reads and JSON parsing. The function now returns a const reference to the cached models_info map.

Refactor the code in model-downloader.cpp to use the updated models_info function and remove the unnecessary file read and JSON parsing code.

Closes #123

* refactor: Simplify file handling in transcription-filter-callbacks.cpp

* refactor: Add script to query Hugging Face models and update models_directory.json

This commit adds two new scripts, hugging_face_model_query.py and hugging_face_model_query_all.py, to query Hugging Face models and update the models_directory.json file. The hugging_face_model_query.py script fetches model information from the Hugging Face API and adds new models to the models_directory.json file. The hugging_face_model_query_all.py script fetches a list of models matching specific search criteria and adds the matching models to the models_directory.json file. These scripts will help keep the models_directory.json file up to date with the latest models available on Hugging Face.

Refactor the file handling in transcription-filter-callbacks.cpp

This commit simplifies the file handling in the transcription-filter-callbacks.cpp file. The changes aim to improve the readability and maintainability of the code by reducing complexity and removing unnecessary code.

Update the models_info function to use cached models information

This commit updates the models_info function to use cached models information instead of fetching it every time the function is called. This change improves the performance of the function by reducing the number of API calls and improves the overall efficiency of the code.

Handle file exceptions when writing raw sentence and translations

This commit adds exception handling code to handle file exceptions when writing raw sentence and translations. The changes ensure that any file-related exceptions are caught and properly handled, preventing the program from crashing or producing incorrect results.

Simplify the Onnxruntime installation in FetchOnnxruntime.cmake

This commit simplifies the Onnxruntime installation process in the FetchOnnxruntime.cmake file. The changes aim to make the installation steps more concise and easier to understand, improving the overall maintainability of the code.

Update the version to 0.3.6 and adjust the website URL

This commit updates the version of the software to 0.3.6 and adjusts the website URL accordingly. The changes ensure that the software is properly versioned and the website URL is up to date.

* refactor: Add ExtraInfo struct to ModelInfo and update models_info function

* refactor: Update model names in models_directory.json and fix URL in transcription-filter.h
2024-10-08 22:41:20 -04:00
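A cached `models_info` like the one described above can be built on a function-local static. C++11 and later initialise such a static exactly once, even with concurrent callers. In this sketch, `ModelInfo`'s fields, `load_models_info` and the `g_load_count` counter are hypothetical stand-ins for parsing `models_directory.json`.

```cpp
#include <map>
#include <string>

// Hypothetical, trimmed-down ModelInfo; the plugin's struct has more fields.
struct ModelInfo {
	std::string friendly_name;
	std::string local_folder_name;
};

// Counts how often the expensive loader runs (for illustration only).
static int g_load_count = 0;

// Hypothetical loader standing in for reading and parsing the JSON file.
static std::map<std::string, ModelInfo> load_models_info()
{
	++g_load_count;
	return {{"Whisper Tiny", {"Whisper Tiny", "whisper-tiny"}}};
}

// The function-local static is initialised once, on the first call;
// every later call returns a const reference to the same map.
const std::map<std::string, ModelInfo> &models_info()
{
	static const std::map<std::string, ModelInfo> cached = load_models_info();
	return cached;
}
```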
Roy Shilkrot
622f0b163e refactor: Simplify Onnxruntime installation in FetchOnnxruntime.cmake 0.3.6 2024-10-01 16:17:14 -04:00
Roy Shilkrot
dacfc63e79 refactor: Update version to 0.3.6 and adjust website URL
Update the version of the OBS Localvocal project to 0.3.6 and adjust the website URL to point to the correct GitHub repository.
2024-10-01 16:03:51 -04:00
Roy Shilkrot
599830175b chore: Update ONNX Runtime version to 1.19.2 and adjust corresponding hashes (#166) 2024-10-01 13:41:59 -04:00
Roy Shilkrot
65408db097 refactor: Add filter-replace-utils for serializing and deserializing (#170)
- Add filter-replace-utils for serializing and deserializing in the src/ui directory.
- Update CMakeLists.txt to include the new files in the target_sources.
- Update FindLibAvObs.cmake to read buildspec.json from the CMAKE_SOURCE_DIR.
- Update model-infos.cpp to include two new Whisper models.
- Add a new CMakeLists.txt file in the src/tests directory.
- Add localvocal-offline-test.cpp in the src/tests directory.
- Add clear_current_caption function in localvocal-offline-test.cpp.
2024-10-01 13:41:47 -04:00
Roy Shilkrot
024502333a refactor: Update version to 0.3.5 and clear current caption in transc… (#164)
* refactor: Update version to 0.3.5 and clear current caption in transcription filter callbacks

* feat: Refactor whisper-processing.cpp for improved VAD segmentation and token buffer thread

* feat: Update prebuilt Whispercpp version to 0.0.6

* refactor: Remove trailing whitespace in translation-language-utils.h

* refactor: Add case-insensitive flag to regex in set_text_callback

The code change adds the `std::regex_constants::icase` flag to the regex used in the `set_text_callback` function in `transcription-filter-callbacks.cpp`. This allows for case-insensitive matching when replacing filter words in the `str_copy` string.

Refactor the code to improve VAD segmentation and token buffer thread in whisper-processing.cpp

The code change refactors the `whisper-processing.cpp` file to improve the VAD (Voice Activity Detection) segmentation and token buffer thread. This aims to enhance the performance and accuracy of the transcription filtering process.

refactor: Add prepopulated filter options and corresponding map entries in FilterReplaceDialog

The code change adds prepopulated filter options, such as "English Swear Words," "English Hallucinations," and "Korean Hallucinations," to the `FilterReplaceDialog` UI. It also adds the corresponding map entries to the `filter_words_replace` map, allowing users to easily add predefined filter patterns and replacement values.

refactor: Update version to 0.3.5 and clear current caption in transcription filter callbacks

The code change updates the version to 0.3.5 and clears the current caption in the transcription filter callbacks. This ensures that the correct version is displayed and any previous captions are removed.

refactor: Remove trailing whitespace in translation-language-utils.h

The code change removes trailing whitespace in the `translation-language-utils.h` file, improving code readability and consistency.
0.3.5
2024-09-12 20:06:26 -04:00
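The `icase` change above amounts to building the regex with `std::regex_constants::icase` alongside the default ECMAScript grammar. A minimal sketch follows. `replace_filter_word` is a hypothetical helper, not the plugin's `set_text_callback` code.

```cpp
#include <regex>
#include <string>

// Replace every case-insensitive match of `pattern` in `text`.
// With icase, a pattern "hello" also matches "Hello" and "HELLO".
std::string replace_filter_word(const std::string &text, const std::string &pattern,
				const std::string &replacement)
{
	const std::regex re(pattern, std::regex_constants::ECMAScript |
					     std::regex_constants::icase);
	return std::regex_replace(text, re, replacement);
}
```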
Roy Shilkrot
abe678bbb1 Update README.md 2024-09-12 10:32:09 -04:00
Roy Shilkrot
ec56c74e51 refactor: Add filter-replace-utils for serializing and deserializing … (#154)
* refactor: Add filter-replace-utils for serializing and deserializing filter words replacements

* refactor: Add filter-replace-utils for serializing and deserializing filter words replacements

* refactor: Add filter-replace-utils for serializing and deserializing filter words replacements
2024-09-09 11:36:53 -04:00
Roy Shilkrot
e3c69518a7 Fix hangups and VAD segmentation (#157)
* Fix hangups and VAD segmentation

* feat: Add max_sub_duration field to transcription filter data

* chore: Update VAD parameters for better segmentation accuracy

* feat: Add segment_duration field to transcription filter data

* feat: Optimize VAD processing for better performance

* feat: Refactor token buffer thread and whisper processing

The code changes involve refactoring the token buffer thread and whisper processing. The token buffer thread now uses the variable name `word_token` instead of `word` for better clarity. In the whisper processing, the log message format has been updated to include the segment number and token number. These changes aim to improve the performance and accuracy of VAD processing, as well as add new fields to the transcription filter data.

* Refactor token buffer thread and whisper processing

* refactor: Update translation context in transcription filter

The code changes in this commit update the translation context in the transcription filter. The `translate_add_context` property has been changed from a boolean to an integer slider, allowing the user to specify the number of context lines to add to the translation. This change aims to provide more flexibility in controlling the context for translation and improve the accuracy of the translation output.

* refactor: Update last_text variable name in transcription filter callbacks

* feat: Add translation language utilities

This commit adds a new file, `translation-language-utils.h`, which contains utility functions for handling translation languages. The `remove_start_punctuation` function removes any leading punctuation from a given string. This utility will be used in the translation process to improve the quality of the translated output.

* feat: Update ICU library configuration and dependencies

This commit updates the configuration and dependencies of the ICU library. The `BuildICU.cmake` file has been modified to use the `INSTALL_DIR` variable instead of the `ICU_INSTALL_DIR` variable for setting the ICU library paths. Additionally, the `ICU_IN_LIBRARY` variable has been renamed for better clarity. These changes aim to improve the build process and ensure proper linking of the ICU library.

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

* refactor: Update ICU library configuration and dependencies

This commit updates the `BuildICU.cmake` file to set the `CFLAGS`, `CXXFLAGS`, and `LDFLAGS` environment variables to `-fPIC` on Linux, so that ICU is built as position-independent code and can be linked into the plugin's shared library. Additionally, the `icuin` library has been renamed to `icui18n`, the name ICU uses for that library on non-Windows platforms. These updates keep the ICU build configuration consistent.
2024-09-06 10:27:05 -04:00
Ruwen Hahn
12fa9dce65 Fix model downloader crash on shutdown (#155)
* Fix `ModelDownloader` not being aware of child object deletions

* Delete `ModelDownloader` after it's done processing

Otherwise this is only deleted when OBS exits, effectively leaking
memory
2024-08-20 16:31:04 -04:00
Ruwen Hahn
bdab41cafc More offline test improvements (#153)
* Protect logging with a mutex

Main thread and worker thread output could get interleaved weirdly
without this

* Move segments.json saving to different thread

This was taking a considerable amount of time, especially for longer
input files, reducing overall utilization

* Check whether offline test can push more data before waiting

* Fix offline test with large files

In
```
circlebuf_push_back(
  &gf->input_buffers[c],
  audio[c].data() +
    frames_count * frame_size_bytes,
  frames_size_bytes);
```
`frames_count * frame_size_bytes` would overflow as `int` on
a 4-hour file; using `size_t` (on a 64-bit platform) fixes that
2024-08-14 09:28:33 -04:00
Ruwen Hahn
6cc88b1ead Offline test improvements (#150)
* look at the front of the whisper buffer instead of the back

this should mostly not make a difference, but feels semantically
more correct

* Initialize `resampled_buffer` for offline tests

* Read relevant audio bytes

There are two issues here:
1. `line_size` may contain padding (didn't happen in my tests)
2. From FFmpeg's `libavutil/frame.h` (commit 2b5f000d3f, line 405):
> For audio, only linesize[0] may be set. For planar audio, each
> channel plane must be the same size.

* log running time in addition to local time

* Run whisper test "as fast as possible"

This kind of behaves like libobs, where each chunk of audio is
inspected individually by VAD/whisper, until processing of either
takes longer than the window length, in which case audio continues
to stream in

* Only ever send a single chunk of audio

* Add additional files to tests copy command

* Use condition variable to signal input thread if available

* Only wait in whisper thread if input buffers are empty
2024-08-09 13:45:42 -04:00
Ruwen Hahn
09839bbf15 Store acceleration info in cmake cache (#151)
This is to allow switching branches and rebuilding with the "same"
settings from e.g. Visual Studio (which will re-run a bunch of CMake
processing)
2024-08-09 13:44:15 -04:00
Roy Shilkrot
f65da7a97c Update README.md 2024-08-05 20:25:51 -04:00
Roy Shilkrot
7707af0710 refactor: Add target SPM loading and decoding logic in translation mo… (#149)
* refactor: Add target SPM loading and decoding logic in translation module

* refactor: Update target SPM loading error handling in translation module
2024-08-02 22:36:09 -04:00
Ruwen Hahn
0592fa7d9d Upgrade silero vad v5 (and some other changes) (#148)
* Add accessor for VAD window size in samples

* Feed buffered audio data to VAD in proper window sizes

* Wake whisper thread whenever audio is received

* Update silero VAD to v5

* Only reset VAD state between chunks of activity
2024-08-02 14:25:59 -04:00
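Feeding VAD "in proper window sizes" means handing it only whole windows and keeping the remainder buffered. The sketch below assumes the window size is passed in; the Silero project documents 512-sample windows at 16 kHz for v5. `take_vad_windows` is hypothetical and copies samples for clarity rather than speed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: move as many whole windows of `window_size` samples
// as are available out of `buffered`, leaving any partial window behind.
std::vector<std::vector<float>> take_vad_windows(std::vector<float> &buffered,
						 std::size_t window_size)
{
	std::vector<std::vector<float>> windows;
	if (window_size == 0)
		return windows;
	std::size_t offset = 0;
	while (buffered.size() - offset >= window_size) {
		auto first = buffered.begin() + static_cast<std::ptrdiff_t>(offset);
		windows.emplace_back(first, first + static_cast<std::ptrdiff_t>(window_size));
		offset += window_size;
	}
	// Keep the remainder for the next call, when more audio has arrived.
	buffered.erase(buffered.begin(), buffered.begin() + static_cast<std::ptrdiff_t>(offset));
	return windows;
}
```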
Roy Shilkrot
a173e220c3 Use coreml/metal on Apple 0.3.4 2024-07-30 22:03:16 -07:00
Roy Shilkrot
5bca2ff595 refactor: Update buildspec.json to version 0.3.4 2024-07-31 00:42:34 -04:00
Roy Shilkrot
78907ea14d refactor: Update whisper model path and enable hipBLAS acceleration (#146)
* refactor: Update whisper model path and enable hipBLAS acceleration

* refactor: Update whisper model path and enable hipBLAS acceleration

* refactor: Update whisper model path and enable hipBLAS acceleration

* refactor: Update whisper model path and enable hipBLAS acceleration

* refactor: Update whisper model path and enable hipBLAS acceleration

* refactor: Update whisper model path and enable CoreML acceleration
2024-07-31 00:40:36 -04:00
Roy Shilkrot
87c5a0a1ca refactor: Prevent duplicate translation of sentences in send_sentence_to_translation (#145) 2024-07-29 21:04:28 -04:00
Roy Shilkrot
4e2f3def40 refactor: Avoid translating the same sentence twice in send_sentence_to_translation 2024-07-23 23:54:24 -04:00
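The duplicate-translation guard in the two commits above can be as simple as remembering the last sentence sent. The sketch below works under that assumption; the commits may compare sentences differently. `TranslationDeduper` is a hypothetical class.

```cpp
#include <string>

// Hypothetical guard: remember the last sentence sent for translation and
// skip the request if the same sentence arrives again immediately.
class TranslationDeduper {
public:
	bool should_translate(const std::string &sentence)
	{
		if (sentence == last_sentence_)
			return false;
		last_sentence_ = sentence;
		return true;
	}

private:
	std::string last_sentence_;
};
```

Only the most recent sentence is compared. A sentence that reappears after a different one is translated again.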
Roy Shilkrot
c8c22fe5a0 Update README.md 2024-07-22 08:32:56 -04:00
Roy Shilkrot
73c91765a5 refactor: Update whisper model path and add flag for model loaded status 0.3.3 2024-07-19 20:59:55 -04:00
Roy Shilkrot
b3e4bfa33a refactor: Enable partial transcription with a latency of 1000ms (#141)
* refactor: Enable partial transcription with a latency of 1000ms

* refactor: Update CMakePresets.json and buildspec.json

- Remove the "QT_VERSION" variable from CMakePresets.json for all platforms
- Update the "version" of "obs-studio" and "prebuilt" dependencies in buildspec.json
- Update the "version" of "qt6" dependency in buildspec.json
- Update the "version" of the project to "0.3.3" in buildspec.json
- Update the "version" of the project to "0.3.3" in CMakePresets.json
- Remove unused code in whisper-processing.cpp

* refactor: Add -Wno-error=deprecated-declarations option to compilerconfig.cmake

* refactor: Update language codes in translation module
2024-07-19 14:02:24 -04:00
Roy Shilkrot
19017ca17f refactor: Update language codes in translation module (#140) 2024-07-18 01:00:09 -04:00
Roy Shilkrot
4e3fdcd6ef fix whisper model loading language (#139)
* refactor: Add boolean flag for whisper model loaded status

* refactor: Improve handling of whisper model paths in transcription filter

* refactor: Update whisper model path and add flag for model loaded status
2024-07-17 18:54:34 -04:00
Roy Shilkrot
44f072b5ff refactor: Add transcription-filter-properties.cpp for managing filter… (#138)
* refactor: Add transcription-filter-properties.cpp for managing filter properties

* refactor: Add translation_monitor to transcription filter

- Add translation_monitor to the transcription filter data structure
- Initialize and stop the translation_monitor in the transcription_filter_update function
- Update the send_caption_to_source function to use the translation_monitor for sending translated captions
- Clear the translation_monitor when disabling buffered output in the transcription_filter_update function

* refactor: Simplify UI and improve error handling in transcription filter
2024-07-17 12:18:31 -04:00