faster-whisper

mirror of https://github.com/SYSTRAN/faster-whisper.git synced 2026-01-09 13:38:01 -05:00

Files

Mario 14ba1051f3 Fix: add <|nocaptions|> to suppressed tokens (#1338)

* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

- Add nocaptions component tokens [1771, 496, 9799] to suppress_tokens list
- Add segment filtering to remove any remaining <|nocaptions|> segments
- Resolves issue where BatchedInferencePipeline would generate malformed
  special tokens during periods of silence or low-confidence transcription
- Includes comprehensive tests to verify the fix

The issue occurred because while bracket tokens ('<', '|', '>') were
already suppressed, the content tokens ('no', 'ca', 'ptions') were not,
leading to partial token generation that formed complete <|nocaptions|>
tags in the output.
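
The two measures described above (suppressing the token at decode time, then filtering any segment that still consists only of the tag) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from faster_whisper/transcribe.py: `ToyTokenizer`, `extend_suppress_tokens`, `drop_nocaptions_segments`, and the token ids are hypothetical names and values chosen for the example. It follows the "single token approach" mentioned later in this commit message, assuming the tokenizer can map the whole tag to one id.

```python
from typing import Dict, List, Optional

NO_CAPTIONS = "<|nocaptions|>"


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer; only maps token text to an id."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self._vocab = vocab

    def token_to_id(self, token: str) -> Optional[int]:
        # Returns None when the token is not in the vocabulary.
        return self._vocab.get(token)


def extend_suppress_tokens(tokenizer: ToyTokenizer, suppress_tokens: List[int]) -> List[int]:
    """Return the suppress list with the special tag's single id added, if it exists."""
    ids = set(suppress_tokens)
    # Assumption: the vocabulary may name the tag either way, so try both spellings.
    for name in ("<|nospeech|>", NO_CAPTIONS):
        token_id = tokenizer.token_to_id(name)
        if token_id is not None:
            ids.add(token_id)
    return sorted(ids)


def drop_nocaptions_segments(texts: List[str]) -> List[str]:
    """Second line of defence: drop segments whose text is only the tag."""
    return [text for text in texts if text.strip() != NO_CAPTIONS]


# Toy vocabulary with made-up ids, for illustration only.
tokenizer = ToyTokenizer({"<": 1, "|": 2, ">": 3, NO_CAPTIONS: 7})
suppressed = extend_suppress_tokens(tokenizer, [1, 2, 3])
kept = drop_nocaptions_segments(["hello world", " <|nocaptions|> "])
```

Suppressing the single id avoids banning ordinary sub-word pieces such as 'no' or 'ca', which also occur in normal speech; the filter only matters if a tag still slips through.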

Files changed:
- faster_whisper/transcribe.py: Core fix implementation
- test_nocaptions_comprehensive.py: Comprehensive test suite
- tests/test_nocaptions_fix.py: Unit tests

* removed

* Fix: Prevent <|nocaptions|> tokens in BatchedInferencePipeline

* Fix: Implement proper <|nocaptions|> token suppression using single token approach

* ci: trigger tests

* fix: remove trailing whitespace from blank lines

* Update faster_whisper/transcribe.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Update faster_whisper/tokenizer.py

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

* Rename no_speech to no_captions in tokenizer

* nocaptions has been renamed to nospeech

* break line

* line break

* Refactor no_speech method for improved readability by adjusting line breaks

---------

Co-authored-by: Mahmoud Ashraf <hassouna97.ma@gmail.com>

2025-10-10 21:56:54 +03:00

assets

Use Silero VAD in Batched Mode (#936)

2024-10-24 12:05:25 +03:00

__init__.py

New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements (#856)

2024-07-18 16:48:52 +07:00

audio.py

Remove torch dependency, Faster numpy Feature extraction (#1106)

2024-11-14 12:57:10 +03:00

feature_extractor.py

Remove torch dependency, Faster numpy Feature extraction (#1106)