# Using Speech-To-Text (STT) with Fabric
Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.
## Overview
The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.
## Requirements

- OpenAI API key configured in Fabric
- For files larger than 25MB: `ffmpeg` installed on your system
- Supported audio/video formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`
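The format check above can be sketched as a simple extension lookup. This is an illustrative sketch, not Fabric's actual code; the names `supportedExts` and `isSupportedMedia` are assumptions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the formats listed above; the identifiers here
// are illustrative, not Fabric's actual ones.
var supportedExts = map[string]bool{
	".mp3": true, ".mp4": true, ".mpeg": true, ".mpga": true,
	".m4a": true, ".wav": true, ".webm": true,
}

// isSupportedMedia reports whether a file's extension is one of the
// accepted transcription formats (case-insensitive).
func isSupportedMedia(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	fmt.Println(isSupportedMedia("/path/to/Audio.MP3")) // true
	fmt.Println(isSupportedMedia("notes.txt"))          // false
}
```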
## Basic Usage

### Simple Transcription

To transcribe an audio file and send the result to a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
```
### Transcription Only

To just transcribe a file without applying a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
```
## Command Line Flags

### Required Flags

- `--transcribe-file`: Path to the audio or video file to transcribe
- `--transcribe-model`: Model to use for transcription (required when using transcription)

### Optional Flags

- `--split-media-file`: Automatically split files larger than 25MB into chunks using ffmpeg
## Available Models

You can list all available transcription models with:

```bash
fabric --list-transcription-models
```

Currently supported models:

- `whisper-1`: OpenAI's Whisper model
- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
- `gpt-4o-transcribe`: GPT-4o transcription model
## File Size Handling

### Files Under 25MB
Files under the 25MB limit are processed directly without any special handling.
### Files Over 25MB

For files exceeding OpenAI's 25MB limit, you have two options:

1. **Manual handling**: The command will fail with an error message suggesting to use `--split-media-file`
2. **Automatic splitting**: Use the `--split-media-file` flag to automatically split the file into chunks

```bash
fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
```
When splitting is enabled:

- Fabric uses `ffmpeg` to split the file into 10-minute segments initially
- If segments are still too large, it reduces the segment time by half repeatedly
- All segments are transcribed and the results are concatenated
- Temporary files are automatically cleaned up after processing
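The duration-halving strategy above can be sketched as a small Go function. This is a simplified illustration, not Fabric's implementation: `segmentSeconds` and the bitrate-based size estimate are assumptions for the sketch:

```go
package main

import "fmt"

// maxChunkBytes is OpenAI's per-request upload limit (25 MB).
const maxChunkBytes = 25 * 1024 * 1024

// segmentSeconds starts with 10-minute segments and halves the duration
// until a segment's estimated size fits under the limit. bytesPerSecond
// is an estimated average bitrate for the input file.
func segmentSeconds(bytesPerSecond int) int {
	secs := 600 // initial 10-minute segments
	for secs > 1 && bytesPerSecond*secs > maxChunkBytes {
		secs /= 2
	}
	return secs
}

func main() {
	// A 320 kbit/s file (~40000 bytes/s): 600 s segments fit under 25 MB.
	fmt.Println(segmentSeconds(40000)) // 600
	// A high-bitrate video at ~1 MB/s needs much shorter segments.
	fmt.Println(segmentSeconds(1024 * 1024)) // 18
}
```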
## Integration with Patterns

The transcribed text is seamlessly integrated into Fabric's workflow:

1. The file is transcribed using the specified model
2. The transcribed text becomes the input message
3. The text is sent to the specified pattern or chat session
## Example Workflows

**Meeting transcription and summarization:**

```bash
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
```

**Interview analysis:**

```bash
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
```

**Large video file processing:**

```bash
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
```
## Error Handling

Common error scenarios:

- **Unsupported format**: Only the listed audio/video formats are supported
- **File too large**: Use `--split-media-file` for files over 25MB
- **Missing ffmpeg**: Install ffmpeg for automatic file splitting
- **Invalid model**: Use `--list-transcription-models` to see available models
- **Missing model**: The `--transcribe-model` flag is required when using `--transcribe-file`
## Technical Details

### Implementation

- Transcription is handled in `internal/cli/transcribe.go:14`
- OpenAI-specific implementation in `internal/plugins/ai/openai/openai_audio.go:41`
- File splitting uses ffmpeg with configurable segment duration
- Supports any vendor that implements the `transcriber` interface
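The vendor abstraction described above might look roughly like the sketch below. The interface name, method signature, and mock vendor are assumptions for illustration; Fabric's actual `transcriber` interface may differ:

```go
package main

import (
	"context"
	"fmt"
)

// Transcriber is an assumed shape for the vendor interface: any provider
// that can turn a media file into text. Not Fabric's exact definition.
type Transcriber interface {
	TranscribeFile(ctx context.Context, path, model string) (string, error)
}

// mockVendor stands in for a real provider such as OpenAI.
type mockVendor struct{}

func (mockVendor) TranscribeFile(_ context.Context, path, model string) (string, error) {
	return fmt.Sprintf("transcript of %s via %s", path, model), nil
}

// transcribe accepts any vendor implementing the interface, which is what
// allows future non-OpenAI providers to plug in.
func transcribe(t Transcriber, path, model string) (string, error) {
	return t.TranscribeFile(context.Background(), path, model)
}

func main() {
	out, err := transcribe(mockVendor{}, "meeting.mp4", "whisper-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // transcript of meeting.mp4 via whisper-1
}
```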
### Processing Pipeline

1. The CLI validates file format and size
2. If the file is larger than 25MB and splitting is enabled, it is split using ffmpeg
3. Each file/segment is sent to OpenAI's transcription API
4. Results are concatenated with spaces between segments
5. The transcribed text is passed as input to the main Fabric pipeline
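The concatenation step above amounts to joining per-segment transcripts with single spaces; a minimal sketch (the function name is illustrative, not from Fabric's code):

```go
package main

import (
	"fmt"
	"strings"
)

// joinSegments mirrors the concatenation step: per-segment transcripts
// are joined with single spaces into the final transcript text.
func joinSegments(segments []string) string {
	return strings.Join(segments, " ")
}

func main() {
	segs := []string{"Hello everyone.", "Today we discuss Fabric.", "Thanks for listening."}
	fmt.Println(joinSegments(segs))
}
```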
## Vendor Support
Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.