feat: add --debug flag with levels and centralized logging

CHANGES

- Add --debug flag controlling runtime logging verbosity levels
- Introduce internal/log package with Off, Basic, Detailed, Trace
- Replace ad-hoc Debugf and globals with centralized debug logger
- Wire debug level during early CLI argument parsing
- Add bash, zsh, fish completions for --debug levels
- Document debug levels in README with usage examples
- Add comprehensive STT guide covering models, flags, workflows
- Simplify splitAudioFile signature and log ffmpeg chunking operations
- Remove FABRIC_STT_DEBUG environment variable and related code
- Clean minor code paths in vendors and template modules
2026-01-08 22:08:03 -05:00 · 2025-08-18 23:53:10 -07:00
parent 6a3a7e82d1
commit ff1ef380a7
12 changed files with 272 additions and 38 deletions
--- a/docs/Using-Speech-To-Text.md
+++ b/docs/Using-Speech-To-Text.md
@@ -0,0 +1,139 @@
+# Using Speech-To-Text (STT) with Fabric
+
+Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.
+
+## Overview
+
+The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.
+
+## Requirements
+
+- OpenAI API key configured in Fabric
+- For files larger than 25MB: `ffmpeg` installed on your system
+- Supported audio/video formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`
+
+## Basic Usage
+
+### Simple Transcription
+
+To transcribe an audio file and send the result to a pattern:
+
+```bash
+fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
+```
+
+### Transcription Only
+
+To just transcribe a file without applying a pattern:
+
+```bash
+fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
+```
+
+## Command Line Flags
+
+### Required Flags
+
+- `--transcribe-file`: Path to the audio or video file to transcribe
+- `--transcribe-model`: Model to use for transcription (required whenever `--transcribe-file` is used)
+
+### Optional Flags
+
+- `--split-media-file`: Automatically split files larger than 25MB into chunks using ffmpeg
+
+## Available Models
+
+You can list all available transcription models with:
+
+```bash
+fabric --list-transcription-models
+```
+
+Currently supported models:
+
+- `whisper-1`: OpenAI's Whisper model
+- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
+- `gpt-4o-transcribe`: GPT-4o transcription model
+
+## File Size Handling
+
+### Files Under 25MB
+
+Files under the 25MB limit are processed directly without any special handling.
+
+### Files Over 25MB
+
+For files exceeding OpenAI's 25MB limit, you have two options:
+
+1. **Manual handling**: Without `--split-media-file`, the command fails with an error message that suggests using the flag
+2. **Automatic splitting**: Use the `--split-media-file` flag to automatically split the file into chunks
+
+```bash
+fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
+```
+
+When splitting is enabled:
+
+- Fabric uses `ffmpeg` to split the file into 10-minute segments initially
+- If any segment is still too large, it halves the segment duration and splits again, repeating as needed
+- All segments are transcribed and the results are concatenated
+- Temporary files are automatically cleaned up after processing
+
+## Integration with Patterns
+
+The transcribed text feeds directly into Fabric's normal workflow:
+
+1. File is transcribed using the specified model
+2. Transcribed text becomes the input message
+3. Text is sent to the specified pattern or chat session
+
+### Example Workflows
+
+**Meeting transcription and summarization:**
+
+```bash
+fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
+```
+
+**Interview analysis:**
+
+```bash
+fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
+```
+
+**Large video file processing:**
+
+```bash
+fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
+```
+
+## Error Handling
+
+Common error scenarios:
+
+- **Unsupported format**: Only the listed audio/video formats are supported
+- **File too large**: Use `--split-media-file` for files over 25MB
+- **Missing ffmpeg**: Install ffmpeg for automatic file splitting
+- **Invalid model**: Use `--list-transcription-models` to see available models
+- **Missing model**: The `--transcribe-model` flag is required when using `--transcribe-file`
+
+## Technical Details
+
+### Implementation
+
+- Transcription is handled in `internal/cli/transcribe.go:14`
+- OpenAI-specific implementation in `internal/plugins/ai/openai/openai_audio.go:41`
+- File splitting uses ffmpeg with configurable segment duration
+- Supports any vendor that implements the `transcriber` interface
+
+### Processing Pipeline
+
+1. CLI validates file format and size
+2. If the file exceeds 25MB and splitting is enabled, it is split using ffmpeg
+3. Each file/segment is sent to OpenAI's transcription API
+4. Results are concatenated with spaces between segments
+5. Transcribed text is passed as input to the main Fabric pipeline
+
+### Vendor Support
+
+Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.
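
The split-and-retry behavior the guide describes under "Files Over 25MB" can be sketched as a small shell loop. This is only an illustration under stated assumptions, not Fabric's actual implementation: the function names `split_media` and `largest_segment_bytes`, the `segment_%03d` naming, and the choice of stream copy (`-c copy`) are all hypothetical. The 10-minute starting duration, the halving step, and the 25MB limit come from the guide.

```bash
# Hypothetical sketch of the split-and-retry loop; not Fabric's actual code.

MAX_BYTES=$((25 * 1024 * 1024))   # 25MB upload limit stated in the guide

# largest_segment_bytes DIR: print the size in bytes of the largest file in DIR.
largest_segment_bytes() {
  local dir="$1" max=0 size f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -gt "$max" ]; then
      max=$size
    fi
  done
  echo "$max"
}

# split_media INPUT OUTDIR: split INPUT into segments of at most MAX_BYTES.
# Starts at 600 seconds (10 minutes) and halves the duration after each
# attempt that leaves an oversized segment. Returns 1 if no duration works.
split_media() {
  local input="$1" outdir="$2" seconds=600
  local ext="${input##*.}"
  while [ "$seconds" -ge 1 ]; do
    rm -f "$outdir"/segment_*     # discard segments from the previous attempt
    ffmpeg -loglevel error -i "$input" -f segment -segment_time "$seconds" \
      -c copy "$outdir/segment_%03d.$ext" || return 1
    if [ "$(largest_segment_bytes "$outdir")" -le "$MAX_BYTES" ]; then
      return 0
    fi
    seconds=$((seconds / 2))
  done
  return 1
}
```

With these definitions loaded, `split_media large_recording.mp4 "$(mktemp -d)"` would leave the segments in a fresh temporary directory. The caller is responsible for removing that directory afterwards.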