feat: add --debug flag with levels and centralized logging

CHANGES
- Add --debug flag controlling runtime logging verbosity levels
- Introduce internal/log package with Off, Basic, Detailed, Trace
- Replace ad-hoc Debugf and globals with centralized debug logger
- Wire debug level during early CLI argument parsing
- Add bash, zsh, fish completions for --debug levels
- Document debug levels in README with usage examples
- Add comprehensive STT guide covering models, flags, workflows
- Simplify splitAudioFile signature and log ffmpeg chunking operations
- Remove FABRIC_STT_DEBUG environment variable and related code
- Clean minor code paths in vendors and template modules
This commit is contained in:
Kayvan Sylvan
2025-08-18 23:53:10 -07:00
parent 6a3a7e82d1
commit ff1ef380a7
12 changed files with 272 additions and 38 deletions

View File

@@ -0,0 +1,139 @@
# Using Speech-To-Text (STT) with Fabric
Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.
## Overview
The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.
## Requirements
- OpenAI API key configured in Fabric
- For files larger than 25MB: `ffmpeg` installed on your system
- Supported audio/video formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`
## Basic Usage
### Simple Transcription
To transcribe an audio file and send the result to a pattern:
```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
```
### Transcription Only
To just transcribe a file without applying a pattern:
```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
```
## Command Line Flags
### Required Flags
- `--transcribe-file`: Path to the audio or video file to transcribe
- `--transcribe-model`: Model to use for transcription (required when using transcription)
### Optional Flags
- `--split-media-file`: Automatically split files larger than 25MB into chunks using ffmpeg
## Available Models
You can list all available transcription models with:
```bash
fabric --list-transcription-models
```
Currently supported models:
- `whisper-1`: OpenAI's Whisper model
- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
- `gpt-4o-transcribe`: GPT-4o transcription model
## File Size Handling
### Files Under 25MB
Files under the 25MB limit are processed directly without any special handling.
### Files Over 25MB
For files exceeding OpenAI's 25MB limit, you have two options:
1. **Manual handling**: The command will fail with an error message suggesting to use `--split-media-file`
2. **Automatic splitting**: Use the `--split-media-file` flag to automatically split the file into chunks
```bash
fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
```
When splitting is enabled:
- Fabric uses `ffmpeg` to split the file into 10-minute segments initially
- If segments are still too large, it reduces the segment time by half repeatedly
- All segments are transcribed and the results are concatenated
- Temporary files are automatically cleaned up after processing
## Integration with Patterns
The transcribed text is seamlessly integrated into Fabric's workflow:
1. File is transcribed using the specified model
2. Transcribed text becomes the input message
3. Text is sent to the specified pattern or chat session
### Example Workflows
**Meeting transcription and summarization:**
```bash
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
```
**Interview analysis:**
```bash
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
```
**Large video file processing:**
```bash
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
```
## Error Handling
Common error scenarios:
- **Unsupported format**: Only the listed audio/video formats are supported
- **File too large**: Use `--split-media-file` for files over 25MB
- **Missing ffmpeg**: Install ffmpeg for automatic file splitting
- **Invalid model**: Use `--list-transcription-models` to see available models
- **Missing model**: The `--transcribe-model` flag is required when using `--transcribe-file`
## Technical Details
### Implementation
- Transcription is handled in `internal/cli/transcribe.go:14`
- OpenAI-specific implementation in `internal/plugins/ai/openai/openai_audio.go:41`
- File splitting uses ffmpeg with configurable segment duration
- Supports any vendor that implements the `transcriber` interface
### Processing Pipeline
1. CLI validates file format and size
2. If file > 25MB and splitting enabled, file is split using ffmpeg
3. Each file/segment is sent to OpenAI's transcription API
4. Results are concatenated with spaces between segments
5. Transcribed text is passed as input to the main Fabric pipeline
### Vendor Support
Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.