feat: add --debug flag with levels and centralized logging
CHANGES

- Add --debug flag controlling runtime logging verbosity levels
- Introduce internal/log package with Off, Basic, Detailed, Trace
- Replace ad-hoc Debugf and globals with centralized debug logger
- Wire debug level during early CLI argument parsing
- Add bash, zsh, fish completions for --debug levels
- Document debug levels in README with usage examples
- Add comprehensive STT guide covering models, flags, workflows
- Simplify splitAudioFile signature and log ffmpeg chunking operations
- Remove FABRIC_STT_DEBUG environment variable and related code
- Clean minor code paths in vendors and template modules
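
As a quick sketch of the new flag in use; the exact argument syntax (numeric levels mapping to Off, Basic, Detailed, Trace) is an assumption here, not something the commit message specifies:

```bash
# Hypothetical invocations; the numeric mapping 0=Off, 1=Basic,
# 2=Detailed, 3=Trace is assumed, not confirmed by the commit message.
fabric --debug 1 --pattern summarize < notes.txt   # basic verbosity
fabric --debug 3 --pattern summarize < notes.txt   # full trace output
```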

New file: docs/Using-Speech-To-Text.md (139 lines)

# Using Speech-To-Text (STT) with Fabric

Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.

## Overview

The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.

## Requirements

- OpenAI API key configured in Fabric
- For files larger than 25MB: `ffmpeg` installed on your system
- Supported audio/video formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`
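
Before the first transcription run, it may help to verify these prerequisites; `ffmpeg -version` is a standard availability check, and `fabric --setup` is Fabric's usual way to configure API keys:

```bash
# Check that ffmpeg is on the PATH (only needed for files over 25MB)
ffmpeg -version

# Configure the OpenAI API key if it is not set up yet
fabric --setup
```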

## Basic Usage

### Simple Transcription

To transcribe an audio file and send the result to a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
```

### Transcription Only

To just transcribe a file without applying a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
```

## Command Line Flags

### Required Flags

- `--transcribe-file`: Path to the audio or video file to transcribe
- `--transcribe-model`: Model to use for transcription (required when using transcription)

### Optional Flags

- `--split-media-file`: Automatically split files larger than 25MB into chunks using ffmpeg

## Available Models

You can list all available transcription models with:

```bash
fabric --list-transcription-models
```

Currently supported models:

- `whisper-1`: OpenAI's Whisper model
- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
- `gpt-4o-transcribe`: GPT-4o transcription model

## File Size Handling

### Files Under 25MB

Files under the 25MB limit are processed directly without any special handling.
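
To see which path a file will take, check its size first; this is a plain shell check, not a Fabric feature:

```bash
# Print the file size in megabytes; anything over 25 needs splitting
du -m recording.mp4
```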

### Files Over 25MB

For files exceeding OpenAI's 25MB limit, you have two options:

1. **Manual handling**: Without the flag, the command fails with an error message suggesting the use of `--split-media-file`
2. **Automatic splitting**: Use the `--split-media-file` flag to split the file into chunks automatically

```bash
fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
```

When splitting is enabled:

- Fabric uses `ffmpeg` to split the file into 10-minute segments initially
- If segments are still too large, it reduces the segment time by half repeatedly
- All segments are transcribed and the results are concatenated
- Temporary files are automatically cleaned up after processing
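
The splitting happens inside Fabric, but its effect is roughly what this manual ffmpeg invocation produces, assuming the 10-minute (600-second) initial segment length described above:

```bash
# Split into 10-minute segments without re-encoding (roughly what
# Fabric does internally; on oversized chunks the segment time is
# halved and the split retried: 600s -> 300s -> 150s -> ...)
ffmpeg -i large_recording.mp4 -f segment -segment_time 600 -c copy chunk_%03d.mp4
```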

## Integration with Patterns

The transcribed text is seamlessly integrated into Fabric's workflow:

1. File is transcribed using the specified model
2. Transcribed text becomes the input message
3. Text is sent to the specified pattern or chat session

### Example Workflows

**Meeting transcription and summarization:**

```bash
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
```

**Interview analysis:**

```bash
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
```

**Large video file processing:**

```bash
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
```

## Error Handling

Common error scenarios:

- **Unsupported format**: Only the listed audio/video formats are supported
- **File too large**: Use `--split-media-file` for files over 25MB
- **Missing ffmpeg**: Install ffmpeg for automatic file splitting (see below)
- **Invalid model**: Use `--list-transcription-models` to see available models
- **Missing model**: The `--transcribe-model` flag is required when using `--transcribe-file`
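
For the missing-ffmpeg case, install it through your system's package manager, for example:

```bash
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg
```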

## Technical Details

### Implementation

- Transcription is handled in `internal/cli/transcribe.go:14`
- OpenAI-specific implementation in `internal/plugins/ai/openai/openai_audio.go:41`
- File splitting uses ffmpeg with configurable segment duration
- Supports any vendor that implements the `transcriber` interface

### Processing Pipeline

1. The CLI validates the file format and size
2. If the file exceeds 25MB and splitting is enabled, it is split using ffmpeg
3. Each file or segment is sent to OpenAI's transcription API
4. Results are concatenated with spaces between segments
5. The transcribed text is passed as input to the main Fabric pipeline
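
For illustration, here is a hand-rolled approximation of steps 2-4 using only the flags documented above; Fabric does all of this in a single invocation when `--split-media-file` is set:

```bash
# Manual sketch of the split -> transcribe -> concatenate pipeline.
ffmpeg -i big.mp4 -f segment -segment_time 600 -c copy chunk_%03d.mp4
for f in chunk_*.mp4; do
  fabric --transcribe-file "$f" --transcribe-model whisper-1
done | tr '\n' ' ' > transcript.txt
rm chunk_*.mp4   # remove the temporary segments
```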

### Vendor Support

Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.