# Using Speech-To-Text (STT) with Fabric

Fabric supports speech-to-text transcription of audio and video files using OpenAI's transcription models. This feature allows you to convert spoken content into text that can then be processed through Fabric's patterns.

## Overview

The STT feature integrates OpenAI's Whisper and GPT-4o transcription models to convert audio/video files into text. The transcribed text is automatically passed as input to your chosen pattern or chat session.

## Requirements

- OpenAI API key configured in Fabric
- For files larger than 25MB: `ffmpeg` installed on your system
- Supported audio/video formats: `.mp3`, `.mp4`, `.mpeg`, `.mpga`, `.m4a`, `.wav`, `.webm`

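
The format list above can be checked before calling Fabric. The following is a minimal sketch, not part of Fabric itself: `is_supported_media` is a hypothetical helper name, and treating extensions case-insensitively is an assumption of this sketch rather than documented Fabric behavior.

```bash
# Hypothetical helper (not part of Fabric): succeed only when the file's
# extension is one of the formats listed in the requirements.
is_supported_media() {
  local file="$1" ext
  # A name without any dot has no extension to check.
  case "$file" in
    *.*) ;;
    *) return 1 ;;
  esac
  ext="${file##*.}"
  # Assumption of this sketch: compare extensions case-insensitively.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call `is_supported_media "$file" || exit 1` before running `fabric --transcribe-file "$file" ...` to fail early with its own message.
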
## Basic Usage

### Simple Transcription

To transcribe an audio file and send the result to a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1 --pattern summarize
```

### Transcription Only

To just transcribe a file without applying a pattern:

```bash
fabric --transcribe-file /path/to/audio.mp3 --transcribe-model whisper-1
```

## Command Line Flags

### Required Flags

- `--transcribe-file`: Path to the audio or video file to transcribe
- `--transcribe-model`: Model to use for transcription (required when using transcription)

### Optional Flags

- `--split-media-file`: Automatically split files larger than 25MB into chunks using ffmpeg

## Available Models

You can list all available transcription models with:

```bash
fabric --list-transcription-models
```

Currently supported models:

- `whisper-1`: OpenAI's Whisper model
- `gpt-4o-mini-transcribe`: GPT-4o Mini transcription model
- `gpt-4o-transcribe`: GPT-4o transcription model

## File Size Handling

### Files Under 25MB

Files under the 25MB limit are processed directly without any special handling.

### Files Over 25MB

For files exceeding OpenAI's 25MB limit, the outcome depends on whether `--split-media-file` is set:

1. **Manual handling**: Without the flag, the command fails with an error message that suggests using `--split-media-file`
2. **Automatic splitting**: With the `--split-media-file` flag, Fabric splits the file into chunks automatically

```bash
fabric --transcribe-file large_recording.mp4 --transcribe-model whisper-1 --split-media-file --pattern summarize
```

When splitting is enabled:

- Fabric uses `ffmpeg` to split the file into 10-minute segments initially
- If any segment is still too large, it halves the segment duration and tries again, repeating as needed
- All segments are transcribed and the results are concatenated
- Temporary files are automatically cleaned up after processing

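
A wrapper script can decide for itself whether to pass `--split-media-file`. The sketch below is illustrative only: `needs_split` and `transcribe_auto` are hypothetical helper names, and the byte threshold (25 × 1024 × 1024) is an assumption, since the exact value Fabric compares against is not stated here.

```bash
# Assumed threshold: 25 MiB in bytes. The exact limit Fabric checks may differ.
LIMIT_BYTES=$((25 * 1024 * 1024))

# Hypothetical helper: succeed when FILE is larger than the limit.
# An optional second argument overrides the limit (handy for testing).
needs_split() {
  local file="$1" limit="${2:-$LIMIT_BYTES}" size
  [ -f "$file" ] || return 2
  # wc -c may pad its output with spaces on some systems; strip them.
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  [ "$size" -gt "$limit" ]
}

# Hypothetical wrapper: add --split-media-file only for oversized files.
# Any further arguments (for example --pattern summarize) are passed through.
transcribe_auto() {
  local file="$1" model="$2"
  shift 2
  if needs_split "$file"; then
    fabric --transcribe-file "$file" --transcribe-model "$model" --split-media-file "$@"
  else
    fabric --transcribe-file "$file" --transcribe-model "$model" "$@"
  fi
}
```

For example, `transcribe_auto large_recording.mp4 whisper-1 --pattern summarize` would run the same command as shown above when the file exceeds the assumed limit, and omit the split flag otherwise.
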
## Integration with Patterns

The transcribed text feeds directly into Fabric's normal workflow:

1. File is transcribed using the specified model
2. Transcribed text becomes the input message
3. Text is sent to the specified pattern or chat session

### Example Workflows

**Meeting transcription and summarization:**

```bash
fabric --transcribe-file meeting.mp4 --transcribe-model gpt-4o-transcribe --pattern summarize
```

**Interview analysis:**

```bash
fabric --transcribe-file interview.mp3 --transcribe-model whisper-1 --pattern extract_insights
```

**Large video file processing:**

```bash
fabric --transcribe-file presentation.mp4 --transcribe-model gpt-4o-transcribe --split-media-file --pattern create_summary
```

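
The workflows above can be applied to a whole folder of recordings with a small loop. This is a sketch under stated assumptions: `summarize_recordings` is a hypothetical helper, the `.summary.md` output naming is invented for illustration, and it assumes Fabric writes the pattern result to standard output.

```bash
# Hypothetical batch helper: run one pattern over every .mp3 in a directory
# and save each result next to the recording.
summarize_recordings() {
  local dir="$1" model="${2:-whisper-1}" pattern="${3:-summarize}"
  local f
  for f in "$dir"/*.mp3; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    fabric --transcribe-file "$f" --transcribe-model "$model" --pattern "$pattern" \
      > "${f%.mp3}.summary.md"
  done
}
```

Calling `summarize_recordings ./meetings gpt-4o-transcribe summarize` would produce one `.summary.md` file per recording in `./meetings`.
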
## Error Handling

Common error scenarios:

- **Unsupported format**: Only the listed audio/video formats are supported
- **File too large**: Use `--split-media-file` for files over 25MB
- **Missing ffmpeg**: Install ffmpeg for automatic file splitting
- **Invalid model**: Use `--list-transcription-models` to see available models
- **Missing model**: The `--transcribe-model` flag is required when using `--transcribe-file`

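
Several of these errors can be caught before Fabric is invoked. The following pre-flight check is a sketch, not Fabric's own validation: `preflight` is a hypothetical helper and its messages are invented for illustration.

```bash
# Hypothetical pre-flight check. Arguments: FILE MODEL [yes|no]
# The third argument says whether --split-media-file will be used.
preflight() {
  local file="${1:-}" model="${2:-}" split="${3:-no}"
  if [ ! -f "$file" ]; then
    echo "error: file not found: $file" >&2
    return 1
  fi
  if [ -z "$model" ]; then
    echo "error: --transcribe-model is required with --transcribe-file" >&2
    return 1
  fi
  # ffmpeg only matters when automatic splitting is requested.
  if [ "$split" = "yes" ] && ! command -v ffmpeg >/dev/null 2>&1; then
    echo "error: ffmpeg is needed for --split-media-file" >&2
    return 1
  fi
  return 0
}
```

A script might run `preflight "$file" "$model" yes || exit 1` before calling `fabric` with `--split-media-file`.
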
## Technical Details

### Implementation

- Transcription is handled in `internal/cli/transcribe.go:14`
- OpenAI-specific implementation in `internal/plugins/ai/openai/openai_audio.go:41`
- File splitting uses ffmpeg with configurable segment duration
- Supports any vendor that implements the `transcriber` interface

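
To make the splitting step concrete, here is one way to cut a file into fixed-length chunks with ffmpeg's segment muxer. This is a sketch only: the exact ffmpeg arguments Fabric passes are not documented here, and `split_media`, `halve_segment_time`, and the `chunk_%03d` naming are hypothetical.

```bash
# Hypothetical helper: split INPUT into SECONDS-long chunks inside OUTDIR
# using ffmpeg's segment muxer with stream copy (no re-encoding).
# The default of 600 seconds mirrors the 10-minute initial segment length.
split_media() {
  local input="$1" outdir="$2" seconds="${3:-600}"
  local ext="${input##*.}"
  ffmpeg -hide_banner -loglevel error -i "$input" \
    -f segment -segment_time "$seconds" -c copy \
    "$outdir/chunk_%03d.$ext"
}

# Halve a segment duration given in whole seconds (600 -> 300 -> 150).
halve_segment_time() {
  echo $(( $1 / 2 ))
}
```

If any chunk still exceeds the size limit, a caller could retry with `split_media "$input" "$outdir" "$(halve_segment_time 600)"`, which matches the halving strategy described under File Size Handling.
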
### Processing Pipeline

1. CLI validates file format and size
2. If the file exceeds 25MB and splitting is enabled, the file is split using ffmpeg
3. Each file/segment is sent to OpenAI's transcription API
4. Results are concatenated with spaces between segments
5. Transcribed text is passed as input to the main Fabric pipeline

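
Step 4 can be illustrated with a few lines of shell. This is a sketch of the joining rule only, not Fabric's Go implementation, and `join_segments` is a hypothetical name.

```bash
# Hypothetical helper: join per-segment transcripts with a single space,
# in the order they are given.
join_segments() {
  local joined="" part
  for part in "$@"; do
    # Add a separating space only when something has already been collected.
    joined="${joined:+$joined }$part"
  done
  printf '%s\n' "$joined"
}
```

For example, `join_segments "first chunk." "second chunk."` prints `first chunk. second chunk.` on one line.
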
### Vendor Support

Currently, only OpenAI is supported for transcription, but the interface allows for future expansion to other vendors that provide transcription capabilities.