mirror of
https://github.com/danielmiessler/Fabric.git
synced 2026-01-09 14:28:01 -05:00
feat: add Gemini TTS voice selection and listing functionality
## CHANGES - Add `--voice` flag for TTS voice selection - Add `--list-gemini-voices` command for voice discovery - Implement voice validation for Gemini TTS models - Update shell completions for voice options - Add comprehensive Gemini TTS documentation - Create voice samples directory structure - Extend spell checker dictionary with voice names
This commit is contained in:
155
docs/Gemini-TTS.md
Normal file
155
docs/Gemini-TTS.md
Normal file
@@ -0,0 +1,155 @@
|
||||
# Gemini Text-to-Speech (TTS) Guide
|
||||
|
||||
Fabric supports Google Gemini's text-to-speech (TTS) capabilities, allowing you to convert text into high-quality audio using various AI-generated voices.
|
||||
|
||||
## Overview
|
||||
|
||||
The Gemini TTS feature in Fabric allows you to:
|
||||
|
||||
- Convert text input into audio using Google's Gemini TTS models
|
||||
- Choose from 30+ different AI voices with varying characteristics
|
||||
- Generate high-quality WAV audio files
|
||||
- Integrate TTS generation into your existing Fabric workflows
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic TTS Generation
|
||||
|
||||
To generate audio from text using TTS:
|
||||
|
||||
```bash
|
||||
# Basic TTS with default voice (Kore)
|
||||
echo "Hello, this is a test of Gemini TTS" | fabric -m gemini-2.0-flash-tts -o output.wav
|
||||
|
||||
# Using a specific voice
|
||||
echo "Hello, this is a test with the Charon voice" | fabric -m gemini-2.0-flash-tts --voice Charon -o output.wav
|
||||
|
||||
# Using TTS with a pattern
|
||||
fabric -p summarize --voice Puck -m gemini-2.0-flash-tts -o summary.wav < document.txt
|
||||
```
|
||||
|
||||
### Voice Selection
|
||||
|
||||
Use the `--voice` flag to specify which voice to use for TTS generation:
|
||||
|
||||
```bash
|
||||
fabric -m gemini-2.0-flash-tts --voice Zephyr -o output.wav "Your text here"
|
||||
```
|
||||
|
||||
If no voice is specified, the default voice "Kore" will be used.
|
||||
|
||||
## Available Voices
|
||||
|
||||
Gemini TTS supports 30+ different voices, each with unique characteristics:
|
||||
|
||||
### Popular Voices
|
||||
|
||||
- **Kore** - Firm and confident (default)
|
||||
- **Charon** - Informative and clear
|
||||
- **Puck** - Upbeat and energetic
|
||||
- **Zephyr** - Bright and cheerful
|
||||
- **Leda** - Youthful and energetic
|
||||
- **Aoede** - Breezy and natural
|
||||
|
||||
### Complete Voice List
|
||||
|
||||
- Kore, Charon, Puck, Fenrir, Aoede, Leda, Orus, Zephyr
|
||||
- Autonoe, Callirhoe, Despina, Erinome, Gacrux, Laomedeia
|
||||
- Pulcherrima, Sulafat, Vindemiatrix, Achernar, Achird
|
||||
- Algenib, Algieba, Alnilam, Enceladus, Iapetus, Rasalgethi
|
||||
- Sadachbia, Zubenelgenubi, Vega, Capella, Lyra
|
||||
|
||||
### Listing Available Voices
|
||||
|
||||
To see all available voices with descriptions:
|
||||
|
||||
```bash
|
||||
# List all voices with characteristics
|
||||
fabric --list-gemini-voices
|
||||
|
||||
# List voice names only (for shell completion)
|
||||
fabric --list-gemini-voices --shell-complete-list
|
||||
```
|
||||
|
||||
## Rate Limits
|
||||
|
||||
Google Gemini TTS has usage quotas that vary by plan:
|
||||
|
||||
### Free Tier
|
||||
|
||||
- **15 requests per day** per project per TTS model
|
||||
- Quota resets daily
|
||||
- Applies to all TTS models (e.g., `gemini-2.5-flash-preview-tts`)
|
||||
|
||||
### Rate Limit Errors
|
||||
|
||||
If you exceed your quota, you'll see an error like:
|
||||
|
||||
```text
|
||||
Error 429: You exceeded your current quota, please check your plan and billing details
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
|
||||
- Wait for daily quota reset (typically at midnight UTC)
|
||||
- Upgrade to a paid plan for higher limits
|
||||
- Use TTS generation strategically for important content
|
||||
|
||||
For current rate limits and pricing, visit: <https://ai.google.dev/gemini-api/docs/rate-limits>
|
||||
|
||||
## Configuration
|
||||
|
||||
### Command Line Options
|
||||
|
||||
- `--voice <voice_name>` - Specify the TTS voice to use
|
||||
- `-o <filename.wav>` - Output audio file (required for TTS models)
|
||||
- `-m <tts_model>` - Specify a TTS-capable model (e.g., `gemini-2.0-flash-tts`)
|
||||
|
||||
### YAML Configuration
|
||||
|
||||
You can also set a default voice in your Fabric configuration file (`~/.config/fabric/config.yaml`):
|
||||
|
||||
```yaml
|
||||
voice: "Charon" # Set your preferred default voice
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
- Valid Google Gemini API key configured in Fabric
|
||||
- TTS-capable Gemini model (models containing "tts" in the name)
|
||||
- Audio output must be specified with `-o filename.wav`
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### Error: "TTS model requires audio output"
|
||||
|
||||
- Solution: Always specify an output file with `-o filename.wav` when using TTS models
|
||||
|
||||
#### Error: "Invalid voice 'X'"
|
||||
|
||||
- Solution: Check that the voice name is spelled correctly and matches one of the supported voices listed above
|
||||
|
||||
#### Error: "TTS generation failed"
|
||||
|
||||
- Solution: Verify your Gemini API key is valid and you have sufficient quota
|
||||
|
||||
### Getting Help
|
||||
|
||||
For additional help with TTS features:
|
||||
|
||||
```bash
|
||||
fabric --help
|
||||
```
|
||||
|
||||
## Technical Details
|
||||
|
||||
- **Audio Format**: WAV files with 24kHz sample rate, 16-bit depth, mono channel
|
||||
- **Language Support**: Automatic language detection for 24+ languages
|
||||
- **Model Requirements**: Models must contain "tts", "preview-tts", or "text-to-speech" in the name
|
||||
- **Voice Selection**: Uses Google's PrebuiltVoiceConfig system for consistent voice quality
|
||||
|
||||
---
|
||||
|
||||
For more information about Fabric, visit the [main documentation](../README.md).
|
||||
Reference in New Issue
Block a user