feat: add Gemini TTS voice selection and listing functionality

## CHANGES

- Add `--voice` flag for TTS voice selection
- Add `--list-gemini-voices` command for voice discovery
- Implement voice validation for Gemini TTS models
- Update shell completions for voice options
- Add comprehensive Gemini TTS documentation
- Create voice samples directory structure
- Extend spell checker dictionary with voice names
Kayvan Sylvan
2025-07-26 15:11:30 -07:00
parent eab335873e
commit 614b1322d5
12 changed files with 474 additions and 11 deletions

docs/Gemini-TTS.md Normal file

@@ -0,0 +1,155 @@
# Gemini Text-to-Speech (TTS) Guide
Fabric supports Google Gemini's text-to-speech (TTS) capabilities, allowing you to convert text into high-quality audio using various AI-generated voices.
## Overview
The Gemini TTS feature in Fabric allows you to:
- Convert text input into audio using Google's Gemini TTS models
- Choose from 30+ different AI voices with varying characteristics
- Generate high-quality WAV audio files
- Integrate TTS generation into your existing Fabric workflows
## Usage
### Basic TTS Generation
To generate audio, pipe text into Fabric, select a TTS model with `-m`, and name the output file with `-o`:
```bash
# Basic TTS with default voice (Kore)
echo "Hello, this is a test of Gemini TTS" | fabric -m gemini-2.5-flash-preview-tts -o output.wav
# Using a specific voice
echo "Hello, this is a test with the Charon voice" | fabric -m gemini-2.5-flash-preview-tts --voice Charon -o output.wav
# Using TTS with a pattern
fabric -p summarize --voice Puck -m gemini-2.5-flash-preview-tts -o summary.wav < document.txt
```
### Voice Selection
Use the `--voice` flag to specify which voice to use for TTS generation:
```bash
fabric -m gemini-2.5-flash-preview-tts --voice Zephyr -o output.wav "Your text here"
```
If no voice is specified, the default voice "Kore" will be used.
## Available Voices
Gemini TTS supports 30+ different voices, each with unique characteristics:
### Popular Voices
- **Kore** - Firm and confident (default)
- **Charon** - Informative and clear
- **Puck** - Upbeat and energetic
- **Zephyr** - Bright and cheerful
- **Leda** - Youthful and energetic
- **Aoede** - Breezy and natural
### Complete Voice List
- Kore, Charon, Puck, Fenrir, Aoede, Leda, Orus, Zephyr
- Autonoe, Callirhoe, Despina, Erinome, Gacrux, Laomedeia
- Pulcherrima, Sulafat, Vindemiatrix, Achernar, Achird
- Algenib, Algieba, Alnilam, Enceladus, Iapetus, Rasalgethi
- Sadachbia, Zubenelgenubi, Vega, Capella, Lyra
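A wrapper script could check a voice name against this list before spending a request. The following is a minimal sketch, not Fabric's own validation logic: `GEMINI_VOICES` and `is_known_voice` are hypothetical names, the list is copied from the one above, and case-sensitive matching is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Fabric. Voice names are copied from the list above.
GEMINI_VOICES="Kore Charon Puck Fenrir Aoede Leda Orus Zephyr
Autonoe Callirhoe Despina Erinome Gacrux Laomedeia
Pulcherrima Sulafat Vindemiatrix Achernar Achird
Algenib Algieba Alnilam Enceladus Iapetus Rasalgethi
Sadachbia Zubenelgenubi Vega Capella Lyra"

# Succeeds (exit status 0) when $1 exactly matches one of the names above.
is_known_voice() {
  local candidate="$1" voice
  for voice in $GEMINI_VOICES; do
    if [ "$voice" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Example: is_known_voice "Charon" && echo "ok"
```

Keeping the list in one variable means the check and any help text can share a single source, but the list would need updating by hand whenever the supported voices change.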
### Listing Available Voices
To see all available voices with descriptions:
```bash
# List all voices with characteristics
fabric --list-gemini-voices
# List voice names only (for shell completion)
fabric --list-gemini-voices --shell-complete-list
```
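The names-only listing lends itself to scripting, for example to preview the commands that would produce one sample per voice. The sketch below assumes that `--shell-complete-list` prints one voice name per line, which this guide does not state. `print_sample_commands` is a hypothetical helper that only prints commands and never calls the API.

```bash
# Hypothetical helper: reads voice names (one per line) on stdin and prints
# one fabric command per voice for the model given as $1. It does not run them.
print_sample_commands() {
  local model="$1" voice
  while IFS= read -r voice; do
    # Skip blank lines in the input.
    [ -n "$voice" ] || continue
    printf 'echo "Hello from %s" | fabric -m %s --voice %s -o %s.wav\n' \
      "$voice" "$model" "$voice" "$voice"
  done
}

# Example (dry run):
#   fabric --list-gemini-voices --shell-complete-list \
#     | print_sample_commands gemini-2.5-flash-preview-tts
```

Printing rather than executing is deliberate: running one request per voice could exceed the free-tier daily quota described under Rate Limits.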
## Rate Limits
Google Gemini TTS has usage quotas that vary by plan:
### Free Tier
- **15 requests per day** per project per TTS model
- Quota resets daily
- Applies to all TTS models (e.g., `gemini-2.5-flash-preview-tts`)
### Rate Limit Errors
If you exceed your quota, you'll see an error like:
```text
Error 429: You exceeded your current quota, please check your plan and billing details
```
**Solutions:**
- Wait for the daily quota reset (Google documents this as midnight Pacific time)
- Upgrade to a paid plan for higher limits
- Reserve TTS requests for content that needs audio, for example by summarizing long text before converting it
For current rate limits and pricing, visit: <https://ai.google.dev/gemini-api/docs/rate-limits>
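A script that calls Fabric in a loop may want to stop as soon as the quota is exhausted rather than keep failing. The following is a minimal sketch that matches the error text shown above. `is_quota_error` is a hypothetical helper, and the assumption that Fabric writes this message to standard error is not confirmed by this guide.

```bash
# Hypothetical helper: succeeds when the given text looks like the quota error above.
is_quota_error() {
  case "$1" in
    *"429"*"exceeded your current quota"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes the error is written to stderr):
#   err=$(echo "Hello" | fabric -m gemini-2.5-flash-preview-tts -o out.wav 2>&1 >/dev/null)
#   if is_quota_error "$err"; then echo "Quota exhausted; try again after the reset." >&2; fi
```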
## Configuration
### Command Line Options
- `--voice <voice_name>` - Specify the TTS voice to use
- `-o <filename.wav>` - Output audio file (required for TTS models)
- `-m <tts_model>` - Specify a TTS-capable model (e.g., `gemini-2.5-flash-preview-tts`)
### YAML Configuration
You can also set a default voice in your Fabric configuration file (`~/.config/fabric/config.yaml`):
```yaml
voice: "Charon" # Set your preferred default voice
```
## Requirements
- Valid Google Gemini API key configured in Fabric
- TTS-capable Gemini model (models containing "tts" in the name)
- Audio output must be specified with `-o filename.wav`
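These requirements can be checked before a request is sent. The sketch below mirrors the list above rather than Fabric's internal checks. `check_tts_args` is a hypothetical name, and treating a `.wav` extension as mandatory is an assumption drawn from the `-o filename.wav` examples.

```bash
# Hypothetical pre-flight check, not part of Fabric.
# Usage: check_tts_args <model> <output_file>
check_tts_args() {
  local model="$1" output="$2"
  # The model name must contain "tts".
  case "$model" in
    *tts*) ;;
    *) echo "model '$model' does not look TTS-capable" >&2; return 1 ;;
  esac
  # The output file must be non-empty and end in .wav (assumed requirement).
  case "$output" in
    ?*.wav) ;;
    *) echo "output file must be given and end in .wav" >&2; return 1 ;;
  esac
  return 0
}

# Example: check_tts_args gemini-2.5-flash-preview-tts speech.wav && echo "ready"
```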
## Troubleshooting
### Common Issues
#### Error: "TTS model requires audio output"
- Solution: Always specify an output file with `-o filename.wav` when using TTS models
#### Error: "Invalid voice 'X'"
- Solution: Check that the voice name is spelled correctly and matches one of the supported voices listed above
#### Error: "TTS generation failed"
- Solution: Verify your Gemini API key is valid and you have sufficient quota
### Getting Help
For additional help with TTS features:
```bash
fabric --help
```
## Technical Details
- **Audio Format**: WAV files with 24 kHz sample rate, 16-bit depth, mono channel
- **Language Support**: Automatic language detection for 24+ languages
- **Model Requirements**: Models must contain "tts", "preview-tts", or "text-to-speech" in the name
- **Voice Selection**: Uses Google's PrebuiltVoiceConfig system for consistent voice quality
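The stated audio format implies a fixed data rate of 24000 samples/s × 2 bytes × 1 channel = 48000 bytes per second, so a file's duration can be estimated from its size. The sketch below assumes a canonical 44-byte WAV header, which this guide does not confirm; `wav_seconds` is a hypothetical helper.

```bash
# Stated format: 24000 samples per second, 16-bit (2-byte) samples, 1 channel.
SAMPLE_RATE=24000
BYTES_PER_SAMPLE=2
CHANNELS=1
HEADER_BYTES=44  # assumption: canonical 44-byte PCM WAV header

# Prints the approximate duration, in whole seconds, of a WAV file of $1 bytes.
wav_seconds() {
  local size="$1"
  local bytes_per_second=$((SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS))
  echo $(( (size - HEADER_BYTES) / bytes_per_second ))
}

# Example: wav_seconds 480044   # prints 10
# With a real file: wav_seconds "$(wc -c < output.wav)"
```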
---
For more information about Fabric, visit the [main documentation](../README.md).