feat: add Gemini TTS voice selection and listing functionality

## CHANGES - Add `--voice` flag for TTS voice selection - Add `--list-gemini-voices` command for voice discovery - Implement voice validation for Gemini TTS models - Update shell completions for voice options - Add comprehensive Gemini TTS documentation - Create voice samples directory structure - Extend spell checker dictionary with voice names
2026-01-09 14:28:01 -05:00 · 2025-07-26 15:11:30 -07:00
parent eab335873e
commit 614b1322d5
12 changed files with 474 additions and 11 deletions
--- a/docs/Gemini-TTS.md
+++ b/docs/Gemini-TTS.md
@@ -0,0 +1,155 @@
+# Gemini Text-to-Speech (TTS) Guide
+
+Fabric supports Google Gemini's text-to-speech (TTS) capabilities, allowing you to convert text into high-quality audio using various AI-generated voices.
+
+## Overview
+
+The Gemini TTS feature in Fabric allows you to:
+
+- Convert text input into audio using Google's Gemini TTS models
+- Choose from 30+ different AI voices with varying characteristics
+- Generate high-quality WAV audio files
+- Integrate TTS generation into your existing Fabric workflows
+
+## Usage
+
+### Basic TTS Generation
+
+To generate audio from text using TTS:
+
+```bash
+# Basic TTS with default voice (Kore)
+echo "Hello, this is a test of Gemini TTS" | fabric -m gemini-2.0-flash-tts -o output.wav
+
+# Using a specific voice
+echo "Hello, this is a test with the Charon voice" | fabric -m gemini-2.0-flash-tts --voice Charon -o output.wav
+
+# Using TTS with a pattern
+fabric -p summarize --voice Puck -m gemini-2.0-flash-tts -o summary.wav < document.txt
+```
+
+### Voice Selection
+
+Use the `--voice` flag to specify which voice to use for TTS generation:
+
+```bash
+fabric -m gemini-2.0-flash-tts --voice Zephyr -o output.wav "Your text here"
+```
+
+If no voice is specified, the default voice "Kore" will be used.
+
+## Available Voices
+
+Gemini TTS supports 30+ different voices, each with unique characteristics:
+
+### Popular Voices
+
+- **Kore** - Firm and confident (default)
+- **Charon** - Informative and clear
+- **Puck** - Upbeat and energetic
+- **Zephyr** - Bright and cheerful
+- **Leda** - Youthful and energetic
+- **Aoede** - Breezy and natural
+
+### Complete Voice List
+
+- Kore, Charon, Puck, Fenrir, Aoede, Leda, Orus, Zephyr
+- Autonoe, Callirhoe, Despina, Erinome, Gacrux, Laomedeia
+- Pulcherrima, Sulafat, Vindemiatrix, Achernar, Achird
+- Algenib, Algieba, Alnilam, Enceladus, Iapetus, Rasalgethi
+- Sadachbia, Zubenelgenubi, Vega, Capella, Lyra
+
+### Listing Available Voices
+
+To see all available voices with descriptions:
+
+```bash
+# List all voices with characteristics
+fabric --list-gemini-voices
+
+# List voice names only (for shell completion)
+fabric --list-gemini-voices --shell-complete-list
+```
+
+## Rate Limits
+
+Google Gemini TTS has usage quotas that vary by plan:
+
+### Free Tier
+
+- **15 requests per day** per project per TTS model
+- Quota resets daily
+- Applies to all TTS models (e.g., `gemini-2.5-flash-preview-tts`)
+
+### Rate Limit Errors
+
+If you exceed your quota, you'll see an error like:
+
+```text
+Error 429: You exceeded your current quota, please check your plan and billing details
+```
+
+**Solutions:**
+
+- Wait for daily quota reset (typically at midnight UTC)
+- Upgrade to a paid plan for higher limits
+- Use TTS generation strategically for important content
+
+For current rate limits and pricing, visit: <https://ai.google.dev/gemini-api/docs/rate-limits>
+
+## Configuration
+
+### Command Line Options
+
+- `--voice <voice_name>` - Specify the TTS voice to use
+- `-o <filename.wav>` - Output audio file (required for TTS models)
+- `-m <tts_model>` - Specify a TTS-capable model (e.g., `gemini-2.0-flash-tts`)
+
+### YAML Configuration
+
+You can also set a default voice in your Fabric configuration file (`~/.config/fabric/config.yaml`):
+
+```yaml
+voice: "Charon"  # Set your preferred default voice
+```
+
+## Requirements
+
+- Valid Google Gemini API key configured in Fabric
+- TTS-capable Gemini model (models containing "tts" in the name)
+- Audio output must be specified with `-o filename.wav`
+
+## Troubleshooting
+
+### Common Issues
+
+#### Error: "TTS model requires audio output"
+
+- Solution: Always specify an output file with `-o filename.wav` when using TTS models
+
+#### Error: "Invalid voice 'X'"
+
+- Solution: Check that the voice name is spelled correctly and matches one of the supported voices listed above
+
+#### Error: "TTS generation failed"
+
+- Solution: Verify your Gemini API key is valid and you have sufficient quota
+
+### Getting Help
+
+For additional help with TTS features:
+
+```bash
+fabric --help
+```
+
+## Technical Details
+
+- **Audio Format**: WAV files with 24kHz sample rate, 16-bit depth, mono channel
+- **Language Support**: Automatic language detection for 24+ languages
+- **Model Requirements**: Models must contain "tts", "preview-tts", or "text-to-speech" in the name
+- **Voice Selection**: Uses Google's PrebuiltVoiceConfig system for consistent voice quality
+
+---
+
+For more information about Fabric, visit the [main documentation](../README.md).