Merge pull request #1419 from DrewThomasson/main

sync from main
ROBERT MCDOWELL
2025-12-26 06:18:49 -08:00
committed by GitHub

@@ -102,10 +102,11 @@ https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921
## Features
- 📚 Splits eBook into chapters for organized audio.
- 🎙️ High-quality text-to-speech with [Coqui XTTSv2](https://huggingface.co/coqui/XTTS-v2) and [Fairseq](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) (and more).
- 🎙️ High-quality text-to-speech with [XTTSv2](https://huggingface.co/coqui/XTTS-v2), [Fairseq](https://github.com/facebookresearch/fairseq/tree/main/examples/mms), and more.
- 🗣️ Optional voice cloning with your own voice file.
- 🌍 Supports +1110 languages (English by default). [List of Supported languages](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
- 🖥️ Designed to run on 4GB RAM.
- 🗣️ Optional custom model support, using a model you trained yourself.
- 🌍 Supports 1158 languages. [List of Supported languages](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
- 🖥️ Designed to run on a minimum of 2GB RAM and 1GB VRAM.
## Supported Languages
@@ -121,11 +122,12 @@ https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921
## Hardware Requirements
- 2GB RAM minimum, 8GB recommended
- Virtualization enabled if running on Windows (Docker only)
- CPU (Intel, AMD, ARM)*
- GPU (CUDA, ROCm, XPU)
- MPS (Apple Silicon CPU)
- 2GB RAM min, 8GB recommended.
- 1GB VRAM min, 4GB recommended.
- Virtualization enabled if running on Windows (Docker only).
- CPU (Intel, AMD, ARM)*.
- GPU (CUDA, ROCm, XPU).
- MPS (Apple Silicon CPU).
*<i> Modern TTS engines are very slow on CPU</i>
@@ -179,13 +181,11 @@ to let the web page reconnect to the new connection socket.**
### Basic Usage
- **Linux/MacOS**:
```bash
./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> \
--voice [path_to_voice_file] --language [language_code]
./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
```
- **Windows**
```bash
ebook2audiobook.cmd --headless --ebook <path_to_ebook_file>
--voice [path_to_voice_file] --language [language_code]
ebook2audiobook.cmd --headless --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
```
- **[--ebook]**: Path to your eBook file
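
When the headless command is launched from another program, it can help to assemble the arguments as a list rather than a single shell string, so paths containing spaces need no manual quoting. The following is a minimal Python sketch, not part of the project: the function name `build_headless_command` and its parameters are hypothetical, and only the flags shown in the usage lines above (`--headless`, `--ebook`, `--voice`, `--language`) are assumed to exist.

```python
from typing import List, Optional


def build_headless_command(
    ebook: str,
    voice: Optional[str] = None,
    language: Optional[str] = None,
    script: str = "./ebook2audiobook.sh",
) -> List[str]:
    """Return an argv list for a headless conversion (hypothetical helper).

    `script` defaults to the Linux/macOS launcher; pass "ebook2audiobook.cmd"
    on Windows. Optional flags are only added when a value is supplied.
    """
    cmd = [script, "--headless", "--ebook", ebook]
    if voice:
        cmd += ["--voice", voice]
    if language:
        cmd += ["--language", language]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues on either platform.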
@@ -199,13 +199,11 @@ to let the web page reconnect to the new connection socket.**
(must be a .zip file containing the mandatory model files. Example for XTTSv2: config.json, model.pth, vocab.json and ref.wav)
- **Linux/MacOS**
```bash
./ebook2audiobook.sh --headless --ebook <ebook_file_path> \
--language <language> --custom_model <custom_model_path>
./ebook2audiobook.sh --headless --ebook <ebook_file_path> --language <language> --custom_model <custom_model_path>
```
- **Windows**
```bash
ebook2audiobook.cmd --headless --ebook <ebook_file_path> \
--language <language> --custom_model <custom_model_path>
ebook2audiobook.cmd --headless --ebook <ebook_file_path> --language <language> --custom_model <custom_model_path>
```
<i>Note: the ref.wav of your custom model is always the voice selected for the conversion</i>
@@ -213,7 +211,6 @@ to let the web page reconnect to the new connection socket.**
which must contain (according to the tts engine) all the mandatory files<br>
(see ./lib/models.py).
### Detailed Guide with a List of All Parameters
- **Linux/MacOS**
```bash
@@ -374,8 +371,6 @@ one [pause] is a random between 0.8 to 1.6 seconds
### Docker
#### Steps to Run
1. **Clone the Repository**:
```bash
git clone https://github.com/DrewThomasson/ebook2audiobook.git