DrewThomasson/ebook2audiobook
GitHub: DrewThomasson/ebook2audiobook
一款支持声音克隆和多语言的开源电子书转有声书工具,集成多种 TTS 引擎并支持 Docker 部署。
Stars: 19302 | Forks: 1605
# 📚 ebook2audiobook (E2A)
支持 CPU/GPU 的电子书转有声书转换器,带有章节和元数据
使用先进的 TTS 引擎及更多功能。
支持声音克隆和 1158 种语言! [](https://discord.gg/63Tv3F65k6) ### 本地运行 [](#instructions) [](https://github.com/DrewThomasson/ebook2audiobook/actions/workflows/Docker-Build.yml) [](https://github.com/DrewThomasson/ebook2audiobook/releases/latest)
### 远程运行
[](https://huggingface.co/spaces/drewThomasson/ebook2audiobook)
[](https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/main/Notebooks/colab_ebook2audiobook.ipynb) [](https://github.com/Rihcus/ebook2audiobookXTTS/blob/main/Notebooks/kaggle-ebook2audiobook.ipynb)
#### GUI 界面

## 演示
**全新默认语音演示**
https://github.com/user-attachments/assets/750035dc-e355-46f1-9286-05c1d9e88cea
## README.md
## 目录
- [ebook2audiobook](#-ebook2audiobook)
- [功能](#features)
- [GUI 界面](#gui-interface)
- [演示](#demos)
- [支持的语言](#supported-languages)
- [最低要求](#hardware-requirements)
- [用法](#instructions)
- [本地运行](#instructions)
- [启动 Gradio Web 界面](#instructions)
- [基础 Headless 用法](#basic-usage)
- [Headless 自定义 XTTS 模型用法](#example-of-custom-model-zip-upload)
- [帮助命令输出](#help-command-output)
- [远程运行](#run-remotely)
- [Docker](#docker)
- [运行步骤](#docker)
- [常见 Docker 问题](#common-docker-issues)
- [微调 TTS 模型](#fine-tuned-tts-models)
- [微调 TTS 模型集合](#fine-tuned-tts-collection)
- [训练 XTTSv2](#fine-tune-your-own-xttsv2-model)
- [支持的电子书格式](#supported-ebook-formats)
- [输出格式](#output-and-process-formats)
- [回退到旧版本](#reverting-to-older-versions)
- [常见问题](#common-issues)
- [特别感谢](#special-thanks)
- [目录](#table-of-contents)
## 功能
- 🔧 **支持的 TTS 引擎**:`XTTSv2`, `Bark`, `Fairseq`, `VITS`, `Tacotron2`, `Tortoise`, `GlowTTS`, `YourTTS`
- 📚 **转换多种文件格式**:`.epub`, `.mobi`, `.azw3`, `.fb2`, `.lrf`, `.rb`, `.snb`, `.tcr`, `.pdf`, `.txt`, `.rtf`, `.doc`, `.docx`, `.html`, `.odt`, `.azw`, `.tiff`, `.tif`, `.png`, `.jpg`, `.jpeg`, `.bmp`, `.zip`
- 💻 **文本区域** 直接将短文本转换为音频
- 🔍 **OCR 扫描** 用于包含文本页面图像的文件
- 🔊 **高质量文本转语音** 从接近实时到接近真实人声
- 🗣️ **可选声音克隆** 使用您自己的声音文件
- 🌐 **支持 1158 种语言** ([支持语言列表](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html))
- 💻 **对低资源环境友好** — 可在 **2 GB RAM / 1 GB VRAM (最低要求)** 上运行
- 🎵 **有声书输出格式**:单声道或立体声 `aac`, `flac`, `mp3`, `m4b`, `m4a`, `mp4`, `mov`, `ogg`, `wav`, `webm`
- 🧠 **支持 SML 标签** — 对停顿、暂停、声音切换等进行细粒度控制 ([见下文](#sml-tags-available))
- 🧩 **可选自定义模型** 使用您自己训练的模型 (XTTSv2, VITS, FAIRSEQ, PIPER,其他模型视需求提供)
- 🎛️ **由 E2A 团队训练的微调预设模型**
(如果您需要额外的微调模型,或者希望将您的模型分享到官方预设列表,请联系我们) ## 硬件要求 - 最低 2GB RAM,推荐 8GB。 - 最低 1GB VRAM,推荐 4GB。 - 如果在 Windows 上运行,需启用虚拟化 (仅限 Docker)。 - CPU, XPU (intel, AMD, ARM)*。 - CUDA, ROCm, JETSON - MPS (Apple Silicon CPU) * 现代 TTS 引擎在 CPU 上运行非常慢,因此请使用较低质量的 TTS,例如 YourTTS, Tacotron2 等。 ## 支持的语言 | **阿拉伯语** | **中文** | **英语** | **西班牙语** | |:------------------:|:------------------:|:------------------:|:------------------:| | **法语** | **德语** | **意大利语** | **葡萄牙语** | | **波兰语** | **土耳其语** | **俄语** | **荷兰语** | | **捷克语** | **日语** | **印地语** | **孟加拉语** | | **匈牙利语** | **韩语** | **越南语**| **瑞典语** | | **波斯语** | **约鲁巴语** | **斯瓦希里语** | **印尼语**| | **斯洛伐克语** | **克罗地亚语** | **泰米尔语** | **丹麦语** | - [**在此查看 +1130 种语言和方言**](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) ## 支持的电子书格式 - `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr` - **最佳效果**:`.epub` 或 `.mobi`,用于自动章节检测 ## 输出与处理格式 - `.m4b`, `.m4a`, `.mp4`, `.webm`, `.mov`, `.mp3`, `.flac`, `.wav`, `.ogg`, `.aac` - 处理格式可以在 lib/conf.py 中更改 ## 可用的 SML 标签 - `[break]` — 静音 (随机范围 **0.3–0.6 秒**) - `[pause]` — 静音 (随机范围 **1.0–1.6 秒**) - `[pause:N]` — 固定暂停 (**N 秒**) - `[voice:/path/to/voice/file]...[/voice]` — 从默认或通过 GUI/CLI 选择的语音切换声音 **查看我们的另一个仓库,专门用于自动在您的电子书中添加 SML -> [E2A-SML](./tools/E2A-SML)** ### 说明 1. **克隆仓库** git clone https://github.com/DrewThomasson/ebook2audiobook.git cd ebook2audiobook 2. **安装 / 运行 ebook2audiobook**: - **Linux/MacOS** ./ebook2audiobook.command MacOS 用户注意:会安装 homebrew 以安装缺失的程序。 - **Mac 启动器** 双击 `Mac Ebook2Audiobook Launcher.command` - **Windows** ebook2audiobook.cmd 或者 双击 `ebook2audiobook.cmd` Windows 用户注意:会安装 scoop 以在无需管理员权限的情况下安装缺失的程序。 3. **打开 Web 应用**:点击终端中提供的 URL 以访问 Web 应用并转换电子书。 `http://localhost:7860/` 4. **获取公共链接**: `./ebook2audiobook.command --share` (Linux/MacOS) `ebook2audiobook.cmd --share` (Windows) `python app.py --share` (所有操作系统) ### 基础用法 - **Linux/MacOS**: ./ebook2audiobook.command --headless --ebook --voice --language
- **Windows**
ebook2audiobook.cmd --headless --ebook --voice --language
- **[--ebook]**:您的电子书文件路径
- **[--voice]**:声音克隆文件路径 (可选)
- **[--language]**:ISO-639-3 语言代码 (例如:ita 代表意大利语,eng 代表英语,deu 代表德语...)。
默认语言为 eng,对于在 ./lib/lang.py 中设置的默认语言,--language 是可选的。
也支持 ISO-639-1 的 2 字母代码。 ### 自定义模型 Zip 上传示例 (必须是包含必需模型文件的 .zip 文件。例如 XTTSv2 的 config.json, model.pth, vocab.json 和 ref.wav) - **Linux/MacOS** ./ebook2audiobook.command --headless --ebook --language --custom_model
- **Windows**
ebook2audiobook.cmd --headless --ebook --language --custom_model
注意:自定义模型的 ref.wav 始终是转换时选择的声音
- ****:指向 `model_name.zip` 文件的路径,
该文件必须 (根据 tts 引擎) 包含所有必需的文件
(见 ./lib/models.py)。 ### 包含所有可用参数的详细指南 - **Linux/MacOS** ./ebook2audiobook.command --help - **Windows** ebook2audiobook.cmd --help - **或者适用于所有操作系统** ```python app.py --help ``` ``` usage: app.py [-h] [--session SESSION] [--share] [--headless] [--ebook EBOOK] [--ebooks_dir EBOOKS_DIR] [--language LANGUAGE] [--voice VOICE] [--voice_map VOICE_MAP] [--device {CPU,CUDA,MPS,ROCM,XPU,JETSON}] [--tts_engine {XTTS,BARK,VITS,FAIRSEQ,TACOTRON,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}] [--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED] [--output_format OUTPUT_FORMAT] [--output_channel OUTPUT_CHANNEL] [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS] [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting] [--text_temp TEXT_TEMP] [--waveform_temp WAVEFORM_TEMP] [--output_dir OUTPUT_DIR] [--version] Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion. options: -h, --help show this help message and exit --session SESSION Session to resume the conversion in case of interruption, crash, or reuse of custom models and custom cloning voices. **** The following option is for gradio/gui mode only: --share (Optional) Enable a public shareable Gradio link. **** The following options are for --headless mode only: --headless Run the script in headless mode --ebook EBOOK Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present. --ebooks_dir EBOOKS_DIR Relative or absolute path of the directory containing the files to convert. Cannot be used when --ebook is present. --text TEXT Raw text for conversion. Cannot be used when --ebook or --ebooks_dir is present. --language LANGUAGE Language of the e-book. Default language is set in ./lib/lang.py sed as default if not present. All compatible language codes are in ./lib/lang.py optional parameters: --translate ISO3 (Optional) Translate ebook to a target language (ISO 639-3 code, e.g. eng, fra, deu) before TTS synthesis. Uses argostranslate. The target language becomes the effective TTS language for the run. A copy of the source ebook is made with the _ suffix so translated and non-translated
outputs stay isolated (independent process folder, audio chunks, and final file).
--voice VOICE (Optional) Path to the voice cloning file for TTS engine.
Uses the default voice if not present.
--voice_map VOICE_MAP
(Optional, --ebooks_dir only) Path to a JSON file mapping ebook path -> voice path.
Each entry overrides --voice for that specific ebook. Missing/null entries fall back to --voice.
Keys may be absolute paths or basenames. Example:
{"book1.epub": "/voices/eng/adult/female/alice.wav", "/abs/path/book2.epub": null}
--device {CPU,CUDA,MPS,ROCM,XPU,JETSON}
(Optional) Processor unit type for the conversion.
Default is set in ./lib/conf.py if not present. Fall back to CPU if CUDA or MPS is not available.
--tts_engine {XTTS,BARK,VITS,FAIRSEQ,TACOTRON,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}
(Optional) Preferred TTS engine (available are: ['XTTS', 'BARK', 'VITS', 'FAIRSEQ', 'TACOTRON', 'YOURTTS', 'xtts', 'bark', 'vits', 'fairseq', 'tacotron', 'yourtts'].
Default depends on the selected language. The tts engine should be compatible with the chosen language
--custom_model CUSTOM_MODEL
(Optional) Path to the custom model zip file cntaining mandatory model files.
Please refer to ./lib/models.py
--fine_tuned FINE_TUNED
(Optional) Fine tuned model path. Default is builtin model.
--output_format OUTPUT_FORMAT
(Optional) Output audio format. Default is m4b set in ./lib/conf.py
--output_channel OUTPUT_CHANNEL
(Optional) Output audio channel. Default is mono set in ./lib/conf.py
--temperature TEMPERATURE
(xtts only, optional) Temperature for the model.
Default to config.json model. Higher temperatures lead to more creative outputs.
--length_penalty LENGTH_PENALTY
(xtts only, optional) A length penalty applied to the autoregressive decoder.
Default to config.json model. Not applied to custom models.
--num_beams NUM_BEAMS
(xtts only, optional) Controls how many alternative sequences the model explores. Must be equal or greater than length penalty.
Default to config.json model.
--repetition_penalty REPETITION_PENALTY
(xtts only, optional) A penalty that prevents the autoregressive decoder from repeating itself.
Default to config.json model.
--top_k TOP_K (xtts only, optional) Top-k sampling.
Lower values mean more likely outputs and increased audio generation speed.
Default to config.json model.
--top_p TOP_P (xtts only, optional) Top-p sampling.
Lower values mean more likely outputs and increased audio generation speed. Default to config.json model.
--speed SPEED (xtts only, optional) Speed factor for the speech generation.
Default to config.json model.
--enable_text_splitting
(xtts only, optional) Enable TTS text splitting. This option is known to not be very efficient.
Default to config.json model.
--text_temp TEXT_TEMP
(bark only, optional) Text Temperature for the model.
Default to config.json model.
--waveform_temp WAVEFORM_TEMP
(bark only, optional) Waveform Temperature for the model.
Default to config.json model.
--output_dir OUTPUT_DIR
(Optional) Path to the output directory. Default is set in ./lib/conf.py
--version Show the version of the script and exit
Example usage:
Windows:
Gradio/GUI:
ebook2audiobook.cmd
Headless mode:
ebook2audiobook.cmd --headless --ebook '/path/to/file' --language eng
Linux/Mac:
Gradio/GUI:
./ebook2audiobook.command
Headless mode:
./ebook2audiobook.command --headless --ebook '/path/to/file' --language eng
SML tags available:
[break] — silence (random range **0.3–0.6 sec.**)
[pause] — silence (random range **1.0–1.6 sec.**)
[pause:N] — fixed pause (**N sec.**)
[voice:/path/to/voice/file]...[/voice] — switch voice from default or selected voice from GUI/CLI
```
注意:在 gradio/gui 模式下,要取消正在进行的转换,只需点击电子书上传组件上的 [X]。
提示:如果需要更长的停顿,可以添加 '[pause:3]' 代表 3 秒等。
### Docker
1. **克隆仓库**:
```
git clone https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook
```
2. **构建容器**
```
Windows:
Docker:
ebook2audiobook.cmd --script_mode build_docker
Docker Compose:
ebook2audiobook.cmd --script_mode build_docker --docker_mode compose
Podman Compose:
ebook2audiobook.cmd --script_mode build_docker --docker_mode podman
Linux/Mac
Docker:
./ebook2audiobook.command --script_mode build_docker
Docker Compose
./ebook2audiobook.command --script_mode build_docker --docker_mode compose
Podman Compose:
./ebook2audiobook.command --script_mode build_docker --docker_mode podman
```
4. **运行容器:**
```
Docker run image:
Gradio/GUI:
CPU:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" --rm -it -p 7860:7860 athomasson2/ebook2audiobook:cpu
CUDA:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" --gpus all --rm -it -p 7860:7860 athomasson2/ebook2audiobook:cu[118/122/124/126 etc..]
ROCM:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" --device=/dev/kfd --device=/dev/dri --rm -it -p 7860:7860 athomasson2/ebook2audiobook:rocm[6.0/6.1/6.4 etc..]
XPU:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" --device=/dev/dri --rm -it -p 7860:7860 athomasson2/ebook2audiobook:xpu
JETSON:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" --runtime nvidia --rm -it -p 7860:7860 athomasson2/ebook2audiobook:jetson[51/60/61 etc...]
Headless mode:
CPU:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" -v "/my/real/ebooks/folder/absolute/path:/app/another_ebook_folder" --rm -it -p 7860:7860 ebook2audiobook:cpu --headless --ebook "/app/another_ebook_folder/myfile.pdf" [--voice /app/my/voicepath/voice.mp3 etc..]
CUDA:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" -v "/my/real/ebooks/folder/absolute/path:/app/another_ebook_folder" --gpus all --rm -it -p 7860:7860 ebook2audiobook:cu[118/122/124/126 etc..] --headless --ebook "/app/another_ebook_folder/myfile.pdf" [--voice /app/my/voicepath/voice.mp3 etc..]
ROCM:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" -v "/my/real/ebooks/folder/absolute/path:/app/another_ebook_folder" --device=/dev/kfd --device=/dev/dri --rm -it -p 7860:7860 ebook2audiobook:rocm[6.0/6.1/6.4 etc.] --headless --ebook "/app/another_ebook_folder/myfile.pdf" [--voice /app/my/voicepath/voice.mp3 etc..]
XPU:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" -v "/my/real/ebooks/folder/absolute/path:/app/another_ebook_folder" --device=/dev/dri --rm -it -p 7860:7860 ebook2audiobook:xpu --headless --ebook "/app/another_ebook_folder/myfile.pdf" [--voice /app/my/voicepath/voice.mp3 etc..]
JETSON:
docker run -v "./ebooks:/app/ebooks" -v "./audiobooks:/app/audiobooks" -v "./models:/app/models" -v "./voices:/app/voices" -v "./tmp:/app/tmp" -v "/my/real/ebooks/folder/absolute/path:/app/another_ebook_folder" --runtime nvidia --rm -it -p 7860:7860 ebook2audiobook:jetson[51/60/61 etc.] --headless --ebook "/app/another_ebook_folder/myfile.pdf" [--voice /app/my/voicepath/voice.mp3 etc..]
Docker Compose (i.e. cuda 12.8:
Run Gradio GUI:
DEVICE_TAG=cu128 docker compose --profile gpu up --no-log-prefix
Run Headless mode:
DEVICE_TAG=cu128 docker compose --profile gpu run --rm ebook2audiobook --headless --ebook "/app/ebooks/myfile.pdf" --voice /app/voices/eng/adult/female/some_voice.wav etc..
Podman Compose (i.e. cuda 12.8:
Run Gradio GUI:
DEVICE_TAG=cu128 podman-compose -f podman-compose.yml --profile gpu up
Run Headless mode:
DEVICE_TAG=cu128 podman-compose -f podman-compose.yml --profile gpu run --rm ebook2audiobook-gpu --headless --ebook "/app/ebooks/myfile.pdf" --voice /app/voices/eng/adult/female/some_voice.wav etc..
```
- 注意:Docker 中不暴露 MPS,因此必须使用 CPU
### 常见 Docker 问题
- 没有检测到我的 NVIDIA GPU?? -> [GPU 问题 Wiki 页面](https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES)
## 微调 TTS 模型
#### 微调您自己的 XTTSv2 模型
[](https://huggingface.co/spaces/drewThomasson/xtts-finetune-webui-gpu) [](https://github.com/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/kaggle-xtts-finetune-webui-gradio-gui.ipynb) [](https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/colab_xtts_finetune_webui.ipynb)
#### 对训练数据进行降噪
[](https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit) [](https://github.com/Rikorose/DeepFilterNet)
### 微调 TTS 集合
[](https://huggingface.co/drewThomasson/fineTunedTTSModels/tree/main)
对于 XTTSv2 自定义模型,必须提供该声音参考的音频片段:
## 您的 Ebook2Audiobook 自定义配置
您可以自由修改 libs/conf.py 以添加或删除您想要的设置。如果您计划这样做,只需复制
原始的 conf.py,这样在每次 ebook2audiobook 更新时,您就可以备份修改后的 conf.py 并放
回原始文件。您必须为 models.py 计划相同的过程。如果您希望将您自己的自定义模型
作为官方的 ebook2audiobook 微调模型,请与我们联系,我们会将其添加到预设列表中。
## 回退到旧版本
可以在 -> [这里](https://github.com/DrewThomasson/ebook2audiobook/releases) 找到发布版本
```
git checkout tags/VERSION_NUM # Locally/Compose -> Example: git checkout tags/v25.7.7
```
## 常见问题:
- 没有检测到我的 NVIDIA/ROCm/XPU/MPS GPU?? -> [GPU 问题 Wiki 页面](https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES)
- CPU 运行缓慢 (在服务器多核 CPU 上更好),而 GPU 几乎可以实现实时转换。
[相关讨论](https://github.com/DrewThomasson/ebook2audiobook/discussions/19#discussioncomment-10879846)
(不过它没有零样本声音克隆功能,且是 Siri 质量的声音,但在 cpu 上要快得多)。
- "我遇到了依赖项问题" - 只需使用 docker,它是完全自包含的并且具有 headless 模式,
在 docker run 命令末尾添加 `--help` 参数以获取更多信息。
- "我遇到了音频截断问题!" - 请为此提交一个 Issue,
我们无法阅读所有语言,需要用户的建议来微调句子分割逻辑。😊
## ***** 路线图 *****
- 所有功能均向公众贡献开放 ⭐
- 欢迎任何使用支持语言的人帮助我们改进模型 ⭐
- [x] 在开始转换前预览区块/章节
- [ ] 按转换后的句子进行编辑,用于精准的文本更改
- [x] SML 标签集成,用于声音、暂停、中断等更多更改
- [x] 不同语言的 -h -help 参数信息
- [x] 用于 PDF / JPG / BMP / PNG / TIFF 的 OCR 扫描
- [x] Notebooks 文件夹 [在此讨论](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/5#issuecomment-2408773254)
- [x] 使中文文本分割不拆分词语并改善停顿时间 [在此讨论](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/18#issuecomment-2401154894)
- [x] Dockerfile
- [x] Docker compose
- [x] Podman compose
- [x] Kaggle Notebook
- [x] Google Colab Notebook
- [ ] [制作 iOS 应用](https://github.com/DrewThomasson/ebook2audiobook/pull/35#issuecomment-2496495212)
- [ ] [制作 Android 应用](https://github.com/DrewThomasson/ebook2audiobook/pull/35#issuecomment-2496495212)
- [ ] Audiobookshelf 集成
#### 额外选项
- [x] 电子书翻译选项
- [x] 输出格式选择
- [x] 批量电子书文件夹
- [x] 多进程转换
- [x] 批量电子书文件夹转换
- [x] GPU 设备检测
- [x] 为上传的声音克隆降噪任何参考音频
- [x] 自定义模型上传 (目前仅支持 XTTSv2。更多模型视需求提供)
- [ ] 至少为 xttsv2, fairseq, vits, piper 添加欧洲葡萄牙语语言模型 (欢迎协助)
- [ ] 至少为 xttsv2, fairseq, vits, piper 添加信德语语言模型 (欢迎协助)
#### TTS 引擎
- [x] XTTSv2
- [x] Bark
- [x] Fairseq
- [x] VITS
- [x] Tacotron2
- [x] YourTTS
- [x] Tortoise
- [x] GlowTTS
- [x] Piper
- [ ] GPT-SoVITS (https://github.com/RVC-Boss/GPT-SoVITS)
- [ ] OpenVoice (https://github.com/myshell-ai/OpenVoice)
- [ ] fish-speech (https://github.com/fishaudio/fish-speech)
- [ ] ChatTTS (https://github.com/2noise/ChatTTS)
- [ ] CosyVoice (https://github.com/FunAudioLLM/CosyVoice)
- [ ] F5-TTS (https://github.com/swivid/f5-tts)
- [ ] chatterbox (https://github.com/resemble-ai/chatterbox)
- [ ] Supertonic (https://github.com/supertone-inc/supertonic)
- [ ] Spark-TTS (https://github.com/sparkaudio/spark-tts)
- [ ] index-tts (https://github.com/index-tts/index-tts)
- [ ] MeloTTS (https://github.com/myshell-ai/MeloTTS)
- [ ] Kokoro-TTS (https://github.com/hexgrad/kokoro)
- [ ] OmniVoice (https://github.com/k2-fsa/OmniVoice)
- [ ] Zonos (https://github.com/Zyphra/Zonos)
- [ ] Style-TTS2 (https://github.com/yl4579/StyleTTS2)
- [ ] Orpheus-TTS (https://github.com/canopyai/Orpheus-TTS)
- [ ] NewTTS (https://github.com/neuphonic/neutts?tab=readme-ov-file)
- [ ] VIbeVoice (https://github.com/vibevoice-community/VibeVoice)
- [ ] Qwen3-TTS (https://huggingface.co/spaces/Qwen/Qwen3-TTS)
#### Readme 翻译
- [x] 阿拉伯语
- [x] 中文
- [x] 英语
- [x] 西班牙语
- [x] 法语
- [x] 德语
- [x] 意大利语
- [x] 葡萄牙语
- [x] 波兰语
- [x] 土耳其语
- [x] 俄语
- [x] 荷兰语
- [x] 捷克语
- [x] 日语
- [x] 印地语
- [x] 孟加拉语
- [x] 匈牙利语
- [x] 韩语
- [x] 越南语
- [x] 瑞典语
- [x] 波斯语
- [x] 约鲁巴语
- [x] 斯瓦希里语
- [x] 印尼语
- [x] 斯洛伐克语
- [x] 克罗地亚语
#### 🐍 操作系统兼容性
- [x] 🍎 Mac Intel x86
- [x] 🪟 Windows x86
- [x] 🐧 Linux x86
- [x] 🖥️🍏 Apple Silicon Mac
- [x] 🪟💪 ARM Windows
- [x] 🐧💪 ARM Linux
## 用于训练模型等的额外终极方案 (通过一个简单的命令支持所有 Coqui-tts 模型和 piper-tts)
- 有关此信息请联系 @DrewThomasson,他目前正在开发此功能,[在此查看进行中的仓库](https://github.com/DrewThomasson/Universal_TTS_Finetune)
- [ ] 为 ljspeech 格式的训练配方中的所有 coqui-tts 模型制作一个易于使用的训练图形界面 [来自 coqui tts 的指南](https://github.com/coqui-ai/TTS/tree/dev/recipes/ljspeech)
## 贡献者的 Python 代码规范信息
- 代码之间不留空行,除非是在函数和类之间。
- 所有的键都使用单引号,除非是 dict() 和 json。dict['key'] 调用时始终使用单引号
- 使用 4 个空格缩进,完全不使用 tab
- 所有函数及其参数声明和返回值实行严格类型检查
- 参数与其类型之间没有空格,函数、`->` 和返回值之间也没有空格
示例:
```
import json
from typing import Optional
def get_user(user_id:int, users:list[dict])->Optional[dict]:
for user in users:
if user['id'] == user_id:
return user
return None
def summarize(user:dict)->str:
return f"User {user['name']} is {'active' if user['is_active'] else 'inactive'}."
def to_json(user:dict)->str:
return json.dumps({"id": user['id'], "name": user['name'], "email": user['email']})
users:list = [
dict(id=1, name="alice", email="alice@example.com", role="admin", is_active=True),
dict(id=2, name="bob", email="bob@example.com", role="editor", is_active=False),
dict(id=3, name="carol", email="carol@example.com", role="viewer", is_active=True),
]
config = {
"max_users": 100,
"default_role": "viewer",
"allow_signup": True,
}
roles = ['admin', 'editor', 'viewer']
found = get_user(1, users)
if found:
print(summarize(found))
print(found['email'])
print(to_json(found))
if config['default_role'] in roles:
print(config['default_role'])
```
## 寻求用于 Beta 测试的硬件捐赠
我们接受任何类型的硬件来测试我们的开发,例如:
- 支持 cuda >= 11.8 的 Nvidia
- XPU intel 显卡
- 支持 ROCm >=5.7 的 ROCm AMD 显卡
@DrewThomasson 如果您想提供任何帮助! 😃
使用先进的 TTS 引擎及更多功能。
支持声音克隆和 1158 种语言! [](https://discord.gg/63Tv3F65k6) ### 本地运行 [](#instructions) [](https://github.com/DrewThomasson/ebook2audiobook/actions/workflows/Docker-Build.yml) [](https://github.com/DrewThomasson/ebook2audiobook/releases/latest)
Click to see images of Web GUI
更多演示
**ASMR 语音** https://github.com/user-attachments/assets/68eee9a1-6f71-4903-aacd-47397e47e422 **雨天语音** https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080 **Scarlett 语音** https://github.com/user-attachments/assets/b12009ee-ec0d-45ce-a1ef-b3a52b9f8693 **David Attenborough 语音** https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921 **示例** (如果您需要额外的微调模型,或者希望将您的模型分享到官方预设列表,请联系我们) ## 硬件要求 - 最低 2GB RAM,推荐 8GB。 - 最低 1GB VRAM,推荐 4GB。 - 如果在 Windows 上运行,需启用虚拟化 (仅限 Docker)。 - CPU, XPU (intel, AMD, ARM)*。 - CUDA, ROCm, JETSON - MPS (Apple Silicon CPU) * 现代 TTS 引擎在 CPU 上运行非常慢,因此请使用较低质量的 TTS,例如 YourTTS, Tacotron2 等。 ## 支持的语言 | **阿拉伯语** | **中文** | **英语** | **西班牙语** | |:------------------:|:------------------:|:------------------:|:------------------:| | **法语** | **德语** | **意大利语** | **葡萄牙语** | | **波兰语** | **土耳其语** | **俄语** | **荷兰语** | | **捷克语** | **日语** | **印地语** | **孟加拉语** | | **匈牙利语** | **韩语** | **越南语**| **瑞典语** | | **波斯语** | **约鲁巴语** | **斯瓦希里语** | **印尼语**| | **斯洛伐克语** | **克罗地亚语** | **泰米尔语** | **丹麦语** | - [**在此查看 +1130 种语言和方言**](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) ## 支持的电子书格式 - `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr` - **最佳效果**:`.epub` 或 `.mobi`,用于自动章节检测 ## 输出与处理格式 - `.m4b`, `.m4a`, `.mp4`, `.webm`, `.mov`, `.mp3`, `.flac`, `.wav`, `.ogg`, `.aac` - 处理格式可以在 lib/conf.py 中更改 ## 可用的 SML 标签 - `[break]` — 静音 (随机范围 **0.3–0.6 秒**) - `[pause]` — 静音 (随机范围 **1.0–1.6 秒**) - `[pause:N]` — 固定暂停 (**N 秒**) - `[voice:/path/to/voice/file]...[/voice]` — 从默认或通过 GUI/CLI 选择的语音切换声音 **查看我们的另一个仓库,专门用于自动在您的电子书中添加 SML -> [E2A-SML](./tools/E2A-SML)** ### 说明 1. **克隆仓库** git clone https://github.com/DrewThomasson/ebook2audiobook.git cd ebook2audiobook 2. **安装 / 运行 ebook2audiobook**: - **Linux/MacOS** ./ebook2audiobook.command MacOS 用户注意:会安装 homebrew 以安装缺失的程序。 - **Mac 启动器** 双击 `Mac Ebook2Audiobook Launcher.command` - **Windows** ebook2audiobook.cmd 或者 双击 `ebook2audiobook.cmd` Windows 用户注意:会安装 scoop 以在无需管理员权限的情况下安装缺失的程序。 3. **打开 Web 应用**:点击终端中提供的 URL 以访问 Web 应用并转换电子书。 `http://localhost:7860/` 4. **获取公共链接**: `./ebook2audiobook.command --share` (Linux/MacOS) `ebook2audiobook.cmd --share` (Windows) `python app.py --share` (所有操作系统) ### 基础用法 - **Linux/MacOS**: ./ebook2audiobook.command --headless --ebook
默认语言为 eng,对于在 ./lib/lang.py 中设置的默认语言,--language 是可选的。
也支持 ISO-639-1 的 2 字母代码。 ### 自定义模型 Zip 上传示例 (必须是包含必需模型文件的 .zip 文件。例如 XTTSv2 的 config.json, model.pth, vocab.json 和 ref.wav) - **Linux/MacOS** ./ebook2audiobook.command --headless --ebook
(见 ./lib/models.py)。 ### 包含所有可用参数的详细指南 - **Linux/MacOS** ./ebook2audiobook.command --help - **Windows** ebook2audiobook.cmd --help - **或者适用于所有操作系统** ```python app.py --help ``` ``` usage: app.py [-h] [--session SESSION] [--share] [--headless] [--ebook EBOOK] [--ebooks_dir EBOOKS_DIR] [--language LANGUAGE] [--voice VOICE] [--voice_map VOICE_MAP] [--device {CPU,CUDA,MPS,ROCM,XPU,JETSON}] [--tts_engine {XTTS,BARK,VITS,FAIRSEQ,TACOTRON,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}] [--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED] [--output_format OUTPUT_FORMAT] [--output_channel OUTPUT_CHANNEL] [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS] [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting] [--text_temp TEXT_TEMP] [--waveform_temp WAVEFORM_TEMP] [--output_dir OUTPUT_DIR] [--version] Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion. options: -h, --help show this help message and exit --session SESSION Session to resume the conversion in case of interruption, crash, or reuse of custom models and custom cloning voices. **** The following option is for gradio/gui mode only: --share (Optional) Enable a public shareable Gradio link. **** The following options are for --headless mode only: --headless Run the script in headless mode --ebook EBOOK Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present. --ebooks_dir EBOOKS_DIR Relative or absolute path of the directory containing the files to convert. Cannot be used when --ebook is present. --text TEXT Raw text for conversion. Cannot be used when --ebook or --ebooks_dir is present. --language LANGUAGE Language of the e-book. Default language is set in ./lib/lang.py sed as default if not present. All compatible language codes are in ./lib/lang.py optional parameters: --translate ISO3 (Optional) Translate ebook to a target language (ISO 639-3 code, e.g. eng, fra, deu) before TTS synthesis. Uses argostranslate. The target language becomes the effective TTS language for the run. A copy of the source ebook is made with the _