CorentinJ/Real-Time-Voice-Cloning

GitHub: CorentinJ/Real-Time-Voice-Cloning

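An open-source, deep-learning-based real-time voice cloning tool that can clone any voice from an audio sample of only 5 seconds and generate speech with it.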

Stars: 59483 | Forks: 9419

# Real-Time Voice Cloning

This repository is an implementation of [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf) (SV2TTS) with a vocoder that runs in real time. This was my [master's thesis](https://matheo.uliege.be/handle/2268.2/6801).

SV2TTS is a three-stage deep learning framework. In the first stage, a digital representation of a voice is created from a few seconds of audio. In the second and third stages, this representation is used as a reference to generate speech from arbitrary text.

**Video demonstration** (click the picture):

[![Toolbox demo](https://i.imgur.com/8lFUlgz.png)](https://www.youtube.com/watch?v=-O_hYhToKoA)

### Papers implemented

| URL | Designation | Title | Implementation source |
| ------------------------------------------------------ | ---------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| [**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
| [1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
| [1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
| [1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |

## Heads up

Like everything else in deep learning, this repo will quickly become outdated. Many SaaS apps (often paid) offer better audio quality than this repo. If you are looking for an open-source solution with high audio quality:

- Check [paperswithcode](https://paperswithcode.com/task/speech-synthesis/) for other repositories and recent research in speech synthesis.
- Check [Chatterbox](https://github.com/resemble-ai/chatterbox), a similar project that is up to date with state-of-the-art voice cloning as of 2025.

## Running the toolbox

Both Windows and Linux are supported.

1. Install [ffmpeg](https://ffmpeg.org/download.html#get-packages). It is required for reading audio files. Check whether it is installed by running the following in a command line:
   ```
   ffmpeg
   ```
2. Install uv for Python package management:
   ```
   # On Windows:
   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

   # On Linux:
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Alternatively, on any platform, if you have pip installed:
   pip install -U uv
   ```
3. Run one of the following commands:
   ```
   # If you have an NVIDIA GPU, run the toolbox with:
   uv run --extra cuda demo_toolbox.py
   # If you don't, use this instead:
   uv run --extra cpu demo_toolbox.py

   # If you don't want the GUI, run the command-line demo instead (cuda or cpu, as above):
   uv run --extra cuda demo_cli.py
   uv run --extra cpu demo_cli.py
   ```

uv will automatically create a `.venv` directory for you containing the appropriate Python environment. If this fails, please [open an issue](https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues).

### (Optional) Download pretrained models

Pretrained models are now downloaded automatically. If this does not work for you, you can download them manually from [Hugging Face](https://huggingface.co/CorentinJ/SV2TTS/tree/main).

### (Optional) Download datasets

If you only want to try out the toolbox, I recommend downloading just [`LibriSpeech/train-clean-100`](https://www.openslr.org/resources/12/train-clean-100.tar.gz). Extract the contents into a directory of your choosing, so that the files end up under `LibriSpeech/train-clean-100` inside it. The toolbox supports other datasets, see [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets). You can also skip downloading datasets entirely, but then you will need your own data as audio files, or you will have to record it with the toolbox.
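To confirm that the archive was extracted into the layout described above, a short script can inspect the directory before you point the toolbox at it. This is a minimal sketch and not part of this repo: `summarize_librispeech` and its `datasets_root` argument are hypothetical names. It also assumes the standard LibriSpeech layout of `<speaker>/<chapter>/<speaker>-<chapter>-<utterance>.flac` files under `train-clean-100`.

```python
from pathlib import Path


def summarize_librispeech(datasets_root: str) -> dict:
    """Count speakers and .flac utterances in an extracted train-clean-100.

    Hypothetical helper (not part of this repo). Assumes the standard
    LibriSpeech layout: <speaker>/<chapter>/<speaker>-<chapter>-<utt>.flac
    """
    subset = Path(datasets_root) / "LibriSpeech" / "train-clean-100"
    if not subset.is_dir():
        # Either the root path is wrong or the archive was extracted elsewhere.
        raise FileNotFoundError(f"No extracted dataset found at {subset}")

    # Each top-level directory in the subset is one speaker.
    speakers = [p for p in subset.iterdir() if p.is_dir()]
    # Utterances are .flac files two levels down (speaker/chapter/file);
    # the per-chapter .trans.txt transcripts are not matched.
    utterances = list(subset.glob("*/*/*.flac"))
    return {
        "path": str(subset),
        "speakers": len(speakers),
        "utterances": len(utterances),
    }
```

For example, `summarize_librispeech("/path/to/your/dir")` returns the resolved subset path together with speaker and utterance counts. A `FileNotFoundError` likely means the archive was extracted one directory level deeper or shallower than intended.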

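Tags: AI voice changing, Apex, Python, Speaker Verification, SV2TTS, Tacotron, TTS, WaveRNN, artificial intelligence, credential scanning, voice replication, voice imitation, real-time speech synthesis, digital signal processing, text-to-speech, backdoor-free, machine learning, deep learning, user-mode hook bypass, neural networks, automatic speech recognition, voice cloning, transfer learning, reverse-engineering tools, audio processing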