A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型,适用于英语、普通话/中文、日语、韩语、俄语和藏语(当前已测试)。

Overview

简体中文 | English

并行语音合成

[TOC]

新进展

目录结构

.
|--- config/      # 配置文件
     |--- default.yaml
     |--- ...
|--- datasets/    # 数据处理
|--- encoder/     # 声纹编码器
     |--- voice_encoder.py
     |--- ...
|--- helpers/     # 一些辅助类
     |--- trainer.py
     |--- synthesizer.py
     |--- ...
|--- logdir/      # 训练过程保存目录
|--- losses/      # 一些损失函数
|--- models/      # 合成模型
     |--- layers.py
     |--- duration.py
     |--- parallel.py
|--- pretrained/  # 预训练模型(LJSpeech 数据集)
|--- samples/     # 合成样例
|--- utils/       # 一些通用方法
|--- vocoder/     # 声码器
     |--- melgan.py
     |--- ...
|--- wandb/       # Wandb 保存目录
|--- extract-duration.py
|--- extract-embedding.py
|--- LICENSE
|--- prepare-dataset.py  # 准备脚本
|--- README.md
|--- README_en.md
|--- requirements.txt    # 依赖文件
|--- synthesize.py       # 合成脚本
|--- train-duration.py   # 训练脚本
|--- train-parallel.py

合成样例

部分合成样例见这里

预训练

部分预训练模型见这里

快速开始

步骤(1):克隆仓库

$ git clone https://github.com/atomicoo/ParallelTTS.git

步骤(2):安装依赖

$ conda create -n ParallelTTS python=3.7.9
$ conda activate ParallelTTS
$ pip install -r requirements.txt

步骤(3):合成语音

$ python synthesize.py \
  --checkpoint ./pretrained/ljspeech-parallel-epoch0100.pth \
  --melgan_checkpoint ./pretrained/ljspeech-melgan-epoch3200.pth \
  --input_texts ./samples/english/synthesize.txt \
  --outputs_dir ./outputs/

如果要合成其他语种的语音,需要通过 --config 指定相应的配置文件。

如何训练

步骤(1):准备数据

$ python prepare-dataset.py

通过 --config 可以指定配置文件,默认的 default.yaml 针对 LJSpeech 数据集。

步骤(2):训练对齐模型

$ python train-duration.py

步骤(3):提取持续时间

$ python extract-duration.py

通过 --ground_truth 可以指定是否利用对齐模型生成 Ground-Truth 声谱图。

步骤(4):训练合成模型

$ python train-parallel.py

通过 --ground_truth 可以指定是否使用 Ground-Truth 声谱图进行模型训练。

训练日志

如果使用 TensorBoardX,则运行如下命令:

$ tensorboard --logdir logdir/[DIR]/

强烈推荐使用 Wandb(Weights & Biases),只需在上述训练命令中增加 --enable_wandb 选项。

数据集

  • LJSpeech:英语,女性,22050 Hz,约 24 小时
  • LibriSpeech:英语,多说话人(仅使用 train-clean-100 部分),16000 Hz,总计约 1000 小时
  • JSUT:日语,女性,48000 Hz,约 10 小时
  • BiaoBei:普通话,女性,48000 Hz,约 12 小时
  • KSS:韩语,女性,44100 Hz,约 12 小时
  • RuLS:俄语,多说话人(仅使用单一说话人音频),16000 Hz,总计约 98 小时
  • TWLSpeech(非公开,质量较差):藏语,女性(多说话人,音色相近),16000 Hz,约 23 小时

质量评估

TODO:待补充

速度指标

训练速度:对于 LJSpeech 数据集,设置批次尺寸为 64,可以在单张 8GB 显存的 GTX 1080 显卡上进行训练,训练 ~8h(~300 epochs)后即可合成质量较高的语音。

合成速度:以下测试在 CPU @ Intel Core i7-8550U / GPU @ NVIDIA GeForce MX150 下进行,每段合成音频在 8 秒左右(约 20 词)

批次尺寸 Spec
(GPU)
Audio
(GPU)
Spec
(CPU)
Audio
(CPU)
1 0.042 0.218 0.100 2.004
2 0.046 0.453 0.209 3.922
4 0.053 0.863 0.407 7.897
8 0.062 2.386 0.878 14.599

注意,没有进行多次测试取平均值,结果仅供参考。

一些问题

  • wavegan 分支中,vocoder 代码取自 ParallelWaveGAN,由于声学特征提取方式不兼容,需要进行转化,具体转化代码见这里
  • 普通话模型的文本输入选择拼音序列,因为 BiaoBei 的原始拼音序列不包含标点、以及对齐模型训练不完全,所以合成语音的节奏会有点问题。
  • 韩语模型没有专门训练对应的声码器,而是直接使用 LJSpeech(同为 22050 Hz)的声码器,可能稍微影响合成语音的质量。

参考资料

TODO

  • 合成语音质量评估(MOS)
  • 更多不同语种的测试
  • 语音风格迁移(音色)

欢迎交流

  • 微信号:Joee1995

  • 企鹅号:793071559

Owner
Atomicoo
Atomicoo
A simple Streamlit App to classify swahili news into different categories.

Swahili News Classifier Streamlit App A simple app to classify swahili news into different categories. Installation Install all streamlit requirements

Davis David 4 May 01, 2022
One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

One Stop Anomaly Shop (OSAS) Quick start guide Step 1: Get/build the docker image Option 1: Use precompiled image (might not reflect latest changes):

Adobe, Inc. 148 Dec 26, 2022
HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

William Rowell 2 May 14, 2022
A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

List Of English Words A text file containing over 466k English words. While searching for a list of english words (for an auto-complete tutorial) I fo

dwyl 8.5k Jan 03, 2023
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Structured Super Lottery Tickets in BERT This repo contains our codes for the paper "Super Tickets in Pre-Trained Language Models: From Model Compress

Chen Liang 16 Dec 11, 2022
Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

Meta Research 483 Jan 02, 2023
PG-19 Language Modelling Benchmark

PG-19 Language Modelling Benchmark This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Proje

DeepMind 161 Oct 30, 2022
RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network trained to work with different pairs (images, texts).

RuCLIPtiny Zero-shot image classification model for Russian language RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network

Shahmatov Arseniy 26 Sep 20, 2022
使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

SimCSE复现 项目描述 SimCSE是一种简单但是很巧妙的NLP对比学习方法,创新性地引入Dropout的方式,对样本添加噪声,从而达到对正样本增强的目的。 该框架的训练目的为:对于batch中的每个样本,拉近其与正样本之间的距离,拉远其与负样本之间的距离,使得模型能够在大规模无监督语料(也可以

58 Dec 20, 2022
Generate vector graphics from a textual caption

VectorAscent: Generate vector graphics from a textual description Example "a painting of an evergreen tree" python text_to_painting.py --prompt "a pai

Ajay Jain 97 Dec 15, 2022
Document processing using transformers

Doc Transformers Document processing using transformers. This is still in developmental phase, currently supports only extraction of form data i.e (ke

Vishnu Nandakumar 13 Dec 21, 2022
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0

NLP-Models-Tensorflow, Gathers machine learning and tensorflow deep learning models for NLP problems, code simplify inside Jupyter Notebooks 100%. Tab

HUSEIN ZOLKEPLI 1.7k Dec 30, 2022
profile tools for pytorch nn models

nnprof Introduction nnprof is a profile tool for pytorch neural networks. Features multi profile mode: nnprof support 4 profile mode: Layer level, Ope

Feng Wang 42 Jul 09, 2022
Named Entity Recognition API used by TEI Publisher

TEI Publisher Named Entity Recognition API This repository contains the API used by TEI Publisher's web-annotation editor to detect entities in the in

e-editiones.org 14 Nov 15, 2022
Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

Microsoft 394 Dec 17, 2022
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model using CTC loss and Cross Entropy loss for training. Download pretrained mo

Uyghur 11 Nov 17, 2022
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Antlr Project 13.6k Jan 05, 2023
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my the

Corentin Jemine 38.5k Jan 03, 2023
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

Twitter-News-Summarizer Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline 1.) Extracts all tweets fr

Rohit Govindan 1 Jan 27, 2022