PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. You can find some generated speech examples from a model trained on the LJ Speech Dataset here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that it's easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (needed only to run the training script; this could be made optional, but it is required for now)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron for text processing, audio preprocessing and audio reconstruction (added as a submodule). Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (the default), you can train your model with:

python train.py
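
Before launching a long training run, it can help to confirm that the preprocessed data is where the script expects it. The following is a minimal sketch, not part of this package: summarize_training_dir is a hypothetical helper, and it assumes the keithito/tacotron preprocessing step wrote a train.txt file whose lines look like "spec.npy|mel.npy|n_frames|text". Check your own output directory if the layout differs.

```python
from pathlib import Path


def summarize_training_dir(data_root="~/tacotron/training"):
    """Return (number of utterances, total frames) listed in train.txt.

    Assumption: train.txt holds one utterance per line in the form
    "spec.npy|mel.npy|n_frames|text". This layout is not stated in the
    README and should be verified against your preprocessed data.
    """
    root = Path(data_root).expanduser()
    metadata = root / "train.txt"
    if not metadata.is_file():
        raise FileNotFoundError(f"{metadata} not found; run preprocessing first")

    n_utterances = 0
    total_frames = 0
    with metadata.open(encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            n_utterances += 1
            total_frames += int(fields[2])  # third field: frame count
    return n_utterances, total_frames
```

Calling summarize_training_dir() with no arguments checks the default location; a FileNotFoundError usually means the data was written somewhere else or preprocessing has not been run yet.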

The alignment, predicted spectrogram, target spectrogram, predicted waveform and a checkpoint (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log

Testing model

Open the notebook in the notebooks directory and change checkpoint_path to point to your model.
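
If many checkpoints have accumulated, picking the most recent one by hand gets tedious. The sketch below shows one way to find it. latest_checkpoint is a hypothetical helper, and the file name pattern "checkpoint_step<N>.pth" is an assumption; adjust the regular expression to match the files actually present in your checkpoints directory.

```python
import re
from pathlib import Path

# Assumed naming pattern, e.g. "checkpoint_step12000.pth"; adjust as needed.
_STEP_PATTERN = re.compile(r"checkpoint_step(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir="checkpoints"):
    """Return the Path of the checkpoint with the highest step, or None."""
    best_step, best_path = -1, None
    for path in Path(checkpoint_dir).glob("*.pth"):
        match = _STEP_PATTERN.search(path.name)
        if match is None:
            continue  # ignore files that do not follow the assumed pattern
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

In the notebook, the result could then be assigned with checkpoint_path = str(latest_checkpoint()), after checking that it is not None. Steps are compared as integers, so step 12000 correctly ranks above step 9000 even though it sorts lower as a string.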

Owner

Ryuichi Yamamoto
