PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. Some generated speech examples from a model trained on the LJ Speech Dataset can be found here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that it is easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (needed only if you want to run the training script. This could certainly be made optional, but for now it is required.)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron for text processing, audio preprocessing, and audio reconstruction (added as a submodule). Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (which is the default), you can train your model with:

python train.py
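
Before launching training, it can help to confirm that the prepared dataset is where the script expects it. The following is a minimal sketch and not part of this repository: the helper name check_training_dir is hypothetical, and it assumes that the keithito/tacotron preprocessing step wrote a pipe-delimited train.txt metadata file into the training directory. Adjust the file name if your setup differs.

```python
import os


def check_training_dir(data_root="~/tacotron/training", metadata_name="train.txt"):
    """Return the number of metadata rows found in the training directory.

    Assumption: preprocessing wrote a pipe-delimited metadata file
    (one utterance per line) named `train.txt` into `data_root`.
    """
    data_root = os.path.expanduser(data_root)
    metadata_path = os.path.join(data_root, metadata_name)
    if not os.path.isfile(metadata_path):
        raise FileNotFoundError(f"No metadata file at {metadata_path}")

    # Skip blank lines; each remaining line describes one training example.
    with open(metadata_path, encoding="utf-8") as f:
        rows = [line.strip().split("|") for line in f if line.strip()]
    if not rows:
        raise ValueError(f"{metadata_path} is empty")
    return len(rows)
```

Calling check_training_dir() with no arguments checks the default location and raises an error if the metadata file is missing or empty.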

The alignment, predicted spectrogram, target spectrogram, predicted waveform, and checkpoint (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log

Testing model

Open the notebook in the notebooks directory and change checkpoint_path to point to your model.
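
If several checkpoints have accumulated, picking the most recent one by hand is error-prone. The sketch below is one possible way to choose a value for checkpoint_path. It is not part of this repository: the helper name latest_checkpoint is hypothetical, and the file name pattern (checkpoint_step<N>.pth) is an assumption that should be changed to match the files your training run actually wrote.

```python
import os
import re


def latest_checkpoint(checkpoint_dir="checkpoints",
                      pattern=r"checkpoint_step(\d+)\.pth"):
    """Return the path of the checkpoint with the highest step, or None.

    Assumption: checkpoint files are named like `checkpoint_step1000.pth`.
    """
    regex = re.compile(pattern)
    best_step, best_path = -1, None
    for name in os.listdir(checkpoint_dir):
        match = regex.fullmatch(name)
        if match is None:
            continue  # ignore files that are not checkpoints
        step = int(match.group(1))  # compare numerically, not as strings
        if step > best_step:
            best_step = step
            best_path = os.path.join(checkpoint_dir, name)
    return best_path
```

In the notebook, checkpoint_path could then be set with checkpoint_path = latest_checkpoint("checkpoints"), which returns None when no matching file exists.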

Owner

Ryuichi Yamamoto
