Pytorch implementation of Tacotron

Last update: Dec 02, 2022

Overview

Tacotron-pytorch

A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Requirements

Install Python 3
Install PyTorch == 0.2.0
Install requirements:
```
pip install -r requirements.txt
```

Data

I used the LJSpeech dataset, which consists of pairs of text transcripts and wav files. The complete dataset (13,100 pairs) is available for download. I referred to https://github.com/keithito/tacotron for the preprocessing code.
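As a rough illustration of how the text and wav pairs can be enumerated, here is a minimal sketch. It relies on the published LJSpeech layout (a pipe-separated `metadata.csv` whose last column is the normalized transcript, with audio under `wavs/`). The function names `read_metadata` and `wav_path` are hypothetical and are not taken from this repository's data.py.

```python
import io
import os


def read_metadata(lines):
    """Return (clip_id, normalized_text) pairs from LJSpeech-style metadata lines.

    Each line is expected to look like: ID|raw transcript|normalized transcript
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) < 2:
            continue  # skip malformed lines
        pairs.append((fields[0], fields[-1]))
    return pairs


def wav_path(data_path, clip_id):
    """Build the path of the wav file that belongs to a clip id."""
    return os.path.join(data_path, "wavs", clip_id + ".wav")


# Tiny in-memory stand-in for metadata.csv (illustrative content only).
sample = io.StringIO(
    "LJ001-0001|Printing, in 1473|Printing, in fourteen seventy-three\n"
    "LJ001-0002|in being|in being\n"
)
pairs = read_metadata(sample)
```

With a real download, the same function could be fed an open file handle for `metadata.csv` instead of the in-memory sample.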

File description

hyperparams.py includes all the hyperparameters that are needed.
data.py loads the training data and preprocesses text into indices and wav files into spectrograms. The preprocessing code for text is in the text/ directory.
module.py contains the building blocks, including CBHG, highway, prenet, and so on.
network.py contains the networks, including the encoder, decoder and post-processing network.
train.py is for training.
synthesis.py is for generating TTS samples.
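The "text to index" step mentioned for data.py can be pictured with the following minimal sketch. It is not the code in the text/ directory: the symbol set, the `_` padding symbol, the `~` end-of-sequence symbol, and the names `text_to_sequence` and `pad_batch` are all assumptions made for illustration.

```python
PAD = "_"  # assumed padding symbol, mapped to index 0
EOS = "~"  # assumed end-of-sequence symbol, mapped to index 1
SYMBOLS = [PAD, EOS] + list("abcdefghijklmnopqrstuvwxyz !',.?-")
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_sequence(text):
    """Lowercase the text, drop unknown characters, and append the EOS index."""
    ids = [
        SYMBOL_TO_ID[ch]
        for ch in text.lower()
        if ch in SYMBOL_TO_ID and ch not in (PAD, EOS)
    ]
    ids.append(SYMBOL_TO_ID[EOS])
    return ids


def pad_batch(sequences):
    """Right-pad every sequence with the PAD index to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    pad_id = SYMBOL_TO_ID[PAD]
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


batch = pad_batch([text_to_sequence("ab"), text_to_sequence("a")])
```

Padding to a common length is what allows variable-length sentences to be stacked into a single tensor for the encoder.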

Training the network

STEP 1. Download and extract the LJSpeech data into any directory you want.
STEP 2. Adjust the hyperparameters in hyperparams.py, especially 'data_path', which should point to the directory where you extracted the files. Change the others if necessary.
STEP 3. Run train.py.
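Before starting a long training run, it can help to confirm that 'data_path' really points at an extracted dataset. The sketch below assumes that 'data_path' is the extracted LJSpeech folder containing `metadata.csv` and a `wavs/` directory, as in the published archive. The helper `check_data_path` is hypothetical and is not part of this repository.

```python
import os
import tempfile


def check_data_path(data_path):
    """Return a list of problems found with an LJSpeech-style data directory."""
    problems = []
    if not os.path.isfile(os.path.join(data_path, "metadata.csv")):
        problems.append("missing metadata.csv")
    if not os.path.isdir(os.path.join(data_path, "wavs")):
        problems.append("missing wavs/ directory")
    return problems


# Demonstration on a temporary directory instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    problems_before = check_data_path(tmp)  # empty directory: two problems
    open(os.path.join(tmp, "metadata.csv"), "w").close()
    os.mkdir(os.path.join(tmp, "wavs"))
    problems_after = check_data_path(tmp)  # expected layout: no problems
```

An empty list means the expected layout was found. Any other result suggests that 'data_path' points one level too high or too low.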

Generate TTS wav file

STEP 1. Run synthesis.py. Make sure the restore step is set to the training step of the checkpoint you want to load.
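One way to avoid restoring from a step that was never saved is to list the steps that actually exist on disk first. The sketch below assumes checkpoints are named `checkpoint_<step>.pth.tar`. That naming pattern and the helper names are assumptions for illustration, so adjust the pattern to whatever train.py writes.

```python
import re

# Hypothetical naming pattern; change it to match the files train.py saves.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_(\d+)\.pth\.tar$")


def available_steps(filenames):
    """Extract the sorted training steps from a list of checkpoint filenames."""
    steps = []
    for name in filenames:
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def validate_restore_step(restore_step, filenames):
    """Return restore_step if a matching checkpoint exists, else raise ValueError."""
    steps = available_steps(filenames)
    if restore_step not in steps:
        raise ValueError(
            "no checkpoint for step %d; available steps: %s" % (restore_step, steps)
        )
    return restore_step


files = ["checkpoint_2000.pth.tar", "checkpoint_60000.pth.tar", "notes.txt"]
steps = available_steps(files)
```

In practice the filename list could come from `os.listdir` on the checkpoint directory.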

Samples

You can check the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.

Reference

Keith Ito: https://github.com/keithito/tacotron

Comments

Any comments on the code are always welcome.

Owner

soobin seo
