Meta-TTS: Meta-Learning for Few-shot Speaker Adaptive Text-to-Speech

Overview

This repository is the official implementation of "Meta-TTS: Meta-Learning for Few-shot Speaker Adaptive Text-to-Speech".

[Figures: multi-task learning, meta learning, Meta-TTS]

Requirements

This is how I built my environment; yours does not need to match it exactly:

  • Sign up for Comet.ml, find your workspace and API key at www.comet.ml/api/my/settings, and fill them in config/comet.py. The Comet logger is used throughout the train/val/test stages.
    • Check my training logs here.
  • [Optional] Install pyenv for Python version control, change to Python 3.8.6.
# After downloading and installing pyenv:
pyenv install 3.8.6
pyenv local 3.8.6
  • [Optional] Install pyenv-virtualenv as a plugin of pyenv for clean virtual environment.
# After installing pyenv-virtualenv:
pyenv virtualenv meta-tts
pyenv activate meta-tts
  • Install Cython, then learn2learn from source:
# Install Cython first:
pip install cython

# Then install learn2learn from source:
git clone https://github.com/learnables/learn2learn.git
cd learn2learn
pip install -e .
  • Install requirements:
pip install -r requirements.txt
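
After the steps above, a quick sanity check of the environment can be sketched in Python. This is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the module names checked (`Cython`, `learn2learn`, `comet_ml`) are the import names of the packages mentioned above.

```python
import importlib.util
import sys


# Hypothetical helper, not part of the Meta-TTS repository.
def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The author used Python 3.8.6; other versions may or may not work.
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    missing = missing_modules(["Cython", "learn2learn", "comet_ml"])
    print("Missing modules:", missing or "none")
```

An empty result only means the modules can be located; it does not verify their versions.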

Preprocessing

First, download LibriTTS and VCTK, then change the paths in config/LibriTTS/preprocess.yaml and config/VCTK/preprocess.yaml, then run

python3 prepare_align.py config/LibriTTS/preprocess.yaml
python3 prepare_align.py config/VCTK/preprocess.yaml

to prepare the corpora for alignment.

Alignments for LibriTTS are provided here, and alignments for VCTK are provided here. Unzip the files into preprocessed_data/LibriTTS/TextGrid/ and preprocessed_data/VCTK/TextGrid/.
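
The unzip step can also be scripted. The following is a minimal sketch, not code from the repository: the helper `extract_alignments` and the archive file names in the usage comments are hypothetical, and only the target directories come from the text above.

```python
import zipfile
from pathlib import Path


# Hypothetical helper, not part of the Meta-TTS repository.
def extract_alignments(archive_path, corpus, root="preprocessed_data"):
    """Unzip an alignment archive into <root>/<corpus>/TextGrid/ and return that directory."""
    target = Path(root) / corpus / "TextGrid"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


# Example usage (the archive names are assumptions, not the real download names):
# extract_alignments("LibriTTS.zip", "LibriTTS")
# extract_alignments("VCTK.zip", "VCTK")
```

If an archive already contains a top-level folder, the files may end up one level deeper than expected, so it is worth inspecting the result before running the preprocessing script.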

Then run the preprocessing script:

python3 preprocess.py config/LibriTTS/preprocess.yaml

# Copy stats from LibriTTS to VCTK so that pitch/energy normalization uses the same shift and bias for both corpora.
cp preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/

python3 preprocess.py config/VCTK/preprocess.yaml
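
Why the stats file is copied can be illustrated with a toy example. This is only a sketch under stated assumptions: it assumes normalization subtracts a mean and divides by a standard deviation, the numbers are made up, and the real layout of stats.json is not shown here and may differ.

```python
# Hypothetical illustration; not the repository's preprocessing code.
def normalize(values, mean, std):
    """Shift each value by `mean` and scale it by `std`."""
    return [(v - mean) / std for v in values]


# Assumed statistics computed on LibriTTS pitch values (made-up numbers).
libritts_pitch_mean, libritts_pitch_std = 200.0, 50.0

libritts_pitch = [150.0, 200.0, 250.0]
vctk_pitch = [150.0, 200.0, 250.0]

# With shared statistics, equal raw values map to equal normalized values in both corpora.
print(normalize(libritts_pitch, libritts_pitch_mean, libritts_pitch_std))  # [-1.0, 0.0, 1.0]
print(normalize(vctk_pitch, libritts_pitch_mean, libritts_pitch_std))      # [-1.0, 0.0, 1.0]
```

If VCTK were normalized with its own statistics instead, the same raw pitch would map to a different normalized value, so a model trained on one corpus would see shifted inputs on the other.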

Training

To train the models in the paper, run this command:

python3 main.py -s train \
                -p config/preprocess/<corpus>.yaml \
                -m config/model/base.yaml \
                -t config/train/base.yaml config/train/<corpus>.yaml \
                -a config/algorithm/<algorithm>.yaml

To reproduce the results, use 8 V100 GPUs for meta models and 1 V100 GPU for baseline models. Otherwise, you may need to tune the gradient accumulation step (grad_acc_step) in config/train/base.yaml to get the correct meta batch size. Note that each GPU has its own random seed, so even when the meta batch size is the same, using a different number of GPUs is equivalent to using a different random seed.
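
The arithmetic behind tuning grad_acc_step can be sketched as follows. This assumes the meta batch size equals the number of GPUs times the tasks processed per GPU per step times grad_acc_step; that relation is an assumption made for illustration and is not confirmed by the repository, and the helper name and example numbers are hypothetical.

```python
# Hypothetical helper; the relation used here is an assumption, not taken from the repository.
def required_grad_acc_step(target_meta_batch_size, num_gpus, tasks_per_gpu_step=1):
    """Return the grad_acc_step that reproduces the target meta batch size."""
    per_step = num_gpus * tasks_per_gpu_step
    if per_step <= 0:
        raise ValueError("num_gpus and tasks_per_gpu_step must be positive")
    if target_meta_batch_size % per_step != 0:
        raise ValueError("target meta batch size is not reachable with this GPU count")
    return target_meta_batch_size // per_step


# Example with made-up numbers: a meta batch size of 8 on 8 GPUs versus 2 GPUs.
print(required_grad_acc_step(8, num_gpus=8))  # 1
print(required_grad_acc_step(8, num_gpus=2))  # 4
```

Even with a matching meta batch size, the per-GPU random seeds mean the runs are not identical.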

After training, you can find your checkpoints under output/ckpt/ / / /checkpoints/ , where the project name is set in config/comet.py.

To run inference with a trained model, run:

python3 main.py -s test \
                -p config/preprocess/<corpus>.yaml \
                -m config/model/base.yaml \
                -t config/train/base.yaml config/train/<corpus>.yaml \
                -a config/algorithm/<algorithm>.yaml \
                -e <experiment_key> -c <checkpoint_file_name>

and the results will be written under output/result/ / / / .
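
When launching several runs, the two command lines above can be assembled programmatically. This is a minimal sketch: the wrapper `build_command` is hypothetical and not part of the repository, and it uses only the flags and config paths shown in the commands above.

```python
import subprocess


# Hypothetical wrapper around the main.py command lines shown above.
def build_command(stage, corpus, algorithm, experiment_key=None, checkpoint=None):
    """Assemble the argument list for main.py for the train or test stage."""
    cmd = [
        "python3", "main.py", "-s", stage,
        "-p", f"config/preprocess/{corpus}.yaml",
        "-m", "config/model/base.yaml",
        "-t", "config/train/base.yaml", f"config/train/{corpus}.yaml",
        "-a", f"config/algorithm/{algorithm}.yaml",
    ]
    if stage == "test":
        # The test stage additionally needs the experiment key and checkpoint file name.
        if experiment_key is None or checkpoint is None:
            raise ValueError("the test stage needs an experiment key and a checkpoint file name")
        cmd += ["-e", experiment_key, "-c", checkpoint]
    return cmd


# Example usage (replace the placeholders with real values before running):
# subprocess.run(build_command("train", "<corpus>", "<algorithm>"), check=True)
```

Passing the arguments as a list avoids shell quoting issues when a checkpoint file name contains special characters.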

Evaluation

Note: The evaluation code is not well-refactored yet.

Change into evaluation/ and see its README.md.

Pre-trained Models

Note: The checkpoints were produced with an older version of the code and might not be compatible with the current code. We will fix this in the future.

Since our code uses the Comet logger, you might need to create a dummy experiment by running:

from comet_ml import Experiment
experiment = Experiment()

then put the checkpoint files under output/ckpt/LibriTTS/ / /checkpoints/ .

You can download pretrained models here.

Results

[Result figures for each corpus (LibriTTS, VCTK): Speaker Similarity, Speaker Verification, Synthesized Speech Detection]

Owner
Sung-Feng Huang
A Ph.D. student at National Taiwan University. Main research includes unsupervised learning, meta learning, speech separation, ASR, and some NLP.