Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)

Overview

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real-time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented. Mostly I would recommend giving a quick look to the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that allows to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Video demonstration (click the picture):

Toolbox demo

Papers implemented

URL Designation Title Implementation source
1806.04558 SV2TTS Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis This repo
1802.08435 WaveRNN (vocoder) Efficient Neural Audio Synthesis fatchord/WaveRNN
1703.10135 Tacotron (synthesizer) Tacotron: Towards End-to-End Speech Synthesis fatchord/WaveRNN
1710.10467 GE2E (encoder) Generalized End-To-End Loss for Speaker Verification This repo

News

14/02/21: This repo now runs on PyTorch instead of Tensorflow, thanks to the help of @bluefish. If you wish to run the tensorflow version instead, checkout commit 5425557.

13/11/19: I'm now working full time and I will not maintain this repo anymore. To anyone who reads this:

  • If you just want to clone your voice (and not someone else's): I recommend our free plan on Resemble.AI. You will get a better voice quality and less prosody errors.
  • If this is not your case: proceed with this repository, but you might end up being disappointed by the results. If you're planning to work on a serious project, my strong advice: find another TTS repo. Go here for more info.

20/08/19: I'm working on resemblyzer, an independent package for the voice encoder. You can use your trained encoder models from this repo with it.

06/07/19: Need to run within a docker container on a remote server? See here.

25/06/19: Experimental support for low-memory GPUs (~2gb) added for the synthesizer. Pass --low_mem to demo_cli.py or demo_toolbox.py to enable it. It adds a big overhead, so it's not recommended if you have enough VRAM.

Setup

1. Install Requirements

Python 3.6 or 3.7 is needed to run the toolbox.

  • Install PyTorch (>=1.0.1).
  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.

2. Download Pretrained Models

Download the latest here.

3. (Optional) Test Configuration

Before you download any dataset, you can begin by testing your configuration with:

python demo_cli.py

If all tests pass, you're good to go.

4. (Optional) Download Datasets

For playing with the toolbox alone, I only recommend downloading LibriSpeech/train-clean-100. Extract the contents as /LibriSpeech/train-clean-100 where is a directory of your choosing. Other datasets are supported in the toolbox, see here. You're free not to download any dataset, but then you will need your own data as audio files or you will have to record it with the toolbox.

5. Launch the Toolbox

You can then try the toolbox:

python demo_toolbox.py -d
or
python demo_toolbox.py

depending on whether you downloaded any datasets. If you are running an X-server or if you have the error Aborted (core dumped), see this issue.

Owner
Corentin Jemine
Machine learning engineer at Resemble.AI
Corentin Jemine
Paddlespeech Streaming ASR GUI

Paddlespeech-Streaming-ASR-GUI Introduction A paddlespeech Streaming ASR GUI. Us

Niek Zhen 3 Jan 05, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
A cross platform OCR Library based on PaddleOCR & OnnxRuntime

A cross platform OCR Library based on PaddleOCR & OnnxRuntime

RapidOCR Team 767 Jan 09, 2023
Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

9 Jan 08, 2023
a chinese segment base on crf

Genius Genius是一个开源的python中文分词组件,采用 CRF(Conditional Random Field)条件随机场算法。 Feature 支持python2.x、python3.x以及pypy2.x。 支持简单的pinyin分词 支持用户自定义break 支持用户自定义合并词

duanhongyi 237 Nov 04, 2022
A PyTorch implementation of VIOLET

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling A PyTorch implementation of VIOLET Overview VIOLET is an implementati

Tsu-Jui Fu 119 Dec 30, 2022
VampiresVsWerewolves - Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition

VampiresVsWerewolves Our Implementation of a MiniMax algorithm with alpha beta pruning in the context of an in-class competition. Our Algorithm finish

Shawn 1 Jan 21, 2022
ACL'22: Structured Pruning Learns Compact and Accurate Models

☕ CoFiPruning: Structured Pruning Learns Compact and Accurate Models This repository contains the code and pruned models for our ACL'22 paper Structur

Princeton Natural Language Processing 130 Jan 04, 2023
Grover is a model for Neural Fake News -- both generation and detectio

Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.

Rowan Zellers 856 Dec 24, 2022
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

efficient-task-transfer This repository contains code for the experiments in our paper "What to Pre-Train on? Efficient Intermediate Task Selection".

AdapterHub 26 Dec 24, 2022
Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated. This engine can later be used for downstream tasks in NLP such as Q&A, summarization, generation

Diego 1 Mar 20, 2022
GNES enables large-scale index and semantic search for text-to-text, image-to-image, video-to-video and any-to-any content form

GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.

GNES.ai 1.2k Jan 06, 2023
This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

This Project is based on NLTK(Natural Language Toolkit) It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its

SaiVenkatDhulipudi 2 Nov 17, 2021
Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

Lucas Nestler 112 Dec 05, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 01, 2022
Longformer: The Long-Document Transformer

Longformer Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. ***** New December 1st, 2020: Longforme

AI2 1.6k Dec 29, 2022
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 50 Dec 21, 2022
This repository contains the code for "Generating Datasets with Pretrained Language Models".

Datasets from Instructions (DINO 🦕 ) This repository contains the code for Generating Datasets with Pretrained Language Models. The paper introduces

Timo Schick 154 Jan 01, 2023
Material for GW4SHM workshop, 16/03/2022.

GW4SHM Workshop Wednesday, 16th March 2022 (13:00 – 15:15 GMT): Presented by: Dr. Rhodri Nelson, Imperial College London Project website: https://www.

Devito Codes 1 Mar 16, 2022
leaking paid token generator that was a shit lmao for 100$ haha

Discord-Token-Generator-Leaked leaking paid token generator that was a shit lmao for 100$ he selling it for 100$ wth here the code enjoy don't forget

Keevo 5 Apr 15, 2022