Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022


Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1
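A small script like the following can check an environment against the versions listed above. It is a sketch and is not part of the repository. The mapping from package names to import names (pytorch → torch, pyyaml → yaml) is ours. A CHECK line only means the reported `__version__` string differs from the listed one. CUDA builds of torch, for example, may append a suffix.

```python
import importlib

# Versions from the Dependencies list, keyed by *import* name (our mapping).
EXPECTED = {
    "torch": "1.4.0",        # pytorch
    "yaml": "5.4.1",         # pyyaml
    "numpy": "1.19.5",
    "librosa": "0.8.0",
    "soundfile": "0.10.2",
    "tensorboardX": "2.1",
}


def check_versions(expected=EXPECTED):
    """Return {module: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, want in expected.items():
        try:
            module = importlib.import_module(name)
            have = getattr(module, "__version__", "unknown")
        except Exception:  # missing package, or e.g. soundfile without libsndfile
            have = None
        report[name] = (want, have)
    return report


if __name__ == "__main__":
    for name, (want, have) in check_versions().items():
        status = "OK" if have == want else "CHECK"
        print(f"{status:5} {name}: expected {want}, found {have}")
```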

Preprocess

What you need to prepare before running this project, and how to prepare it:

We use ParallelWaveGAN as our vocoder and VCTK as our dataset.
To run this project, first install ParallelWaveGAN as described in its repository.
Then prepare all the mel-spectrogram data the same way ParallelWaveGAN does.
Prepare the speaker_used.json files yourself, following the examples ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json.
Prepare the feats.scp file by running ./convert_decode/convert_mel/get_scp.py.
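A script along these lines can produce the speaker lists. This is a minimal sketch that rests on two assumptions. First, the speaker list files are plain JSON lists of speaker IDs. Second, speakers are assigned to the two sets by sorted ID. This README confirms neither, so compare the output against ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json before using it. `split_speakers` and `write_speaker_list` are hypothetical helpers.

```python
import json
import os


def split_speakers(feats_root, n_train=80, n_fine_tune=10):
    """Split speaker directories (p225, p226, ...) into pretraining and one-shot sets.

    ASSUMPTION: speakers are taken in sorted-ID order; the authors' actual
    choice is defined by the example JSON files in ./data/.
    """
    speakers = sorted(
        d for d in os.listdir(feats_root)
        if os.path.isdir(os.path.join(feats_root, d))
    )
    if len(speakers) < n_train + n_fine_tune:
        raise ValueError(f"need {n_train + n_fine_tune} speakers, found {len(speakers)}")
    return speakers[:n_train], speakers[n_train:n_train + n_fine_tune]


def write_speaker_list(speakers, path):
    # ASSUMPTION: a plain JSON list of speaker IDs; check the bundled
    # examples for the format the training scripts actually expect.
    with open(path, "w") as f:
        json.dump(speakers, f, indent=2)
```

For example, `split_speakers("path/to/feats")` (a placeholder path) returns an 80-speaker pretraining list and a 10-speaker one-shot list. These counts match the Training and Fine Tuning sections.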

We assume your prepared mel-spectrograms are organized in a file tree like this:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...
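Given the tree above, a feats.scp file can be sketched as one line per utterance. We assume the Kaldi-style `<utt_id> <path>` layout. The actual output of get_scp.py may differ, so prefer the provided script and treat `collect_feats` and `write_scp` as hypothetical.

```python
import os

SUFFIX = "-feats.npy"


def collect_feats(feats_root):
    """Walk <feats_root>/<speaker>/<utt_id>-feats.npy; return sorted (utt_id, abs_path) pairs."""
    entries = []
    for speaker in sorted(os.listdir(feats_root)):
        spk_dir = os.path.join(feats_root, speaker)
        if not os.path.isdir(spk_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(spk_dir)):
            if name.endswith(SUFFIX):
                utt_id = name[: -len(SUFFIX)]  # "p225_001-feats.npy" -> "p225_001"
                entries.append((utt_id, os.path.abspath(os.path.join(spk_dir, name))))
    return entries


def write_scp(entries, scp_path):
    # ASSUMPTION: Kaldi-style "<utt_id> <path>" lines; get_scp.py may differ.
    with open(scp_path, "w") as f:
        for utt_id, path in entries:
            f.write(f"{utt_id} {path}\n")
```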

Training

Run the pretraining stage with bash run_main.sh. We use 80 speakers from the VCTK dataset, with all utterances of each speaker.

Fine Tuning

Run the fine-tuning stage with bash run_fine_tune.sh. We use 10 other VCTK speakers, with only one utterance per speaker.
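The one-utterance-per-speaker setup splits each speaker's utterances into a single fine-tuning example and a held-out remainder. The sketch below shows that split. This README does not say how the authors picked the single utterance. The choice rule (first in sorted order, or a seeded random pick) and the helper name `one_shot_split` are therefore assumptions.

```python
import random


def one_shot_split(utts_by_speaker, seed=None):
    """Keep one utterance per speaker for fine-tuning and hold out the rest.

    utts_by_speaker: {speaker_id: [utt_id, ...]}
    Returns (fine_tune, held_out), both {speaker_id: [utt_id, ...]}.
    ASSUMPTION: seed=None picks the first utterance in sorted order;
    any other seed makes a reproducible random pick.
    """
    rng = random.Random(seed) if seed is not None else None
    fine_tune, held_out = {}, {}
    for spk, utts in sorted(utts_by_speaker.items()):
        utts = sorted(utts)
        if not utts:
            raise ValueError(f"speaker {spk} has no utterances")
        pick = rng.choice(utts) if rng is not None else utts[0]
        fine_tune[spk] = [pick]
        held_out[spk] = [u for u in utts if u != pick]
    return fine_tune, held_out
```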

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers, using their other, unseen utterances to perform the conversion.
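The conversion jobs can be listed as ordered (source, target) speaker pairs. In each pair, every held-out source utterance is converted toward the other one-shot speaker. This is one plausible reading of the setup and does not reproduce the contents of run_convert.sh. `conversion_pairs` is a hypothetical helper that takes the held-out utterances per speaker.

```python
from itertools import permutations


def conversion_pairs(held_out, max_per_pair=None):
    """List (source_speaker, source_utt, target_speaker) jobs for every ordered speaker pair.

    held_out: {speaker_id: [utt_id, ...]}, the utterances not used in fine-tuning.
    max_per_pair: optional cap on source utterances converted per (source, target) pair.
    """
    jobs = []
    for src, tgt in permutations(sorted(held_out), 2):  # src != tgt, order matters
        utts = sorted(held_out[src])
        if max_per_pair is not None:
            utts = utts[:max_per_pair]
        for utt in utts:
            jobs.append((src, utt, tgt))
    return jobs
```

With 10 one-shot speakers, `permutations` yields 10 × 9 = 90 ordered speaker pairs. `max_per_pair` limits how many source utterances are converted for each pair.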

