Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

image

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available at here.

Dependencies

  • python 3.6.0
  • pytorch 1.4.0
  • pyyaml 5.4.1
  • numpy 1.19.5
  • librosa 0.8.0
  • soundfile 0.10.2
  • tensorboardX 2.1

Preprocess

What you need to prepare first before running this project and how to prepare them

  • We use the ParallelWaveGAN as our vocoder, and VCTK as our data set.

  • If you wanna run our project, please install as the description of ParallelWaveGAN project first.

  • And then prepare all the mel-spectrogram data as ParallelWaveGAN do.

  • Prepare the speaker_used.json file by yourself, as ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json show.

  • Prepare the feats.scp file by runing ./convert_decode/convert_mel/get_scp.py .

Assume that your prepared mel-spectrograms are sorted in the files tree like:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...

Training

Run the pretrain stage by bash run_main.sh. We use 80 speakers of VCTK data set, and all utterances for each person.

Fine Tuning

Run the fine tune stage by bash run_fine_tune.sh. We use the other 10 speakers of VCTK data set, and only 1 utterance for each person used.

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers , and use their other unseen utterances to perform one-shot voice conversion!

traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation toolbox based on PyTorch.

traiNNer traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation to

202 Jan 04, 2023
Age Progression/Regression by Conditional Adversarial Autoencoder

Age Progression/Regression by Conditional Adversarial Autoencoder (CAAE) TensorFlow implementation of the algorithm in the paper Age Progression/Regre

Zhifei Zhang 603 Dec 22, 2022
In this project, we'll be making our own screen recorder in Python using some libraries.

Screen Recorder in Python Project Description: In this project, we'll be making our own screen recorder in Python using some libraries. Requirements:

Hassan Shahzad 4 Jan 24, 2022
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat

338 Dec 27, 2022
Codes and pretrained weights for winning submission of 2021 Brain Tumor Segmentation (BraTS) Challenge

Winning submission to the 2021 Brain Tumor Segmentation Challenge This repo contains the codes and pretrained weights for the winning submission to th

94 Dec 28, 2022
Experiments with the Robust Binary Interval Search (RBIS) algorithm, a Query-Based prediction algorithm for the Online Search problem.

OnlineSearchRBIS Online Search with Best-Price and Query-Based Predictions This is the implementation of the Robust Binary Interval Search (RBIS) algo

S. K. 1 Apr 16, 2022
A visualisation tool for Deep Reinforcement Learning

DRLVIS - Visualising Deep Reinforcement Learning Created by Marios Sirtmatsis with the support of Alex Bäuerle. DRLVis is an application used for visu

Marios Sirtmatsis 1 Nov 04, 2021
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations Project | Paper | Colab PyTorch implementation of SDEdit: Image Synthesis a

536 Jan 05, 2023
An official implementation of the Anchor DETR.

Anchor DETR: Query Design for Transformer-Based Detector Introduction This repository is an official implementation of the Anchor DETR. We encode the

MEGVII Research 276 Dec 28, 2022
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Documentation | External Resources | Research Paper Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble. The

Benedek Rozemberczki 188 Dec 29, 2022
A library for building and serving multi-node distributed faiss indices.

About Distributed faiss index service. A lightweight library that lets you work with FAISS indexes which don't fit into a single server memory. It fol

Meta Research 170 Dec 30, 2022
Code for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space"

SRHEN This is a better and simpler implementation for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in

1 Oct 28, 2022
Direct design of biquad filter cascades with deep learning by sampling random polynomials.

IIRNet Direct design of biquad filter cascades with deep learning by sampling random polynomials. Usage git clone https://github.com/csteinmetz1/IIRNe

Christian J. Steinmetz 55 Nov 02, 2022
face property detection pytorch

This is the face property train code of project face-detection-project

i am x 2 Oct 18, 2021
A new version of the CIDACS-RL linkage tool suitable to a cluster computing environment.

Fully Distributed CIDACS-RL The CIDACS-RL is a brazillian record linkage tool suitable to integrate large amount of data with high accuracy. However,

Robespierre Pita 5 Nov 04, 2022
Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leor

Sontag Lab 3 Feb 03, 2022
PAWS 🐾 Predicting View-Assignments with Support Samples

This repo provides a PyTorch implementation of PAWS (predicting view assignments with support samples), as described in the paper Semi-Supervised Learning of Visual Features by Non-Parametrically Pre

Facebook Research 437 Dec 23, 2022
This repository contains the entire code for our work "Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding"

Two-Timescale-DNN Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding This repository contains the entire code for our work

QiyuHu 3 Mar 07, 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentati

Hust Visual Learning Team 253 Dec 21, 2022