Pytorch Implementation of Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Related tags

Deep LearningNANSY
Overview

NANSY:

Unofficial Pytorch Implementation of Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Notice

Papers' Demo

Check Authors' Demo page

Sample-Only Demo Page

Check Demo Page

Concerns

Among the various controllabilities, it is rather obvious that the voice conversion technique can be misused and potentially harm other people. 
More concretely, there are possible scenarios where it is being used by random unidentified users and contributing to spreading fake news. 
In addition, it can raise concerns about biometric security systems based on speech. 
To mitigate such issues, the proposed system should not be released without a consent so that it cannot be easily used by random users with malicious intentions. 
That being said, there is still a potential for this technology to be used by unidentified users. 
As a more solid solution, therefore, we believe a detection system that can discriminate between fake and real speech should be developed.

We provide both pretrained checkpoint of Discriminator network and inference code for this concern.

Environment

Requirements

pip install -r requirements.txt

Docker

Image

If using cu113 compatible environment, use Dockerfile
If using cu102 compatible environment, use Dockerfile-cu102

docker build -f Dockerfile -t nansy:v0.0 .

Container

After building appropriate image, use docker-compose or docker to run a container.
You may want to modify docker-compose.yml or docker_run_script.sh

docker-compose -f docker-compose.yml run --service-ports --name CONTAINER_NAME nansy_container bash
or
bash docker_run_script.sh

Pretrained hifi-gan

Download pretrained hifi-gan config and checkpoint
from hifi-gan to ./configs/hifi-gan/UNIVERSAL_V1

Pretrained Checkpoints

TODO

Datasets

Datasets used when training are:

Custom Datasets

Write your own code!
If inheriting datasets.custom.CustomDataset, self.data should be as:

self.data: list
self.data[i]: dict must have:
    'wav_path_22k': str = path_to_22k_wav_file
    'wav_path_16k': str = (optional) path_to_16k_wav_file
    'speaker_id': str = speaker_id

Train

If you prefer pytorch-lightning, python train.py -g 1

parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="configs/train_nansy.yaml")
parser.add_argument('-g', '--gpus', type=str,
                    help="number of gpus to use")
parser.add_argument('-p', '--resume_checkpoint_path', type=str, default=None,
                    help="path of checkpoint for resuming")
args = parser.parse_args()
return args

else python train_torch.py # TODO, not completely supported now

Configs Description

Edit configs/train_nansy.yaml.

Dataset settings

  • Adjust datasets.*.datasets list.
    • Paths to dataset config files should be in the list
datasets:
  train:
    class: datasets.base.MultiDataset
    datasets: [
      # 'configs/datasets/css10.yaml',
        'configs/datasets/vctk.yaml',
        'configs/datasets/libritts360.yaml',
    ]

    mode: train
    batch_size: 32 # Depends on GPU Memory, Original paper used 32
    shuffle: True
    num_workers: 16 # Depends on available CPU cores

  eval:
    class: datasets.base.MultiDataset
    datasets: [
      # 'configs/datasets/css10.yaml',
        'configs/datasets/vctk.yaml',
        'configs/datasets/libritts360.yaml',
    ]

    mode: eval
    batch_size: 32
    shuffle: False
    num_workers: 4
Dataset Config

Dataset configs are at ./configs/datasets/.
You might want to replace /raid/vision/dhchoi/data to YOUR_PATH_DO_DATA, especially at path section.

class: datasets.vctk.VCTKDataset # implemented Dataset class name
load:
  audio: 'configs/audio/22k.yaml'

path:
  root: /raid/vision/dhchoi/data/
  wav22: /raid/vision/dhchoi/data/VCTK-Corpus/wav22
  wav16: /raid/vision/dhchoi/data/VCTK-Corpus/wav16
  txt: /raid/vision/dhchoi/data/VCTK-Corpus/txt
  timestamp: ./vctk-silence-labels/vctk-silences.0.92.txt

  configs:
    train: /raid/vision/dhchoi/data/VCTK-Corpus/vctk_22k_train.txt
    eval: /raid/vision/dhchoi/data/VCTK-Corpus/vctk_22k_val.txt
    test: /raid/vision/dhchoi/data/VCTK-Corpus/vctk_22k_test.txt

Model Settings

  • Comment out or Delete Discriminator section if no Discriminator needed.
  • Adjust optimizer class, lr and betas if needed.
models:
  Analysis:
    class: models.analysis.Analysis

    optim:
      class: torch.optim.Adam
      kwargs:
        lr: 1e-4
        betas: [ 0.5, 0.9 ]

  Synthesis:
    class: models.synthesis.Synthesis

    optim:
      class: torch.optim.Adam
      kwargs:
        lr: 1e-4
        betas: [ 0.5, 0.9 ]

  Discriminator:
    class: models.synthesis.Discriminator

    optim:
      class: torch.optim.Adam
      kwargs:
        lr: 1e-4
        betas: [ 0.5, 0.9 ]

Logging & Pytorch-lightning settings

For pytorch-lightning configs in section pl, check official docs

pl:
  checkpoint:
    callback:
      save_top_k: -1
      monitor: "train/backward"
      verbose: True
      every_n_epochs: 1 # epochs

  trainer:
    gradient_clip_val: 0 # don't clip (default value)
    max_epochs: 10000
    num_sanity_val_steps: 1
    fast_dev_run: False
    check_val_every_n_epoch: 1
    progress_bar_refresh_rate: 1
    accelerator: "ddp"
    benchmark: True

logging:
  log_dir: /raid/vision/dhchoi/log/nansy/ # PATH TO SAVE TENSORBOARD LOG FILES
  seed: "31" # Experiment Seed
  freq: 100 # Logging frequency (step)
  device: cuda # Training Device (used only in train_torch.py) 
  nepochs: 1000 # Max epochs to run

  save_files: [ # Files To save for each experiment
      './*.py',
      './*.sh',
      'configs/*.*',
      'datasets/*.*',
      'models/*.*',
      'utils/*.*',
  ]

Tensorboard

During training, tensorboard logger logs loss, spectrogram and audio.

tensorboard --logdir YOUR_LOG_DIR_AT_CONFIG/YOUR_SEED --bind_all

Inference

Generator

python inference.py or bash inference.sh

You may want to edit inferece.py for custom manipulation.

parser = argparse.ArgumentParser()
parser.add_argument('--path_audio_conf', type=str, default='configs/audio/22k.yaml',
                    help='')
parser.add_argument('--path_ckpt', type=str, required=True,
                    help='path to pl checkpoint')
parser.add_argument('--path_audio_source', type=str, required=True,
                    help='path to source audio file, sr=22k')
parser.add_argument('--path_audio_target', type=str, required=True,
                    help='path to target audio file, sr=16k')
parser.add_argument('--tsa_loop', type=int, default=100,
                    help='iterations for tsa')
parser.add_argument('--device', type=str, default='cuda',
                    help='')
args = parser.parse_args()
return args

Discriminator

Note that 0=gt, 1=gen

python classify.py or bash classify.sh

parser = argparse.ArgumentParser()
parser.add_argument('--path_audio_conf', type=str, default='configs/audio/22k.yaml',
                    help='')
parser.add_argument('--path_ckpt', type=str, required=True,
                    help='path to pl checkpoint')
parser.add_argument('--path_audio_gt', type=str, required=True,
                    help='path to audio with same speaker')
parser.add_argument('--path_audio_gen', type=str, required=True,
                    help='path to generated audio ')
parser.add_argument('--device', type=str, default='cuda')
args = parser.parse_args()

License

NEEDS WORK

BSD 3-Clause License.

References

  • Choi, Hyeong-Seok, et al. "Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations."

  • Baevski, Alexei, et al. "wav2vec 2.0: A framework for self-supervised learning of speech representations."

  • Desplanques, Brecht, Jenthe Thienpondt, and Kris Demuynck. "Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification."

  • Chen, Mingjian, et al. "Adaspeech: Adaptive text to speech for custom voice."

  • Cookbook formulae for audio equalizer biquad filter coefficients

This implementation uses codes/data from following repositories:

Provided Checkpoints are trained from:

Special Thanks

MINDsLab Inc. for GPU support

Special Thanks to:

for help with Audio-domain knowledge

Owner
Dongho Choi 최동호
Dongho Choi 최동호
A synthetic texture-invariant dataset for object detection of UAVs

A synthetic dataset for object detection of UAVs This repository contains a synthetic datasets accompanying the paper Sim2Air - Synthetic aerial datas

LARICS Lab 10 Aug 13, 2022
NeurIPS'21 Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows

NeurIPS'21 Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows This repo contains the code for the paper Tractable Densit

Layer6 Labs 4 Dec 12, 2022
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
PyTorch code for 'Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning'

Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning This repository is for EMSRDPN introduced in the foll

7 Feb 10, 2022
This is an easy python software which allows to sort images with faces by gender and after by age.

Gender-age Classifier This is an easy python software which allows to sort images with faces by gender and after by age. Usage First install Deepface

Claudio Ciccarone 6 Sep 17, 2022
Semantic Bottleneck Scene Generation

SB-GAN Semantic Bottleneck Scene Generation Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the f

Samaneh Azadi 41 Nov 28, 2022
Code for "Diffusion is All You Need for Learning on Surfaces"

Source code for "Diffusion is All You Need for Learning on Surfaces", by Nicholas Sharp Souhaib Attaiki Keenan Crane Maks Ovsjanikov NOTE: the linked

Nick Sharp 247 Dec 28, 2022
8-week curriculum for AI Builders

curriculum 8-week curriculum for AI Builders สารบัญ บทที่ 1 - Machine Learning คืออะไร บทที่ 2 - ชุดข้อมูลมหัศจรรย์และถิ่นที่อยู่ บทที่ 3 - Stochastic

AI Builders 134 Jan 03, 2023
Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020

AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:

AutoML Research 64 Dec 17, 2022
Differentiable Factor Graph Optimization for Learning Smoothers @ IROS 2021

Differentiable Factor Graph Optimization for Learning Smoothers Overview Status Setup Datasets Training Evaluation Acknowledgements Overview Code rele

Brent Yi 60 Nov 14, 2022
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!

CoVA: Context-aware Visual Attention for Webpage Information Extraction Abstract Webpage information extraction (WIE) is an important step to create k

Keval Morabia 41 Jan 01, 2023
QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing

QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing Environment Tested on Ubuntu 14.04 64bit and 16.04 64bit Installation # disabl

gts3.org (<a href=[email protected])"> 581 Dec 30, 2022
Deep Residual Learning for Image Recognition

Deep Residual Learning for Image Recognition This is a Torch implementation of "Deep Residual Learning for Image Recognition",Kaiming He, Xiangyu Zhan

Kimmy 561 Dec 01, 2022
So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer        Introduction This repository contains the source code under PyTorch framework and models trai

Jiangtao Xie 44 Nov 24, 2022
Implementation of Convolutional LSTM in PyTorch.

ConvLSTM_pytorch This file contains the implementation of Convolutional LSTM in PyTorch made by me and DavideA. We started from this implementation an

Andrea Palazzi 1.3k Dec 29, 2022
The MLOps platform for innovators 🚀

​ DS2.ai is an integrated AI operation solution that supports all stages from custom AI development to deployment. It is an AI-specialized platform service that collects data, builds a training datas

9 Jan 03, 2023
Datasets and pretrained Models for StyleGAN3 ...

Datasets and pretrained Models for StyleGAN3 ... Dear arfiticial friend, this is a collection of artistic datasets and models that we have put togethe

lucid layers 34 Oct 06, 2022
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Faster R-CNN and Mask R-CNN in PyTorch 1.0 maskrcnn-benchmark has been deprecated. Please see detectron2, which includes implementations for all model

Facebook Research 9k Jan 04, 2023
Fast Neural Representations for Direct Volume Rendering

Fast Neural Representations for Direct Volume Rendering Sebastian Weiss, Philipp Hermüller, Rüdiger Westermann This repository contains the code and s

Sebastian Weiss 20 Dec 03, 2022
Imagededup - 😎 Finding duplicate images made easy

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

idealo 4.3k Jan 07, 2023