NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

Last update: Dec 23, 2022

Overview

NU-Wave — Official PyTorch Implementation

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
Junhyeok Lee, Seungu Han @ MINDsLab Inc., SNU

Paper(arXiv): https://arxiv.org/abs/2104.02321 (Accepted to INTERSPEECH 2021)
Audio Samples: https://mindslab-ai.github.io/nuwave

Official PyTorch + Lightning implementation of NU-Wave.

Update: CODE RELEASED! README is DONE.

Requirements

PyTorch >= 1.7.0 (required for nn.SiLU, the swish activation)
PyTorch-Lightning == 1.1.6
All requirements are listed in requirements.txt.
We also provide a Docker setup in Dockerfile.
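For reference, the swish activation provided by nn.SiLU is x * sigmoid(x). The following is a minimal pure-Python sketch of that formula only; the function name `silu` is illustrative and is not taken from this repository, which uses the PyTorch module.

```python
import math


def silu(x: float) -> float:
    """Swish / SiLU activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return x / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form x * e^x / (1 + e^x).
    e = math.exp(x)
    return x * e / (1.0 + e)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(f"silu({value}) = {silu(value):.4f}")
```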

Preprocessing

Before running the project, download the dataset and preprocess it into .pt files:

Download the VCTK dataset.
Remove speakers p280 and p315.
Set data:dir in hparameters.yaml to the path of the downloaded dataset.
Run utils/wav2pt.py.

$ python utils/wav2pt.py
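The speaker-removal step above can be scripted. The sketch below is not part of the repository: the helper name `remove_speakers` is hypothetical, and it assumes one sub-directory per speaker under the wav root, which is a guess based on the `#dir/spk/format` comment in the config. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path

EXCLUDED_SPEAKERS = ("p280", "p315")  # speakers the README says to remove


def remove_speakers(wav_root, speakers=EXCLUDED_SPEAKERS, dry_run=True):
    """Find (and optionally delete) speaker folders under wav_root.

    Assumes a layout like <wav_root>/<speaker>/..., which is an assumption
    and may not match every VCTK release. Returns the folders that were found.
    """
    found = []
    for speaker in speakers:
        speaker_dir = Path(wav_root) / speaker
        if speaker_dir.is_dir():
            found.append(speaker_dir)
            if not dry_run:
                shutil.rmtree(speaker_dir)  # irreversible delete
    return found
```

Call `remove_speakers(root)` first to see which folders would be affected, then `remove_speakers(root, dry_run=False)` to delete them, where `root` is your own dataset path.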

Training

Adjust hparameters.yaml, especially the train section.

train:
  batch_size: 18 # Dependent on GPU memory size
  lr: 0.00003
  weight_decay: 0.00
  num_workers: 64 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.5
  beta2: 0.999

To train on a single speaker, use VCTKSingleSpkDataset instead of VCTKMultiSpkDataset as the dataset in dataloader.py, and set batch_size=1 for the validation dataloader.
Then adjust the data section in hparameters.yaml.

data:
  dir: '/DATA1/VCTK/VCTK-Corpus/wav48/p225' #dir/spk/format
  format: '*mic1.pt'
  cv_ratio: (223./231., 8./231., 0.00) #train/val/test
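To make the cv_ratio arithmetic concrete: the denominators suggest 231 files, divided into 223 for training, 8 for validation, and none for testing. The sketch below shows one way such ratios could be turned into counts. The function `split_by_ratio` and the file names are made up for illustration, and the repository's own split logic (ordering, shuffling, rounding) may differ.

```python
def split_by_ratio(items, ratios):
    """Split items into consecutive train/val/test chunks by ratio."""
    train_ratio, val_ratio, _ = ratios
    n_train = round(len(items) * train_ratio)
    n_val = round(len(items) * val_ratio)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever is left over
    return train, val, test


# Hypothetical file names that match the '*mic1.pt' pattern.
files = [f"p225_{i:03d}_mic1.pt" for i in range(231)]
train, val, test = split_by_ratio(files, (223 / 231, 8 / 231, 0.0))
print(len(train), len(val), len(test))  # 223 8 0
```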

Run trainer.py.

$ python trainer.py

To resume training from a checkpoint, see the options defined in the parser.

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int, required=False,
                        help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true", required=False,
                        help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true", required=False,
                        help="Start from ema checkpoint")
    args = parser.parse_args()
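As a quick illustration of how these flags parse, the sketch below rebuilds the same three options in a helper and feeds it example arguments. The helper name `build_parser` and the epoch number 100 are arbitrary choices for the example; what trainer.py does with the parsed values is not shown here.

```python
import argparse


def build_parser():
    """Recreate the training options listed above for experimentation."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int, required=False,
                        help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
                        help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true",
                        help="Start from ema checkpoint")
    return parser


# Same as running: python trainer.py -r 100 -e
args = build_parser().parse_args(['-r', '100', '-e'])
print(args.resume_from, args.restart, args.ema)  # 100 False True

# Without -r, resume_from is None (presumably a fresh run).
print(build_parser().parse_args([]).resume_from)  # None
```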

During training, the TensorBoard logger records the loss, spectrograms, and audio.

$ tensorboard --logdir=./tensorboard --bind_all

Evaluation

Run for_test.py or test.py.

$ python test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}
or
$ python for_test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}

See the parser for the available options.

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int, required=True,
                        help="Resume Checkpoint epoch number")
    parser.add_argument('-e', '--ema', action="store_true", required=False,
                        help="Start from ema checkpoint")
    parser.add_argument('--save', action="store_true", required=False,
                        help="Save file")
    args = parser.parse_args()

We provide Lightning-style test code in test.py, but it has a device dependency, so we recommend using for_test.py.
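The placeholders in the evaluation commands above expand to plain flags. The sketch below assembles the argument list for a given checkpoint; `build_eval_command` is a hypothetical helper, not part of the repository, and the epoch number 100 is only an example.

```python
def build_eval_command(checkpoint_epoch, ema=False, save=False,
                       script="for_test.py"):
    """Assemble the evaluation command line as a list of arguments."""
    command = ["python", script, "-r", str(checkpoint_epoch)]
    if ema:
        command.append("-e")      # start from the EMA checkpoint
    if save:
        command.append("--save")  # save the output files
    return command


print(" ".join(build_eval_command(100, ema=True, save=True)))
# python for_test.py -r 100 -e --save
```

The returned list can be passed directly to `subprocess.run` if you want to launch evaluation from another script.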

References

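This implementation uses code from other repositories, including ivanvok's WaveGrad implementation (basis of the DDPM code) and lmnt-com's DiffWave implementation (basis of the model).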

This README and the webpage for the audio samples are inspired by:

The audio samples on our webpage are partially derived from:

VCTK dataset (0.92): 46 hours of English speech from 108 speakers.

Repository Structure

.
├── Dockerfile
├── dataloader.py           # Dataloader for train/val(=test)
├── filters.py              # Filter implementation
├── test.py                 # Test with lightning_loop.
├── for_test.py             # Test with for_loop. Recommended due to device dependency of lightning
├── hparameter.yaml         # Config
├── lightning_model.py      # NU-Wave implementation. DDPM is based on ivanvok's WaveGrad implementation
├── model.py                # NU-Wave model based on lmnt-com's DiffWave implementation
├── requirements.txt        # Required libraries
├── sampling.py             # Sampling a file
├── trainer.py              # Lightning trainer
├── README.md
├── LICENSE
├── utils
│  ├── stft.py              # STFT layer
│  ├── tblogger.py          # Tensorboard Logger for lightning
│  └── wav2pt.py            # Preprocessing
└── docs                    # For github.io
   └─ ...

Citation & Contact

If this repository is useful for your research, please consider citing it. The BibTeX entry will be updated after the INTERSPEECH 2021 conference.

@article{lee2021nuwave,
  title={NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling},
  author={Lee, Junhyeok and Han, Seungu},
  journal={arXiv preprint arXiv:2104.02321},
  year={2021}
}

If you have questions or other inquiries, please contact Junhyeok Lee at [email protected]
