Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Overview

UnivNet

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

This is an unofficial PyTorch implementation of Jang et al. (Kakao), UnivNet.

arXiv githubio License

To-Do List

  • Release checkpoint of pre-trained model
  • Extract wav samples for audio sample page
  • Add results including validation loss graph

Key Features

  • According to the authors of the paper, UnivNet obtained the best objective results among the recent GAN-based neural vocoders (including HiFi-GAN) as well as outperforming HiFi-GAN in a subjective evaluation. Also its inference speed is 1.5 times faster than HiFi-GAN.

  • This repository uses the same mel-spectrogram function as the Official HiFi-GAN, which is compatible with NVIDIA/tacotron2.

  • Our default mel calculation hyperparameters are as below, following the original paper.

    audio:
      n_mel_channels: 100
      filter_length: 1024
      hop_length: 256 # WARNING: this can't be changed.
      win_length: 1024
      sampling_rate: 24000
      mel_fmin: 0.0
      mel_fmax: 12000.0

    You can modify the hyperparameters to be compatible with your acoustic model.

Prerequisites

The implementation needs following dependencies.

  1. Python 3.6
  2. PyTorch 1.6.0
  3. NumPy 1.17.4 and SciPy 1.5.4
  4. Install other dependencies in requirements.txt.
    pip install -r requirements.txt

Datasets

Preparing Data

  • Download the training dataset. This can be any wav file with sampling rate 24,000Hz. The original paper used LibriTTS.
    • LibriTTS train-clean-360 split tar.gz link
    • Unzip and place its contents under datasets/LibriTTS/train-clean-360.
  • If you want to use wav files with a different sampling rate, please edit the configuration file (see below).

Note: The mel-spectrograms calculated from audio file will be saved as **.mel at first, and then loaded from disk afterwards.

Preparing Metadata

Following the format from NVIDIA/tacotron2, the metadata should be formatted as:

path_to_wav|transcript|speaker_id
path_to_wav|transcript|speaker_id
...

Train/validation metadata for LibriTTS train-clean-360 split and are already prepared in datasets/metadata. 5% of the train-clean-360 utterances were randomly sampled for validation.

Since this model is a vocoder, the transcripts are NOT used during training.

Train

Preparing Configuration Files

  • Run cp config/default.yaml config/config.yaml and then edit config.yaml

  • Write down the root path of train/validation in the data section. The data loader parses list of files within the path recursively.

    data:
      train_dir: 'datasets/'	# root path of train data (either relative/absoulte path is ok)
      train_meta: 'metadata/libritts_train_clean_360_train.txt'	# relative path of metadata file from train_dir
      val_dir: 'datasets/'		# root path of validation data
      val_meta: 'metadata/libritts_train_clean_360_val.txt'		# relative path of metadata file from val_dir

    We provide the default metadata for LibriTTS train-clean-360 split.

  • Modify channel_size in gen to switch between UnivNet-c16 and c32.

    gen:
      noise_dim: 64
      channel_size: 32 # 32 or 16
      dilations: [1, 3, 9, 27]
      strides: [8, 8, 4]
      lReLU_slope: 0.2

Training

python trainer.py -c CONFIG_YAML_FILE -n NAME_OF_THE_RUN

Tensorboard

tensorboard --logdir logs/

If you are running tensorboard on a remote machine, you can open the tensorboard page by adding --bind_all option.

Inference

python inference.py -p CHECKPOINT_PATH -i INPUT_MEL_PATH

Pre-trained Model

A pre-trained model will be released soon. The model was trained on LibriTTS train-clean-360 split.

Results

See audio samples at https://mindslab-ai.github.io/univnet/

Comparison with the results on paper

Model MOS PESQ(↑) RMSE(↓)
Recordings 4.16±0.09 4.50 0.000
Results in Paper (UnivNet-c32) 3.93±0.09 3.70 0.316
Ours (UnivNet-c32) - TBD TBD

Note

This code is an unofficial implementation, there may be some differences from the original paper.

  • Our UnivNet generator has smaller number of parameters (c32: 5.11M, c16: 1.42M) than the paper (c32: 14.89M, c16: 4.00M). So far, we have not encountered any issues from using a smaller model size. If run into any problem, please report it as an issue.

Implementation Authors

Implementation authors are:

Special thanks to

License

This code is licensed under BSD 3-Clause License.

We referred following codes and repositories.

References

Papers

Datasets

Owner
MINDs Lab
MINDsLab provides AI platform and various AI engines based on deep machine learning.
MINDs Lab
Aydin is a user-friendly, feature-rich, and fast image denoising tool

Aydin is a user-friendly, feature-rich, and fast image denoising tool that provides a number of self-supervised, auto-tuned, and unsupervised image denoising algorithms.

Royer Lab 99 Dec 14, 2022
Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

Diverse Semantic Image Synthesis via Probability Distribution Modeling (CVPR 2021) Paper Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu,

tzt 45 Nov 17, 2022
The repo of the preprinting paper "Labels Are Not Perfect: Inferring Spatial Uncertainty in Object Detection"

Inferring Spatial Uncertainty in Object Detection A teaser version of the code for the paper Labels Are Not Perfect: Inferring Spatial Uncertainty in

ZINING WANG 21 Mar 03, 2022
Python SDK for building, training, and deploying ML models

Overview of Kubeflow Fairing Kubeflow Fairing is a Python package that streamlines the process of building, training, and deploying machine learning (

Kubeflow 325 Dec 13, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
Using some basic methods to show linkages and transformations of robotic arms

roboticArmVisualizer Python GUI application to create custom linkages and adjust joint angles. In the future, I plan to add 2d inverse kinematics solv

Sandesh Banskota 1 Nov 19, 2021
Large scale PTM - PPI relation extraction

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT The silver standard

1 Feb 25, 2022
Capstone-Project-2 - A game program written in the Python language

Capstone-Project-2 My Pygame Game Information: Description This Pygame project i

Nhlakanipho Khulekani Hlophe 1 Jan 04, 2022
Monitor your ML jobs on mobile devices📱, especially for Google Colab / Kaggle

TF Watcher TF Watcher is a simple to use Python package and web app which allows you to monitor 👀 your Machine Learning training or testing process o

Rishit Dagli 54 Nov 01, 2022
Implementation of FSGNN

FSGNN Implementation of FSGNN. For more details, please refer to our paper Experiments were conducted with following setup: Pytorch: 1.6.0 Python: 3.8

19 Dec 05, 2022
'Aligned mixture of latent dynamical systems' (amLDS) for stimulus decoding probabilistic manifold alignment across animals. P. Herrero-Vidal et al. NeurIPS 2021 code.

Across-animal odor decoding by probabilistic manifold alignment (NeurIPS 2021) This repository is the official implementation of aligned mixture of la

Pedro Herrero-Vidal 3 Jul 12, 2022
Yet another video caption

Yet another video caption

Fan Zhimin 5 May 26, 2022
Cosine Annealing With Warmup

CosineAnnealingWithWarmup Formulation The learning rate is annealed using a cosine schedule over the course of learning of n_total total steps with an

zhuyun 4 Apr 18, 2022
🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

coqui 92 Dec 19, 2022
Simple-Image-Classification - Simple Image Classification Code (PyTorch)

Simple-Image-Classification Simple Image Classification Code (PyTorch) Yechan Kim This repository contains: Python3 / Pytorch code for multi-class ima

Yechan Kim 8 Oct 29, 2022
Heterogeneous Temporal Graph Neural Network

Heterogeneous Temporal Graph Neural Network This repository contains the datasets and source code of HTGNN. run_mag.ipynb is the training and testing

15 Dec 22, 2022
House_prices_kaggle - Predict sales prices and practice feature engineering, RFs, and gradient boosting

House Prices - Advanced Regression Techniques Predicting House Prices with Machine Learning This project is build to enhance my knowledge about machin

Gurpreet Singh 1 Jan 01, 2022
Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"

Introduction Code and data for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning". We cons

Pan Lu 81 Dec 27, 2022
Keras-1D-ACGAN-Data-Augmentation

Keras-1D-ACGAN-Data-Augmentation What is the ACGAN(Auxiliary Classifier GANs) ? Related Paper : [Abstract : Synthesizing high resolution photorealisti

Jae-Hoon Shim 7 Dec 23, 2022
This project is the official implementation of our accepted ICLR 2021 paper BiPointNet: Binary Neural Network for Point Clouds.

BiPointNet: Binary Neural Network for Point Clouds Created by Haotong Qin, Zhongang Cai, Mingyuan Zhang, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Li

Haotong Qin 59 Dec 17, 2022