An 16kHz implementation of HiFi-GAN for soft-vc.

Overview

HiFi-GAN

An 16kHz implementation of HiFi-GAN for soft-vc.

Relevant links:

Example Usage

import torch
import numpy as np

# Load checkpoint
hifigan = torch.hub.load("bshall/hifigan:main", "hifigan_hubert_soft").cuda()
# Load mel-spectrogram
mel = torch.from_numpy(np.load("path/to/mel")).unsqueeze(0).cuda()
# Generate
wav, sr = hifigan.generate(mel)

Train

Step 1: Download and extract the LJ-Speech dataset

Step 2: Resample the audio to 16kHz:

usage: resample.py [-h] [--sample-rate SAMPLE_RATE] in-dir out-dir

Resample an audio dataset.

positional arguments:
  in-dir                path to the dataset directory
  out-dir               path to the output directory

optional arguments:
  -h, --help            show this help message and exit
  --sample-rate SAMPLE_RATE
                        target sample rate (default 16kHz)

Step 3: Download the dataset splits and move them into the root of the dataset directory. After steps 2 and 3 your dataset directory should look like this:

LJSpeech-1.1
│   test.txt
│   train.txt
│   validation.txt
├───mels
└───wavs

Note: the mels directory is optional. If you want to fine-tune HiFi-GAN the mels directory should contain ground-truth aligned spectrograms from an acoustic model.

Step 4: Train HiFi-GAN:

usage: train.py [-h] [--resume RESUME] [--finetune] dataset-dir checkpoint-dir

Train or finetune HiFi-GAN.

positional arguments:
  dataset-dir      path to the preprocessed data directory
  checkpoint-dir   path to the checkpoint directory

optional arguments:
  -h, --help       show this help message and exit
  --resume RESUME  path to the checkpoint to resume from
  --finetune       whether to finetune (note that a resume path must be given)

Generate

To generate using the trained HiFi-GAN models, see Example Usage or use the generate.py script:

usage: generate.py [-h] [--model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}] in-dir out-dir

Generate audio for a directory of mel-spectrogams using HiFi-GAN.

positional arguments:
  in-dir                path to directory containing the mel-spectrograms
  out-dir               path to output directory

optional arguments:
  -h, --help            show this help message and exit
  --model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}
                        available models

Acknowledgements

This repo is based heavily on https://github.com/jik876/hifi-gan.

You might also like...
 Fast Soft Color Segmentation
Fast Soft Color Segmentation

Fast Soft Color Segmentation

Permute Me Softly: Learning Soft Permutations for Graph Representations

Permute Me Softly: Learning Soft Permutations for Graph Representations

Multi-task Multi-agent Soft Actor Critic for SMAC

Multi-task Multi-agent Soft Actor Critic for SMAC Overview The CARE formulti-task: Multi-Task Reinforcement Learning with Context-based Representation

[ICLR 2022] Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics
[ICLR 2022] Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

CPDeform Code and data for paper Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics at ICLR 2022 (Spotlight). @InProceed

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two
Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

512x512 flowers after 12 hours of training, 1 gpu 256x256 flowers after 12 hours of training, 1 gpu Pizza 'Lightweight' GAN Implementation of 'lightwe

Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GanFormer and TransGan paper

TransGanFormer (wip) Implementation of TransGanFormer, an all-attention GAN that combines the finding from the recent GansFormer and TransGan paper. I

PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.
PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.

DECOR-GAN PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement, Zhiqin Chen, Vladimir G. Kim, Matthew Fish

This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting.

GAN Memory for Lifelong learning This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting. Please consider citing our paper

[CVPR 2021] Pytorch implementation of Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs In this work, we propose a framework HijackGAN, which enables non-linear latent space travers

Comments
  • is pretrained weight of discriminator of base model available?

    is pretrained weight of discriminator of base model available?

    Thanks for nice work. @bshall

    I'm trying to train hifigan now, but it takes so long training it from scratch using other dataset.

    If discriminator of base model is also available, I could start finetuning based on that vocoder. it seems that you released only generator. Could you also release discriminator weights?

    opened by seastar105 3
  • NaN during training when using own dataset

    NaN during training when using own dataset

    While fine-tuning works as expected, doing regular training with a dataset that isn't LJSpeech would eventually cause a NaN loss at some point. The culprit appears to be the following line, which causes a division by zero if wav happens to contain perfect silence:

    https://github.com/bshall/hifigan/blob/374a4569eae5437e2c80d27790ff6fede9fc1c46/hifigan/dataset.py#L106

    I'm not sure what the best solution for this would be, as a quick fix I simply clipped the divisor so it can't reach zero:

    wav = flip * gain * wav / max([wav.abs().max(), 0.001])
    
    opened by cjay42 0
  • How to use this Vocoder with your Tacotron?

    How to use this Vocoder with your Tacotron?

    Thank you for your work. I used your Tacotron in your Universal Vocoding.The quality of the speech is excellent. However, the inference speed is slow. for that reason, I would like to use this hifigan as a vocoder. But Tacotron's n_mel is 80, while hifigan's n_mel is 128. How to use hifigan with Tacotron?

    opened by gheyret 0
Owner
Benjamin van Niekerk
PhD student at Stellenbosch University. Interested in speech and audio technology.
Benjamin van Niekerk
RSNA Intracranial Hemorrhage Detection with python

RSNA Intracranial Hemorrhage Detection This is the source code for the first place solution to the RSNA2019 Intracranial Hemorrhage Detection Challeng

24 Nov 30, 2022
A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Imagenette 🎶 Imagenette, gentille imagenette, Imagenette, je te plumerai. 🎶 (Imagenette theme song thanks to Samuel Finlayson) NB: Versions of Image

fast.ai 718 Jan 01, 2023
On-device speech-to-intent engine powered by deep learning

Rhino Made in Vancouver, Canada by Picovoice Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a giv

Picovoice 510 Dec 30, 2022
Classifies galaxy morphology with Bayesian CNN

Zoobot Zoobot classifies galaxy morphology with deep learning. This code will let you: Reproduce and improve the Galaxy Zoo DECaLS automated classific

Mike Walmsley 39 Dec 20, 2022
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
Animation of solving the traveling salesman problem to optimality using mixed-integer programming and iteratively eliminating sub tours

tsp-streamlit Animation of solving the traveling salesman problem to optimality using mixed-integer programming and iteratively eliminating sub tours.

4 Nov 05, 2022
How to Train a GAN? Tips and tricks to make GANs work

(this list is no longer maintained, and I am not sure how relevant it is in 2020) How to Train a GAN? Tips and tricks to make GANs work While research

Soumith Chintala 10.8k Dec 31, 2022
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
Python periodic table module

elemenpy Hello! elements.py is a small Python periodic table module that is used for calling certain information about an element. Installation Instal

Eric Cheng 2 Dec 27, 2021
Experiments for Fake News explainability project

fake-news-explainability Experiments for fake news explainability project This repository only contains the notebooks used to train the models and eva

Lorenzo Flores (Lj) 1 Dec 03, 2022
Imagededup - 😎 Finding duplicate images made easy

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

idealo 4.3k Jan 07, 2023
CPF: Learning a Contact Potential Field to Model the Hand-object Interaction

Contact Potential Field This repo contains model, demo, and test codes of our paper: CPF: Learning a Contact Potential Field to Model the Hand-object

Lixin YANG 99 Dec 26, 2022
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Phil Wang 12.6k Jan 09, 2023
Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators

Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. It's also a suite of learning algorithms to train agents to operate in these enviro

Google 1.5k Jan 02, 2023
Dynamical movement primitives (DMPs), probabilistic movement primitives (ProMPs), spatially coupled bimanual DMPs.

Movement Primitives Movement primitives are a common group of policy representations in robotics. There are many different types and variations. This

DFKI Robotics Innovation Center 63 Jan 06, 2023
Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SemCo The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

42 Nov 14, 2022
ReSSL: Relational Self-Supervised Learning with Weak Augmentation

ReSSL: Relational Self-Supervised Learning with Weak Augmentation This repository contains PyTorch evaluation code, training code and pretrained model

mingkai 45 Oct 25, 2022
An end-to-end machine learning web app to predict rugby scores (Pandas, SQLite, Keras, Flask, Docker)

Rugby score prediction An end-to-end machine learning web app to predict rugby scores Overview An demo project to provide a high-level overview of the

34 May 24, 2022
Code base of object detection

rmdet code base of object detection. 环境安装: 1. 安装conda python环境 - `conda create -n xxx python=3.7/3.8` - `conda activate xxx` 2. 运行脚本,自动安装pytorch1

3 Mar 08, 2022
MPViT:Multi-Path Vision Transformer for Dense Prediction

MPViT : Multi-Path Vision Transformer for Dense Prediction This repository inlcu

Youngwan Lee 272 Dec 20, 2022