Project NII pytorch scripts

project-NII-pytorch-scripts

By Xin Wang, National Institute of Informatics, since 2021

I am a new pytorch user. If you have any suggestions or questions, please email wangxin at nii dot ac dot jp


1. Note

For tutorials on neural vocoders

Tutorials are available in ./tutorials. Please follow the ./tutorials/README and work in this folder first

cd ./tutorials
head -n 2 README.md
# Hands-on materials for neural vocoders

For other projects

Just follow the rest of the README.

The repository is relatively large. You may use the --depth 1 option to fetch only the latest commit and skip the older history.

git clone --depth 1 https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts.git

Updates

2022-01-08: upload hn-sinc-nsf + hifi-gan

2022-01-08: upload RawNet2 for anti-spoofing

2. Overview

This repository hosts Pytorch code for the following projects:

2.1 Neural source-filter waveform model

./project/01-nsf

  1. Cyclic-noise neural source-filter waveform model (NSF)

  2. Harmonic-plus-noise NSF with trainable sinc filter (Hn-sinc-NSF)

  3. Harmonic-plus-noise NSF with fixed FIR filter (Hn-NSF)

  4. Hn-sinc-NSF + HiFiGAN discriminator

Each project includes a model pre-trained on the CMU-arctic database (4 speakers) and a demo script that runs inference and training. Please check ./project/01-nsf/README.

Generated samples from pre-trained models are in ./project/01-nsf/*/__pre_trained/output. If they are missing, please run the demo script to produce waveforms using the pre-trained models.

A tutorial on NSF models is also available in ./tutorials.

Note that this is a re-implementation of the projects originally based on CURRENNT. All the papers published so far used the CURRENNT implementation.

Many samples can be found on the NSF homepage.

2.2 Other neural waveform models

./project/05-nn-vocoders

  1. WaveNet vocoder

  2. WaveGlow

  3. Blow

  4. iLPCNet

All the projects include a pre-trained model and a one-click demo script. Please check ./project/05-nn-vocoders/README.

Generated samples from pre-trained models are in ./project/05-nn-vocoders/*/__pre_trained/output.

A tutorial is also available in ./tutorials.

2.3 ASVspoof project with toy example

./project/04-asvspoof2021-toy

It takes time to download ASVspoof2019 database. Therefore, this project demonstrates how to train and evaluate the anti-spoofing model using a toy dataset.

Please try this project before checking other ASVspoof projects below.

A similar setup is used for the ASVspoof2021 LFCC-LCNN baseline, although the LFCC front-end is slightly different.

Please check ./project/04-asvspoof2021-toy/README.

2.4 Speech anti-spoofing for ASVspoof 2019 LA

./project/03-asvspoof-mega

This project accompanies the anti-spoofing paper "A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection" (available on arXiv).

There were 36 systems investigated, each of which was trained and evaluated for 6 rounds with different random seeds.

[Figure: EER-mintDCF]

This project was later extended into a book chapter, "A Practical Guide to Logical Access Voice Presentation Attack Detection". A single system using RawNet2 and score fusion were added.

[Figure: EER-mintDCF]

Pre-trained models, scores, and training recipes are all available. Please check ./project/03-asvspoof-mega/README.

2.5 (Preliminary) speech anti-spoofing

./project/02-asvspoof

  1. Baseline LFCC + LCNN-binary-classifier (lfcc-lcnn-sigmoid)

  2. LFCC + LCNN + angular softmax (lfcc-lcnn-a-softmax)

  3. LFCC + LCNN + one-class softmax (lfcc-lcnn-ocsoftmax)

  4. LFCC + ResNet18 + one-class softmax (lfcc-restnet-ocsoftmax)

This is a pilot test on the ASVspoof2019 LA task. I trained each system 6 times on various GPU devices (single V100 or P100 card), each time with a different initial random seed. The figure below shows the DET curves for these systems:

[Figure: det_curve]

The results vary considerably when only the initial random seed is changed. They may differ even with the same random seed, Pytorch environment, and deterministic algorithms selected. This preliminary test motivated the study in ./project/03-asvspoof-mega.

For LCNN, please check this paper; for LFCC, please check this paper; for one-class softmax in ASVspoof, please check this paper.

3. Python environment

You may use ./env.yml to create the environment:

# create environment
conda env create -f env.yml

# load environment (whose name is pytorch-1.6)
conda activate pytorch-1.6

4. How to use

Take project/01-nsf/cyc-noise-nsf-4 as an example:

# cd into one project
cd project/01-nsf/cyc-noise-nsf-4

# add PYTHONPATH and activate conda environment
source ../../../env.sh 

# run the script
bash 00_demo.sh

The printed info will show what is happening. The script may need 1 day or more to finish.

You may also run the job in the background rather than waiting at the terminal:

# run the script in background
bash 00_demo.sh > log_batch 2>&1 &

The above steps will download the CMU-arctic data, run waveform generation using a pre-trained model, and train a new model.

5. Project design and convention

Data format

  • Waveform: 16/32-bit PCM or 32-bit float WAV that can be read by scipy.io.wavfile.read

  • Other data: binary, float-32bit, little endian (numpy dtype <f4). The data can be read in python by:

# for data of shape [N, M]
import numpy as np
with open(filepath, 'rb') as f:
    # each record is one row of M little-endian float32 values
    datatype = np.dtype(('<f4', (M,)))
    data = np.fromfile(f, dtype=datatype)

I assume the data are stored in C-contiguous (row-major) order. There are helper functions in ./core_scripts/data_io/io_tools.py to read and write binary data:

# create a float32 data array
import numpy as np
data = np.asarray(np.random.randn(5, 3), dtype=np.float32)

# write to './temp.bin' and read it as data2
import core_scripts.data_io.io_tools as readwrite
readwrite.f_write_raw_mat(data, './temp.bin')
data2 = readwrite.f_read_raw_mat('./temp.bin', 3)

# result should be all zeros
data - data2

More instructions can be found in the Jupyter notebook ./tutorials/c01_data_format.ipynb.
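The round trip above can also be checked without the repository helpers. The following is a minimal sketch using only numpy; the function names write_raw_mat and read_raw_mat are hypothetical and are not the functions in io_tools.py. It assumes the layout described above: little-endian float32 values stored row by row.

```python
import os
import tempfile

import numpy as np


def write_raw_mat(data, path):
    """Write a 2-D array as raw little-endian float32 in row-major order."""
    np.ascontiguousarray(data, dtype="<f4").tofile(path)


def read_raw_mat(path, dim):
    """Read raw little-endian float32 values and reshape them to [N, dim]."""
    flat = np.fromfile(path, dtype="<f4")
    if flat.size % dim != 0:
        raise ValueError("number of values is not a multiple of dim")
    return flat.reshape(-1, dim)


data = np.random.randn(5, 3).astype(np.float32)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "temp.bin")
    write_raw_mat(data, path)
    file_size = os.path.getsize(path)   # no header: N * M * 4 bytes
    data2 = read_raw_mat(path, 3)

print(file_size)
print(np.abs(data - data2).max())
```

Because the file has no header, the number of columns must be known when reading; the number of rows then follows from the file size.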

Files in this repository

Name                    Function
./core_scripts          scripts to manage the training process, data io, and so on
./core_modules          finished pytorch modules
./sandbox               new functions and modules to be tested
./project               project directories; each folder corresponds to one model for one dataset
./project/*/*/main.py   script to load data and run training and inference
./project/*/*/model.py  model definition based on Pytorch APIs
./project/*/*/config.py configurations for training/val/test set data

The motivation is to separate the training and inference process, the model definition, and the data configuration. For example:

  • To define a new model, change model.py

  • To run on a new database, change config.py

How the script works

The script starts with main.py and calls different functions for model training and inference.

During training:

     <main.py>        Entry point and controller of training process
        |           
   Argument parse     core_scripts/config_parse/arg_parse.py
   Initialization     core_scripts/startup_config.py
   Choose device     
        | 
Initialize & load     core_scripts/data_io/customize_dataset.py
training data set
        |----------|
        .     Load data set   <config.py> 
        .     configuration 
        .          |
        .     Loop over       core_scripts/data_io/customize_dataset.py
        .     data subset
        .          |       
        .          |---------|
        .          .    Load one subset   core_scripts/data_io/default_data_io.py
        .          .         |
        .          |---------|
        .          |
        .     Combine subsets 
        .     into one set
        .          |
        |----------|
        |
Initialize & load 
development data set  
        |
Initialize Model     <model.py>
Model(), Loss()
        | 
Initialize Optimizer core_scripts/op_manager/op_manager.py
        |
Load checkpoint      --trained-model option to main.py
        |
Start training       core_scripts/nn_manager/nn_manager.py f_train_wrapper()
        |             
        |----------|
        .          |
        .     Loop over training data
        .     for one epoch
        .          |
        .          |-------|    core_scripts/nn_manager/nn_manager.py f_run_one_epoch()
        .          |       |    
        .          |  Loop over 
        .          |  training data
        .          |       |
        .          |       |-------|
        .          |       .    get data_in, data_tar, data_info
        .          |       .    Call data_gen <- Model.forward(...)   <model.py>
        .          |       .    Call Loss.compute()                   <model.py>
        .          |       .    loss.backward()
        .          |       .    optimizer.step()
        .          |       .       |
        .          |       |-------|
        .          |       |
        .          |  Save checkpoint 
        .          |       |
        .          |  Early stop?
        .          |       | No  \
        .          |       |      \ Yes
        .          |<------|       |
        .                          |
        |--------------------------|
       Done

A detailed flowchart is in ./misc/APPENDIX_1.md. This may be useful if you want to hack on the code.
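The control flow of the training chart can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in nn_manager.py: it uses numpy with a hand-written gradient instead of Pytorch autograd, keeps the "checkpoint" in memory, and assumes early stopping is decided from the development-set loss with a patience counter. All names (ToyModel, ToyLoss, run_one_epoch, train) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_batches(n_batches, batch_size=8):
    """Toy data for y = 2x; stands in for the data sets built from config.py."""
    xs = rng.standard_normal((n_batches, batch_size))
    return [(x, 2.0 * x) for x in xs]


class ToyModel:
    """Stand-in for Model in model.py: a single weight w."""

    def __init__(self):
        self.w = 0.0

    def forward(self, x):
        return self.w * x


class ToyLoss:
    """Stand-in for Loss in model.py: mean squared error."""

    def compute(self, y_gen, y_tar):
        return float(np.mean((y_gen - y_tar) ** 2))


def run_one_epoch(model, loss_fn, batches, lr=None):
    """One pass over the data; the weight is updated only when lr is given."""
    total = 0.0
    for x, y_tar in batches:
        y_gen = model.forward(x)
        total += loss_fn.compute(y_gen, y_tar)
        if lr is not None:
            grad = float(np.mean(2.0 * (y_gen - y_tar) * x))  # d(MSE)/dw
            model.w -= lr * grad                               # optimizer step
    return total / len(batches)


def train(model, loss_fn, train_set, dev_set, max_epochs=50, patience=3, lr=0.1):
    best_loss, best_w, bad_epochs = float("inf"), model.w, 0
    for epoch in range(max_epochs):
        run_one_epoch(model, loss_fn, train_set, lr=lr)
        dev_loss = run_one_epoch(model, loss_fn, dev_set)
        if dev_loss < best_loss:
            # keep the best weight as an in-memory "checkpoint"
            best_loss, best_w, bad_epochs = dev_loss, model.w, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:  # early stop
            break
    model.w = best_w
    return epoch + 1, best_loss


model = ToyModel()
n_epochs, dev_loss = train(model, ToyLoss(), make_batches(20), make_batches(5))
print(n_epochs, model.w, dev_loss)
```

The same function handles both training and evaluation passes, which mirrors how one epoch routine can be reused for the training and development sets.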

6. On NSF projects (./project/01-nsf)

Differences from CURRENNT implementation

There may be more, but here are the important ones:

  • "Batch-normalization": in CURRENNT, "batch-normalization" is conducted along the sequence length, i.e., treating each frame as one sample;

  • No bias in CNN and FF: due to the 1st point, NSF in this repository uses bias=False for CNN and feedforward layers in neural filter blocks, which can help keep the hidden signals around 0;

  • Smaller learning rate: due to the 1st point, the learning rate in this repository is decreased from 0.0003 to a smaller value. Accordingly, more training epochs are required;

  • STFT framing/padding: in CURRENNT, the first frame starts from the 1st step of a signal; in this Pytorch repository (as in Librosa), the first frame is centered around the 1st step of a signal, and the frame is padded with 0;

  • STFT backward: in CURRENNT, STFT backward follows the steps in this paper; in this Pytorch repository, backward over STFT is done by the Pytorch library.

  • ...

The learning curves look similar to those of the CURRENNT version.

[Figure: learning_curve]
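The STFT framing difference listed above can be illustrated with plain numpy. This is a sketch of the two framing conventions only (no window, no FFT); the function names are hypothetical, and dropping the incomplete last frame in the first convention is an assumption, not a statement about CURRENNT.

```python
import numpy as np


def frame_from_start(signal, frame_len, hop):
    """First frame starts at the 1st sample; incomplete tail frames are dropped."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])


def frame_centered(signal, frame_len, hop):
    """Frame i is centered on sample i * hop; both edges are padded with 0."""
    padded = np.pad(signal, frame_len // 2, mode="constant")
    n_frames = 1 + (len(padded) - frame_len) // hop
    return np.stack([padded[i * hop: i * hop + frame_len] for i in range(n_frames)])


signal = np.arange(1, 17, dtype=np.float32)  # 16 samples: 1, 2, ..., 16
frames_a = frame_from_start(signal, frame_len=8, hop=4)
frames_b = frame_centered(signal, frame_len=8, hop=4)

print(frames_a.shape, frames_a[0])  # first frame holds samples 1..8
print(frames_b.shape, frames_b[0])  # first frame is half zeros, then samples 1..4
```

With centered framing the number of frames differs and the first frame is half padding, so spectral losses computed with the two conventions are not directly comparable frame by frame.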

24kHz

Most of my experiments are done on 16 kHz waveforms. For 24 kHz waveforms, FIR or sinc digital filters in the model may be changed for better performance:

  1. hn-nsf: lp_v, lp_u, hp_v, and hp_u are calculated for 16 kHz configurations. For a different sampling rate, you may use the online tool http://t-filter.engineerjs.com to get the filter coefficients. In this case, the stop-band for lp_v and lp_u is extended to 12 kHz, while the pass-band for hp_v and hp_u is extended to 12 kHz. The reason is that, whatever the sampling rate, the actual formants (in Hz) and spectra of sounds do not change with the sampling rate;

  2. hn-sinc-nsf and cyc-noise-nsf: for a similar reason, the cut-off frequency value, which lies in (0, 1), should be adjusted. I will try (hidden_feat * 0.2 + uv * 0.4 + 0.3) * 16 / 24 in model.CondModuleHnSincNSF.get_cut_f().
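The effect of the 16 / 24 factor can be checked numerically. The sketch below is not the code in get_cut_f(); it assumes that the normalized cut-off value is relative to the Nyquist frequency (half the sampling rate), that hidden_feat lies in [-1, 1], and that uv is 0 (unvoiced) or 1 (voiced). The function names are hypothetical.

```python
import numpy as np


def cut_f_16k(hidden_feat, uv):
    """Normalized cut-off frequency as written for the 16 kHz configuration."""
    return hidden_feat * 0.2 + uv * 0.4 + 0.3


def cut_f_24k(hidden_feat, uv):
    """Same value rescaled by 16 / 24 for 24 kHz waveforms."""
    return cut_f_16k(hidden_feat, uv) * 16 / 24


# extreme and middle values of hidden_feat for unvoiced and voiced frames
hidden_feat = np.array([-1.0, 0.0, 1.0, -1.0, 0.0, 1.0])
uv = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# convert the normalized values to Hz using the Nyquist frequency
hz_16k = cut_f_16k(hidden_feat, uv) * (16000 / 2)
hz_24k = cut_f_24k(hidden_feat, uv) * (24000 / 2)

print(hz_16k)
print(hz_24k)
```

Under these assumptions both configurations give the same cut-off frequencies in Hz, which matches the reasoning that formants and spectra do not move when the sampling rate changes.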

The end

Owner
Yamagishi and Echizen Laboratories, National Institute of Informatics