Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on the test set with randomly sampled segments.

acc: 93.21% on the test set with segments sampled from a fixed start.

Model file used: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test accuracy is about 4% lower than the result reported in the paper. We have not found the reason yet and would welcome your help!
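The two test settings above differ only in how a segment is taken from each utterance. The following is a minimal sketch of that difference, assuming that "fixed start" means the segment begins at frame 0 and that a segment is 160 frames long (the input shape listed in the details). The function name `crop_window` and its parameters are hypothetical and are not taken from speakergan.py.

```python
import random


def crop_window(frames, window=160, start=None, rng=random):
    """Return a contiguous window of frames.

    start=None picks a random start (the "random sampled" setting);
    an integer start gives a deterministic crop (the "fixed start" setting).
    Utterances shorter than the window are returned unchanged.
    """
    if len(frames) <= window:
        return list(frames)
    if start is None:
        # Last valid start index is len(frames) - window.
        start = rng.randrange(len(frames) - window + 1)
    return list(frames[start:start + window])


if __name__ == "__main__":
    utterance = list(range(500))  # stand-in for 500 feature frames
    fixed = crop_window(utterance, start=0)
    rand = crop_window(utterance, rng=random.Random(0))
    print(len(fixed), fixed[0], len(rand))
```

A random crop changes between evaluation runs unless the random generator is seeded, which may explain part of the difference between the two reported numbers.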

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: LibriSpeech train-clean-100, POI: 251
data preprocess: VAD, mean and variance normalization, shuffled.
split: 60% train, 40% test.
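The normalization and split steps listed above can be sketched as follows. This is an illustration in plain Python, not the repository's code: it assumes normalization is applied per feature dimension over the frames of one utterance, and that the 60/40 split is a simple shuffled split over a list of items. The helper names `mean_variance_normalize` and `split_train_test` are hypothetical.

```python
import random
import statistics


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance over time.

    frames: list of frames, each a list of feature values (e.g. 64 fbank bins).
    """
    dims = list(zip(*frames))  # one tuple per feature dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def split_train_test(items, train_ratio=0.6, seed=0):
    """Shuffle items with a fixed seed and split them into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    normalized = mean_variance_normalize([[1.0, 10.0], [3.0, 30.0]])
    train, test = split_train_test(range(10))
    print(normalized, len(train), len(test))
```

Whether the split is done per speaker or over the whole file list is not stated here, so treat the split helper as one possible reading.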

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, shuffler layer, GLU activation, Huber loss + adversarial loss

D: ResnetBlocks, ReLU activation, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
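For readers unfamiliar with the GLU activation used in the generator, here is a small sketch of a gated linear unit in plain Python. It follows the usual definition: the input is split into two halves, and the first half is multiplied element-wise by the sigmoid of the second half. The function names are illustrative, and this is not the layer code from this repository.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(values):
    """Gated linear unit: first half of values gated by sigmoid of second half."""
    if len(values) % 2 != 0:
        raise ValueError("GLU input length must be even")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [c * sigmoid(g) for c, g in zip(content, gate)]


if __name__ == "__main__":
    # A gate value of 0 passes exactly half of the content through.
    print(glu([2.0, 4.0, 0.0, 0.0]))
```

In a gated CNN the two halves typically come from two sets of convolution channels, so the layer learns which parts of the signal to pass on.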

================ training ==================

lr: epochs 0-9: 0.0005 | epochs 9-49: 0.0002
L(d): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv loss: label smoothing, real label 1 -> 0.7 ~ 1.0, fake label 0 -> 0 ~ 0.3
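The training settings above can be illustrated with a short sketch. It assumes the learning-rate ranges are epoch ranges with the switch at epoch 9, that smoothed labels are drawn uniformly from the stated intervals, and that the generator is updated once for every 4 discriminator updates. All function names are hypothetical and are not taken from speakergan.py.

```python
import random


def learning_rate(epoch):
    """0.0005 before epoch 9, 0.0002 from epoch 9 on (boundary is an assumption)."""
    return 0.0005 if epoch < 9 else 0.0002


def smooth_label(is_real, rng=random):
    """Replace hard labels with soft ones: 1 -> [0.7, 1.0], 0 -> [0.0, 0.3]."""
    if is_real:
        return rng.uniform(0.7, 1.0)
    return rng.uniform(0.0, 0.3)


def is_generator_step(d_step_index, d_steps_per_g_step=4):
    """True on every 4th discriminator step (0-based index): update G then."""
    return (d_step_index + 1) % d_steps_per_g_step == 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(learning_rate(0), learning_rate(9))
    print(smooth_label(True, rng), smooth_label(False, rng))
    print([step for step in range(8) if is_generator_step(step)])
```

Soft targets like these are a common way to keep the discriminator from becoming overconfident early in training.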

======== not sure or differences with paper ========

weight and bias initialization: we use xavier_uniform for weights and zeros for biases.
PyTorch huber_loss: adding 0.5 would make it match the paper, but this is not implemented here.
shorter wavs: the paper pads with zeros; we pad by repeating the features.
gated CNN architecture.
VAD preprocessing: we use webrtcvad mode 3.
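The padding difference noted in the list above can be sketched as follows. The sketch assumes a target length of 160 frames (the input shape listed earlier) and reads "padded with feature again" as tiling the utterance's own frames from the beginning until the target length is reached. Both function names are hypothetical and are not taken from this repository.

```python
def pad_with_zeros(frames, target_len=160):
    """Paper-style padding: append all-zero frames up to target_len."""
    dim = len(frames[0])
    padded = [list(f) for f in frames[:target_len]]
    padded += [[0.0] * dim for _ in range(target_len - len(padded))]
    return padded


def pad_by_repeating(frames, target_len=160):
    """Repeat the utterance's own frames cyclically up to target_len."""
    return [list(frames[i % len(frames)]) for i in range(target_len)]


if __name__ == "__main__":
    short = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(pad_with_zeros(short, target_len=5))
    print(pad_by_repeating(short, target_len=7))
```

Repeating frames keeps the padded region statistically similar to real speech, while zero padding gives the model an explicit silent tail. This difference may be worth checking when comparing results against the paper.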
