Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Last update: Nov 28, 2022

Related tags

Overview

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer allows the separation model to efficiently capture both local and global context information, which is helpful for speech separation. Experimental results using the LibriCSS dataset show that the Conformer separation model achieves state of the art results for both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (Accepted by ICASSP 2021).

Environment

python 3.6.9, torch 1.7.1

Get Started

Download the overlapped speech of LibriCSS dataset.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip

Download the Conformer separation models.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip

Run the separation.

3.1 Single-channel separation

export MODEL_NAME=1ch_conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_1ch.scp \
    --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2

The separated speech can be found in the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'

3.2 Seven-channel separation

export MODEL_NAME=conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_7ch.scp \
    --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2 \
    --mvdr True

The separated speech can be found in the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}

Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Related tags

Overview

Continuous Speech Separation with Conformer

Introduction

Environment

Get Started

Citation

Owner

Sanyuan Chen (陈三元)

Code for the Paper: Alexandra Lindt and Emiel Hoogeboom.

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

Breaching - Breaching privacy in federated learning scenarios for vision and text

METER: Multimodal End-to-end TransformER

ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

EEGEyeNet is benchmark to evaluate ET prediction based on EEG measurements with an increasing level of difficulty

Copy Paste positive polyp using poisson image blending for medical image segmentation

Catalyst.Detection

Convert Mission Planner (ArduCopter) Waypoint Missions to Litchi CSV Format to execute on DJI Drones

Distributed Evolutionary Algorithms in Python

Classifies galaxy morphology with Bayesian CNN

DaReCzech is a dataset for text relevance ranking in Czech

CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

SelfAugment extends MoCo to include automatic unsupervised augmentation selection.

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

[CVPR 2022 Oral] Balanced MSE for Imbalanced Visual Regression https://arxiv.org/abs/2203.16427

Code release for "Making a Bird AI Expert Work for You and Me".

Img-process-manual - Utilize Python Numpy and Matplotlib to realize OpenCV baisc image processing function