Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Overview

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer allows the separation model to efficiently capture both local and global context information, which is helpful for speech separation. Experimental results using the LibriCSS dataset show that the Conformer separation model achieves state of the art results for both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (Accepted by ICASSP 2021).

Environment

python 3.6.9, torch 1.7.1

Get Started

  1. Download the overlapped speech of LibriCSS dataset.

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip
  2. Download the Conformer separation models.

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip
  3. Run the separation.

    3.1 Single-channel separation

    export MODEL_NAME=1ch_conformer_base
    python3 separate.py \
        --checkpoint checkpoints/$MODEL_NAME \
        --mix-scp utils/overlapped_speech_1ch.scp \
        --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
        --device-id 0 \
        --num_spks 2

    The separated speech can be found in the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'

    3.2 Seven-channel separation

    export MODEL_NAME=conformer_base
    python3 separate.py \
        --checkpoint checkpoints/$MODEL_NAME \
        --mix-scp utils/overlapped_speech_7ch.scp \
        --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
        --device-id 0 \
        --num_spks 2 \
        --mvdr True

    The separated speech can be found in the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}
Owner
Sanyuan Chen (陈三元)
Sanyuan Chen (陈三元)
EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness

EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness Improving GAN Equilibrium by Raising Spatial Awareness Jianyuan Wang, Ceyuan Yang, Ying

GenForce: May Generative Force Be with You 149 Dec 19, 2022
Registration Loss Learning for Deep Probabilistic Point Set Registration

RLLReg This repository contains a Pytorch implementation of the point set registration method RLLReg. Details about the method can be found in the 3DV

Felix Järemo Lawin 35 Nov 02, 2022
SOFT: Softmax-free Transformer with Linear Complexity, NeurIPS 2021 Spotlight

SOFT: Softmax-free Transformer with Linear Complexity SOFT: Softmax-free Transformer with Linear Complexity, Jiachen Lu, Jinghan Yao, Junge Zhang, Xia

Fudan Zhang Vision Group 272 Dec 25, 2022
U-Time: A Fully Convolutional Network for Time Series Segmentation

U-Time & U-Sleep Official implementation of The U-Time [1] model for general-purpose time-series segmentation. The U-Sleep [2] model for resilient hig

Mathias Perslev 176 Dec 19, 2022
2021搜狐校园文本匹配算法大赛 分比我们低的都是帅哥队

sohu_text_matching 2021搜狐校园文本匹配算法大赛Top2:分比我们低的都是帅哥队 本repo包含了本次大赛决赛环节提交的代码文件及答辩PPT,提交的模型文件可在百度网盘获取(链接:https://pan.baidu.com/s/1T9FtwiGFZhuC8qqwXKZSNA ,

hflserdaniel 43 Oct 01, 2022
Certified Patch Robustness via Smoothed Vision Transformers

Certified Patch Robustness via Smoothed Vision Transformers This repository contains the code for replicating the results of our paper: Certified Patc

Madry Lab 35 Dec 14, 2022
Dungeons and Dragons randomized content generator

Component based Dungeons and Dragons generator Supports Entity/Monster Generation NPC Generation Weapon Generation Encounter Generation Environment Ge

Zac 3 Dec 04, 2021
Time series annotation library.

CrowdCurio Time Series Annotator Library The CrowdCurio Time Series Annotation Library implements classification tasks for time series. Features Suppo

CrowdCurio 51 Sep 15, 2022
Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information by Masato Tamura, Hiroki Ohashi, and Tomoaki Yosh

105 Dec 23, 2022
My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

machine-learning-with-graphs My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs Course materials can be

Marko Njegomir 7 Dec 14, 2022
Course content and resources for the AIAIART course.

AIAIART course This repo will house the notebooks used for the AIAIART course. Part 1 (first four lessons) ran via Discord in September/October 2021.

Jonathan Whitaker 492 Jan 06, 2023
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu

77 Dec 27, 2022
Yolov5-lite - Minimal PyTorch implementation of YOLOv5

Yolov5-Lite: Minimal YOLOv5 + Deep Sort Overview This repo is a shortened versio

Kadir Nar 57 Nov 28, 2022
Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision Project | PDF | Poster Fangyu Li, N. Dinesh Reddy, X

25 Dec 21, 2022
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. NÜWA is a unified multimodal p

Microsoft 2.6k Jan 06, 2023
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Federated_learning codes used for the the paper "Evaluation of Federated Learning Aggregation Algorithms" and "A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison"

Federated Distance (FedDist) This is the code accompanying the Percom2021 paper "A Federated Learning Aggregation Algorithm for Pervasive Computing: E

GETALP 8 Jan 03, 2023