The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

Last update: Dec 02, 2022

Overview

SF-Net for fullband SE

This is the repo for the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement", submitted to Interspeech 2022. Some audio samples are provided here, and the code for GCRN-full, DS-Net-full, and CTS-Net-full, together with the network configuration of SF-Net, is released.

Abstract: Due to the high computational complexity of modeling more frequency bands, it is still intractable to conduct real-time full-band speech enhancement based on deep neural networks. Recent studies typically utilize compressed, perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum with one-stage networks, leading to limited speech quality improvements. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low-band (0-8 kHz), middle-band (8-16 kHz), and high-band (16-24 kHz) spectra in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and another two sub-networks are designed as the middle- and high-band noise suppressors in the magnitude-only domain. To fully capitalize on the information intercommunication, we employ a sub-band interaction module to provide external knowledge guidance across different frequency bands. Extensive experiments show that the proposed method yields consistent performance advantages over state-of-the-art full-band baselines.
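
To make the band layout in the abstract concrete, here is a minimal sketch of splitting a one-sided full-band spectrum into the low (0-8 kHz), middle (8-16 kHz), and high (16-24 kHz) bands. It is not taken from the released code: the function name `split_subbands`, the 48 kHz sampling rate, the 960-point FFT, and the choice to put the boundary bins in the lower band are all assumptions made for illustration.

```python
import numpy as np

def split_subbands(spec, sample_rate=48000, edges_hz=(8000, 16000)):
    """Split a one-sided spectrum of shape (freq_bins, frames) into three bands.

    Hypothetical helper: the bin at each band edge is assigned to the lower band.
    """
    n_bins = spec.shape[0]
    n_fft = 2 * (n_bins - 1)           # one-sided spectrum has n_fft // 2 + 1 bins
    hz_per_bin = sample_rate / n_fft
    low_edge = int(round(edges_hz[0] / hz_per_bin))
    mid_edge = int(round(edges_hz[1] / hz_per_bin))
    low = spec[: low_edge + 1]              # 0 Hz up to and including 8 kHz
    mid = spec[low_edge + 1 : mid_edge + 1] # above 8 kHz up to 16 kHz
    high = spec[mid_edge + 1 :]             # above 16 kHz up to Nyquist
    return low, mid, high

# Toy input: random complex spectrum for an assumed 960-point FFT (481 bins), 10 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((481, 10)) + 1j * rng.standard_normal((481, 10))

low, mid, high = split_subbands(spec)

# Following the abstract, the low band stays complex while the middle and
# high bands are processed in the magnitude-only domain.
mid_mag = np.abs(mid)
high_mag = np.abs(high)
```

With these assumed settings each bin spans 50 Hz, so the low band holds 161 bins and the middle and high bands hold 160 bins each. In the actual model the three bands are handled by separate sub-networks that exchange information through the sub-band interaction module, which this sketch does not attempt to reproduce.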

Demo page of audio samples

System flowchart of SF-Net

Results:

Ablation study

Comparison with SOTA

Visualization of spectrograms

VB dataset

DNS blind set

Owner

Guochen Yu
