MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

Overview

MicRank: Learning to Rank Microphones for Distant Speech Recognition

Application Scenario

Many applications nowadays envision the presence of multiple heterogeneous recording devices (e.g. Microsoft Project Denmark, CHiME-5, CHiME-6 and Voices from a Distance Challenges, DIRHA project et cetera).

Audio signals captured by different microphones can be suitably combined at front-end level by using beamforming techniques. However this combination could be very challenging as in an ad-hoc microphone network microphones can be very far from each other. Moreover some could be close to noise sources or, for a particular utterance, too far from the speaker to be of any usefulness and, to further complicate things, synchronization issues may appear.

An intriguing approach could be to select only the best microphone for each utterance or instead to select only a promising subset of microphones for beamforming or ROVER combination, thus potentially saving resources and/or improving results by excluding "bad" channels. This can be performed by suitable automatic Channel Selection or Channel Ranking algorithms.

What is MicRank

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels based on ASR-backend performance or any other metric/back-end task (e.g. STOI if one wishes to rank microphones based on speech intelligibility et cetera).

It is agnostic with respect to the array geometry and type of recognition back-end and it does not require sample-level synchronization between devices.

Remarkably, it is able to considerably improve over previous selection techniques, reaching comparable and in some instances better performance than oracle signal-based measures like PESQ, STOI or SDR. This is achieved with a very small model with only 266k learnable parameters, making this method much more computationally efficient than decoder or posterior based channel selection methods.

LibriAdHoc Synthetic Dataset Recipe

Coming Soon


citing MicRank

If this code has been useful, use this:

@misc{cornell2021learning,
      title={Learning to Rank Microphones for Distant Speech Recognition}, 
      author={Samuele Cornell and Alessio Brutti and Marco Matassoni and Stefano Squartini},
      year={2021},
      eprint={2104.02819},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
Owner
Samuele Cornell
Information Engineering PhD candidate @ Marche Polytechnic University
Samuele Cornell
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Interscript The Interscript dataset contains interactive user feedback on a T5-11B model generated scripts. Dataset data.json contains the data in an

AI2 8 Dec 01, 2022
The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.

Patient Selection for Diabetes Drug Testing Project Overview EHR data is becoming a key source of real-world evidence (RWE) for the pharmaceutical ind

Omar Laham 1 Jan 14, 2022
Python wrapper to access the amazon selling partner API

PYTHON-AMAZON-SP-API Amazon Selling-Partner API If you have questions, please join on slack Contributions very welcome! Installation pip install pytho

Michael Primke 330 Jan 06, 2023
Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos This repository is the official tensorflow python implementation

Yasamin Jafarian 287 Jan 06, 2023
PyTorch reimplementation of hand-biomechanical-constraints (ECCV2020)

Hand Biomechanical Constraints Pytorch Unofficial PyTorch reimplementation of Hand-Biomechanical-Constraints (ECCV2020). This project reimplement foll

Hao Meng 59 Dec 20, 2022
Pytorch implementation of 'Fingerprint Presentation Attack Detector Using Global-Local Model'

RTK-PAD This is an official pytorch implementation of 'Fingerprint Presentation Attack Detector Using Global-Local Model', which is accepted by IEEE T

6 Aug 01, 2022
TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations Requirements python 3.6 torch 1.9 numpy 1.19 Quick Start The experimen

DMIRLAB 4 Oct 16, 2022
Underwater image enhancement

LANet Our work proposes an adaptive learning attention network (LANet) to solve the problem of color casts and low illumination in underwater images.

LiuShiBen 7 Sep 14, 2022
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

ARES This repository contains the code for ARES (Adversarial Robustness Evaluation for Safety), a Python library for adversarial machine learning rese

Tsinghua Machine Learning Group 377 Dec 20, 2022
Pytorch implementation of ProjectedGAN

ProjectedGAN-pytorch Pytorch implementation of ProjectedGAN (https://arxiv.org/abs/2111.01007) Note: this repository is still under developement. @InP

Dominic Rampas 17 Dec 14, 2022
Sample and Computation Redistribution for Efficient Face Detection

Introduction SCRFD is an efficient high accuracy face detection approach which initially described in Arxiv. Performance Precision, flops and infer ti

Sajjad Aemmi 13 Mar 05, 2022
Code for paper: "Spinning Language Models for Propaganda-As-A-Service"

Spinning Language Models for Propaganda-As-A-Service This is the source code for the Arxiv version of the paper. You can use this Google Colab to expl

Eugene Bagdasaryan 16 Jan 03, 2023
INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing Existing studies on semantic parsing focus primarily on mapping a natural-la

7 Aug 22, 2022
Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

ASAPP Research 2.1k Jan 01, 2023
Beancount-mercury - Beancount importer for Mercury Startup Checking

beancount-mercury beancount-mercury provides an Importer for converting CSV expo

Michael Lynch 4 Oct 31, 2022
Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets ========================================================================

23 Apr 06, 2022
Action Segmentation Evaluation

Reference Action Segmentation Evaluation Code This repository contains the reference code for action segmentation evaluation. If you have a bug-fix/im

5 May 22, 2022
Official pytorch implementation of "Feature Stylization and Domain-aware Contrastive Loss for Domain Generalization" ACMMM 2021 (Oral)

Feature Stylization and Domain-aware Contrastive Loss for Domain Generalization This is an official implementation of "Feature Stylization and Domain-

22 Sep 22, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

CDFI (Compression-Driven-Frame-Interpolation) [Paper] (Coming soon...) | [arXiv] Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov IEEE Conference

Tianyu Ding 95 Dec 04, 2022