Multistream CNN for Robust Acoustic Modeling

Overview

Multistream Convolutional Neural Network (CNN)

A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks. It processes input speech with diverse resolutions by applying different dilation rates to convolutional neural networks across multiple streams to achieve the robustness. The dilation rate of 3 are selected from the multiples of a sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and output embedding vectors from the streams are concatenated then projected to the final layer, as illustrated below:

Alt text

References

Multistream CNN for Robust Acoustic Modeling [paper]

{
  @inproceedings{han2021multistream-cnn,
    title={Multistream CNN for Robust Acoustic Modeling},
    author={Kyu J. Han and Jing Pan and Venkata Krishna Naveen Tadala and Tao Ma and Dan Povey},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2021}
}

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [paper]

{
  @inproceedings{pan2020asapp-asr,
    title={ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition},
    author={Jing Pan and Joshua Shapiro and Jeremy Wohlwend and Kyu J. Han and Tao Lei and Tao Ma},
    booktitle={Interspeech},
    year={2020}
}

Installation

Please follow the original Kaldi build sequence, as below.

>> cd tools; make; cd ../src; ./configure; make clean; make -j clean depend; make -j all

Recipes and Results

LibriSpeech

>> egs/librispeech/s5/local/chain/run_multistream_cnn_1a.sh
dev-clean dev-other test-clean test-other
tdnn_1d 3.29 8.71 3.80 8.76
multistream_cnn_1a 3.20 7.68 3.54 7.87

Fisher-SWBD

>> egs/fisher_swbd/s5/local/chain/run_multistream_cnn_1a.sh
eval2000 swbd callhm
tdnn_7d 12.6 8.8 16.3
multistream_cnn_1a 12.6 9.2 15.7
Owner
ASAPP Research
AI for Enterprise
ASAPP Research
πŸ“ Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

tensorlm Generate Shakespeare poems with 4 lines of code. Installation tensorlm is written in / for Python 3.4+ and TensorFlow 1.1+ pip3 install tenso

Kilian Batzner 63 May 22, 2021
Lightweight tool to perform MITM attack on local network

ARPSpy - A lightweight tool to perform MITM attack Using many library to perform ARP Spoof and auto-sniffing HTTP packet containing credential. (Never

MinhItachi 8 Aug 28, 2022
CT-Net: Channel Tensorization Network for Video Classification

[ICLR2021] CT-Net: Channel Tensorization Network for Video Classification @inproceedings{ li2021ctnet, title={{\{}CT{\}}-Net: Channel Tensorization Ne

33 Nov 15, 2022
Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

Multimedia Research 484 Dec 29, 2022
The Power of Scale for Parameter-Efficient Prompt Tuning

The Power of Scale for Parameter-Efficient Prompt Tuning Implementation of soft embeddings from https://arxiv.org/abs/2104.08691v1 using Pytorch and H

Kip Parker 208 Dec 30, 2022
GluonMM is a library of transformer models for computer vision and multi-modality research

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models and also research work from Amazon

42 Dec 02, 2022
[TPAMI 2021] iOD: Incremental Object Detection via Meta-Learning

Incremental Object Detection via Meta-Learning To appear in an upcoming issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence (T

Joseph K J 66 Jan 04, 2023
A simple, fully convolutional model for real-time instance segmentation.

You Only Look At CoefficienTs β–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β•šβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•”β•β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β•β•β•šβ•β•β–ˆβ–ˆβ•”β•β•β• β•šβ–ˆβ–ˆ

Daniel Bolya 4.6k Dec 30, 2022
Subdivision-based Mesh Convolutional Networks

Subdivision-based Mesh Convolutional Networks The official implementation of SubdivNet in our paper, Subdivion-based Mesh Convolutional Networks Requi

Zheng-Ning Liu 181 Dec 28, 2022
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
Self Governing Neural Networks (SGNN): the Projection Layer

Self Governing Neural Networks (SGNN): the Projection Layer A SGNN's word projections preprocessing pipeline in scikit-learn In this notebook, we'll u

Guillaume Chevalier 22 Nov 06, 2022
TraSw for FairMOT - A Single-Target Attack example (Attack ID: 19; Screener ID: 24):

TraSw for FairMOT A Single-Target Attack example (Attack ID: 19; Screener ID: 24): Fig.1 Original Fig.2 Attacked By perturbing only two frames in this

Derry Lin 21 Dec 21, 2022
Block Sparse movement pruning

Movement Pruning: Adaptive Sparsity by Fine-Tuning Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; ho

Hugging Face 54 Dec 20, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion: A Machine Learning Library for Time Series Table of Contents Introduction Installation Documentation Getting Started Anomaly Detection Foreca

Salesforce 2.8k Dec 30, 2022
Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP

Wav2CLIP 🚧 WIP 🚧 Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP πŸ“„ πŸ”— Ho-Hsiang Wu, Prem Seetharaman

Descript 240 Dec 13, 2022
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
[IJCAI'21] Deep Automatic Natural Image Matting

Deep Automatic Natural Image Matting [IJCAI-21] This is the official repository of the paper Deep Automatic Natural Image Matting. Introduction | Netw

Jizhizi_Li 316 Jan 06, 2023
An AutoML Library made with Optuna and PyTorch Lightning

An AutoML Library made with Optuna and PyTorch Lightning Installation Recommended pip install -U gradsflow From source pip install git+https://github.

GradsFlow 294 Dec 17, 2022
MISSFormer: An Effective Medical Image Segmentation Transformer

MISSFormer Code for paper "MISSFormer: An Effective Medical Image Segmentation Transformer". Please read our preprint at the following link: paper_add

Fong 22 Dec 24, 2022
Discriminative Condition-Aware PLDA

DCA-PLDA This repository implements the Discriminative Condition-Aware Backend described in the paper: L. Ferrer, M. McLaren, and N. BrΓΌmmer, "A Speak

Luciana Ferrer 31 Aug 05, 2022