Multistream Convolutional Neural Network (CNN)

A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition. It achieves robustness by processing the input speech at diverse temporal resolutions: each of its parallel streams applies a different dilation rate to its convolutional layers. The dilation rates are selected from multiples of the sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer.
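The structure described above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not the Kaldi implementation: features are scalars rather than vectors, each "TDNN-F layer" is replaced by a simple 3-tap dilated convolution with ReLU, and the function names, the choice of the first three multiples (3, 6, 9) and the averaging kernel are all hypothetical.

```python
# Toy sketch of the multistream idea; NOT the Kaldi/TDNN-F implementation.
# All names and numeric choices below are illustrative assumptions.

SUBSAMPLING_RATE = 3  # frames, as stated in the text


def dilation_rates(num_streams, subsampling_rate=SUBSAMPLING_RATE):
    """One dilation rate per stream, each a multiple of the sub-sampling rate.

    Assumption: the first `num_streams` multiples are used here; a real
    recipe may pick different multiples.
    """
    return [subsampling_rate * (i + 1) for i in range(num_streams)]


def dilated_conv1d(frames, kernel, dilation):
    """1D convolution over time with zero padding (output length = input length)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are `dilation` frames apart
            if 0 <= idx < len(frames):
                acc += w * frames[idx]
        out.append(acc)
    return out


def stream_forward(frames, dilation, num_layers, kernel=(1 / 3, 1 / 3, 1 / 3)):
    """Stack `num_layers` dilated convolutions (stand-ins for TDNN-F layers)."""
    hidden = list(frames)
    for _ in range(num_layers):
        hidden = [max(0.0, v) for v in dilated_conv1d(hidden, kernel, dilation)]
    return hidden


def multistream_forward(frames, num_streams=3, num_layers=2):
    """Run every stream on the same input and concatenate outputs per frame."""
    streams = [stream_forward(frames, d, num_layers)
               for d in dilation_rates(num_streams)]
    return [list(per_frame) for per_frame in zip(*streams)]


def project(vector, weights):
    """Linear projection of one concatenated vector (single output unit)."""
    return sum(w * v for w, v in zip(weights, vector))


def receptive_field(dilation, num_layers, kernel_size=3):
    """Number of input frames that one output frame of a stream can see."""
    return 1 + num_layers * (kernel_size - 1) * dilation


if __name__ == "__main__":
    impulse = [0.0] * 20 + [1.0] + [0.0] * 20
    concatenated = multistream_forward(impulse)
    for rate in dilation_rates(3):
        print(f"dilation {rate}: receptive field {receptive_field(rate, 2)} frames")
    print("projected centre frame:", project(concatenated[20], [1.0, 1.0, 1.0]))
```

Feeding an impulse through the sketch shows why the streams differ in resolution: the larger the dilation rate, the wider the span of input frames that influences each output frame.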

References

Multistream CNN for Robust Acoustic Modeling [paper]

@inproceedings{han2021multistream-cnn,
    title={Multistream CNN for Robust Acoustic Modeling},
    author={Kyu J. Han and Jing Pan and Venkata Krishna Naveen Tadala and Tao Ma and Dan Povey},
    booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year={2021}
}

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [paper]

@inproceedings{pan2020asapp-asr,
    title={ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition},
    author={Jing Pan and Joshua Shapiro and Jeremy Wohlwend and Kyu J. Han and Tao Lei and Tao Ma},
    booktitle={Interspeech},
    year={2020}
}

Installation

Please follow the original Kaldi build sequence, as below.

>> cd tools; make; cd ../src; ./configure; make clean; make -j clean depend; make -j all

Recipes and Results

LibriSpeech

>> egs/librispeech/s5/local/chain/run_multistream_cnn_1a.sh

WER (%)	dev-clean	dev-other	test-clean	test-other
tdnn_1d	3.29	8.71	3.80	8.76
multistream_cnn_1a	3.20	7.68	3.54	7.87
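To put the LibriSpeech numbers above in perspective, the relative improvement over the baseline can be computed directly from the table. The snippet below is a small sketch that assumes the reported figures are word error rates in percent; the helper name `relative_reduction` is hypothetical and not part of the recipe.

```python
# Relative error reduction of multistream_cnn_1a over tdnn_1d on LibriSpeech.
# Assumption: the table values are word error rates (WER) in percent.


def relative_reduction(baseline_wer, new_wer):
    """Relative reduction in percent; positive means the new model is better."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# (tdnn_1d, multistream_cnn_1a) pairs copied from the table above.
librispeech = {
    "dev-clean": (3.29, 3.20),
    "dev-other": (8.71, 7.68),
    "test-clean": (3.80, 3.54),
    "test-other": (8.76, 7.87),
}

for test_set, (baseline, multistream) in librispeech.items():
    gain = relative_reduction(baseline, multistream)
    print(f"{test_set}: {gain:.1f}% relative reduction")
```

Under that assumption, the gains are larger on the "other" sets (roughly 10 to 12% relative) than on the "clean" sets.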

Fisher-SWBD

>> egs/fisher_swbd/s5/local/chain/run_multistream_cnn_1a.sh

WER (%)	eval2000	swbd	callhm
tdnn_7d	12.6	8.8	16.3
multistream_cnn_1a	12.6	9.2	15.7

Owner

ASAPP Research
