PyTorch Lightning implementation of Automatic Speech Recognition

Last update: Sep 19, 2022

Overview

lasr

Lightning Automatic Speech Recognition

An MIT-licensed ASR research library, built on PyTorch Lightning, for developing end-to-end ASR models.

Introduction

PyTorch Lightning is the lightweight PyTorch wrapper for high-performance AI research. PyTorch is extremely easy to use to build complex AI models. But once the research gets complicated and things like multi-GPU training, 16-bit precision and TPU training get mixed in, users are likely to introduce bugs. PyTorch Lightning solves exactly this problem. Lightning structures your PyTorch code so it can abstract the details of training. This makes AI research scalable and fast to iterate on.

This project is an example of implementing an ASR project with PyTorch Lightning. In this project, I trained a model consisting of a Conformer encoder and an LSTM decoder with joint CTC-attention. The name lasr stands for Lightning Automatic Speech Recognition. I hope this can serve as a guideline for those who research speech recognition.
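As a rough illustration of the joint CTC-attention objective mentioned above, the two losses are commonly combined by linear interpolation. The sketch below works on plain numbers; the function name and the default weight of 0.3 are assumptions for illustration and are not taken from this project's code.

```python
def joint_ctc_attention_loss(ctc_loss: float,
                             attention_loss: float,
                             ctc_weight: float = 0.3) -> float:
    """Interpolate the two losses: w * L_ctc + (1 - w) * L_att.

    ctc_weight=0.3 is an assumed default, not this project's setting.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Example with made-up loss values.
total = joint_ctc_attention_loss(ctc_loss=2.0, attention_loss=1.0)
print(total)
```

A weight of 0 reduces this to pure attention training and a weight of 1 to pure CTC training, so the weight controls how strongly the CTC branch regularizes the encoder.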

Installation

This project recommends Python 3.7 or higher.
I recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

NumPy: pip install numpy (refer here if you have problems installing NumPy)
PyTorch: refer to the PyTorch website to install the version that matches your environment
librosa: conda install -c conda-forge librosa (refer here if you have problems installing librosa)
torchaudio: pip install torchaudio==0.6.0 (refer here if you have problems installing torchaudio)
sentencepiece: pip install sentencepiece (refer here if you have problems installing sentencepiece)
pytorch-lightning: pip install pytorch-lightning (refer here if you have problems installing pytorch-lightning)
hydra: pip install hydra-core --upgrade (refer here if you have problems installing hydra)
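To confirm that the prerequisites listed above are importable in the active environment, a small check like the following may help. It is a minimal sketch using only the standard library; the helper name find_missing is hypothetical and the script is not part of this project.

```python
import importlib.util

# Install name -> import name for the prerequisites listed above.
PREREQUISITES = {
    "numpy": "numpy",
    "torch": "torch",
    "librosa": "librosa",
    "torchaudio": "torchaudio",
    "sentencepiece": "sentencepiece",
    "pytorch-lightning": "pytorch_lightning",
    "hydra-core": "hydra",
}


def find_missing(packages=PREREQUISITES):
    """Return the install names whose import name cannot be found."""
    return [
        install_name
        for install_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks for the modules on the import path; it does not verify versions such as the pinned torchaudio release.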

Install from source

Currently we only support installation from source code using setuptools. Check out the source code and run the following command:

pip install -e .

Install Apex (for 16-bit training)

For faster training, install NVIDIA's Apex library:

$ git clone https://github.com/NVIDIA/apex
$ cd apex

# ------------------------
# OPTIONAL: on your cluster you might need to load CUDA 10 or 9
# depending on how you installed PyTorch

# see available modules
module avail

# load correct CUDA before install
module load cuda-10.0
# ------------------------

# make sure you've loaded a gcc version > 4.0 and < 7.0
module load gcc-6.1.0

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Get Started

I use Hydra to manage all training configurations. If you are not familiar with Hydra, I recommend visiting the Hydra website. In short, Hydra is an open-source framework that simplifies the development of research applications by letting you compose a hierarchical configuration dynamically.
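The idea of a hierarchical configuration with dotted-key overrides can be illustrated without Hydra itself. The sketch below is plain Python and is not Hydra's API; the configuration keys and the apply_overrides helper are hypothetical and do not reflect this project's actual configuration.

```python
import copy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of config with 'a.b.c=value' style overrides applied.

    Values are kept as strings; real Hydra also parses their types.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing groups
        node[leaf] = raw_value
    return result


# Hypothetical defaults, grouped the way a hierarchical config would be.
defaults = {
    "model": {"encoder": "conformer", "decoder": "lstm"},
    "trainer": {"precision": "32"},
}

config = apply_overrides(defaults, ["trainer.precision=16"])
print(config["trainer"]["precision"])
```

Keeping the defaults untouched and returning a new dictionary makes it easy to compare a run's effective configuration against the baseline.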

Training Speech Recognizer

You can train on the LibriSpeech dataset as follows:

$ python ./bin/main.py --dataset_path $DATASET_PATH --dataset_download True

Check the configurations at [link].

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub.

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes and documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. The style of docstrings is especially important for generating documentation.

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Author

Soohwan Kim @sooftware
Contacts: [email protected]

Comments

incorrect spm params

python prepare_libri.py --dataset_path ../../data/lasr/libri/LibriSpeech --vocab_size 5000
sentencepiece_trainer.cc(177) LOG(INFO) Running command: --input=spm_input.txt --model_prefix=tokenizer --vocab_size=5000 --model_type=unigram --pad_id=0 --bos_id=1 --eos_id=2
sentencepiece_trainer.cc(77) LOG(INFO) Starts training with :
trainer_spec {
  input: spm_input.txt
  input_format:
  model_prefix: tokenizer
  model_type: UNIGRAM
  vocab_size: 5000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  required_chars:
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: 0
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}
denormalizer_spec {}
Traceback (most recent call last):
  File "prepare_libri.py", line 65, in <module>
    main()
  File "prepare_libri.py", line 58, in main
    prepare_tokenizer(transcripts_collection[0], opt.vocab_size)
  File "lasr/dataset/preprocess.py", line 71, in prepare_tokenizer
    spm.SentencePieceTrainer.Train(cmd)
  File "anaconda3/envs/lasr/lib/python3.7/site-packages/sentencepiece/__init__.py", line 407, in Train
    return SentencePieceTrainer._TrainFromString(arg)
  File "anaconda3/envs/lasr/lib/python3.7/site-packages/sentencepiece/__init__.py", line 385, in _TrainFromString
    return _sentencepiece.SentencePieceTrainer__TrainFromString(arg)
RuntimeError: Internal: /home/conda/feedstock_root/build_artifacts/sentencepiece_1612846348604/work/src/trainer_interface.cc(666) [insert_id(trainer_spec_.pad_id(), trainer_spec_.pad_piece())]

opened by szalata 3
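The log above shows unk_id: 0 and pad_id: 0, so two special tokens are assigned the same id, which is consistent with the insert_id failure in the traceback. One possible workaround is to give every special token a distinct id. The sketch below only builds and validates the command string; the helper name build_spm_command and the choice of unk_id=3 are assumptions, not the project's actual fix.

```python
def build_spm_command(input_path, model_prefix, vocab_size,
                      pad_id=0, bos_id=1, eos_id=2, unk_id=3,
                      model_type="unigram"):
    """Build a SentencePiece training command with distinct special ids.

    unk_id=3 is an assumed choice that avoids the clash with pad_id=0.
    """
    special_ids = {"pad_id": pad_id, "bos_id": bos_id,
                   "eos_id": eos_id, "unk_id": unk_id}
    # A negative id disables pad/bos/eos in SentencePiece, so skip those.
    enabled = [value for value in special_ids.values() if value >= 0]
    if len(enabled) != len(set(enabled)):
        raise ValueError(f"special token ids must be distinct: {special_ids}")
    return (f"--input={input_path} --model_prefix={model_prefix} "
            f"--vocab_size={vocab_size} --model_type={model_type} "
            f"--pad_id={pad_id} --bos_id={bos_id} "
            f"--eos_id={eos_id} --unk_id={unk_id}")


cmd = build_spm_command("spm_input.txt", "tokenizer", 5000)
print(cmd)

# With sentencepiece installed, the string can be passed to the same call
# that appears in the traceback:
#   import sentencepiece as spm
#   spm.SentencePieceTrainer.Train(cmd)
```

Changing the special token ids changes the vocabulary layout, so any code that assumes a particular id for the unknown token would need to be updated to match.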

Releases(v0.1)

v0.1(May 9, 2021)
Fix several bugs in v0.0

pull request 28~53

Apply structured configurations (using Hydra)

v0.0(May 8, 2021)
First release

Data download & processing test complete


Owner

Soohwan Kim

Toward human-like AI

GitHub Repository https://sooftware.github.io/lasr/
