A PyTorch Implementation of End-to-End Models for Speech-to-Text

Last update: Dec 25, 2022

Overview

speech

Speech is an open-source package for building end-to-end models for automatic speech recognition. It currently supports sequence-to-sequence models with attention, Connectionist Temporal Classification (CTC), and the RNN Sequence Transducer.
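To illustrate one of the supported techniques, here is a minimal sketch of best-path (greedy) CTC decoding in plain Python. It picks the highest-scoring label at each frame, merges consecutive repeats, and then drops blanks. It assumes label index 0 is the CTC blank. The function name `ctc_greedy_decode` is hypothetical, and this is not the decoder shipped with this package.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Hypothetical best-path CTC decoder over per-frame label scores."""
    # Best label per frame (argmax over the label dimension).
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it starts a new run and is not the blank.
        # A blank between two runs of the same label keeps both copies.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Six frames of scores over three labels: 0 (blank), 1 and 2.
frames = [
    [0.1, 0.8, 0.1],    # 1
    [0.2, 0.7, 0.1],    # 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.6, 0.3],    # 1 (new run after the blank)
    [0.1, 0.2, 0.7],    # 2
    [0.3, 0.1, 0.6],    # 2 (repeat, merged)
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The frame-wise best path here is 1, 1, blank, 1, 2, 2. The blank separating the two runs of label 1 is the only reason a repeated label survives decoding.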

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not be providing backward compatibility for Python 2.7.

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the installation instructions for a version of PyTorch which works for your machine.

After all the Python requirements are installed, run the following from the top-level directory:

make

The build process requires CMake as well as Make.

After that, source setup.sh from the repository root:

source setup.sh

Consider adding this to your ~/.bashrc.

You can verify that the installation succeeded by running the tests from the tests directory:

cd tests
pytest

Run

To train a model, run:

python train.py <path_to_config>

After the model has finished training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>
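The README does not say which metric eval.py reports. Speech recognition output is commonly scored by word (or character) error rate: the Levenshtein edit distance between reference and hypothesis, summed over the dataset and divided by the total reference length. The following generic sketch illustrates that computation. The function names are hypothetical, and it is not the code in eval.py.

```python
from typing import Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[Hashable]], hyps: List[Sequence[Hashable]]) -> float:
    """Corpus-level error rate: total edits divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("references are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref


# Word error rate: split transcripts into words. Pass raw strings for character error rate.
ref = "the cat sat on the mat".split()
hyp = "the cat sat on mat".split()
print(round(error_rate([ref], [hyp]), 3))  # 0.167 (one deletion over six words)
```

Summing edits and reference lengths over the whole set, rather than averaging per-utterance rates, keeps short utterances from being over-weighted.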

To see the available options for each script use -h:

python train.py -h
python eval.py -h

Examples

For examples of model configurations and datasets, visit the examples directory. Each example dataset should have instructions and/or scripts for downloading and preparing the data. There should also be one or more model configurations available. The results for each configuration will be documented in each example's corresponding README.md.

Owner

Awni Hannun
