Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Last update: Dec 29, 2022

Overview

Wav2Vec2 STT Python

Beta Software

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.

Requirements:

Python 3.7+
Platform: Linux x64 (Windows is a work in progress; MacOS may work; PRs welcome)
Python package requirements: cffi, numpy
Wav2Vec2 2.0 Model (must be converted to compatible format)
- Several are available ready-to-go on this project's releases page and below.
- You can convert your own models by following the instructions here.

Models:

Model	Download Size
Facebook Wav2Vec2 2.0 Base (960h)	360 MB
Facebook Wav2Vec2 2.0 Large (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)	1.18 GB

Usage

from wav2vec2_stt import Wav2Vec2STT
decoder = Wav2Vec2STT('model_dir')

import wave
wav_file = wave.open('tests/test.wav', 'rb')
wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'

Also contains a simple CLI interface for recognizing wav files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit

Installation/Building

Recommended installation via wheel from pip (requires a recent version of pip):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Comments

provide API for returning output from intermediate layers

It would be very helpful to have an API for returning output from intermediate layers, for example, the one before the final layers. This output can be used in other speech tasks other than speech recognition.

opened by zhouyong64 1

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

13.6k Jan 5, 2023

Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

TGCLOUD 🪁 Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁 Features Easy to Deploy Heroku Supp

6 Oct 18, 2022

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

64 May 8, 2022

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

40 Sep 19, 2022

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

1 Dec 9, 2021

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

2.2k Jan 9, 2023

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

1k Dec 30, 2022

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning Sandeep Subramanian, Adam Trischler, Yoshua B

309 Oct 19, 2022

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Related tags

Overview

Wav2Vec2 STT Python

Usage

Installation/Building

Author

License

Acknowledgments

You might also like...

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Comments

provide API for returning output from intermediate layers

Releases(v0.2.0)

v0.2.0(Aug 16, 2021)

models(Aug 2, 2021)

Owner

David Zurow

Bpe algorithm can finetune tokenizer - Bpe algorithm can finetune tokenizer

This repository contains all the source code that is needed for the project : An Efficient Pipeline For Bloom’s Taxonomy Using Natural Language Processing and Deep Learning

In this project, we compared Spanish BERT and Multilingual BERT in the Sentiment Analysis task.

Nested Named Entity Recognition

Finetune gpt-2 in google colab

Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

Fast topic modeling platform

Transformer related optimization, including BERT, GPT

Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Score-Based Point Cloud Denoising (ICCV'21)

A Japanese tokenizer based on recurrent neural networks

(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.

Guide to using pre-trained large language models of source code

pyupbit 라이브러리를 활용하여 upbit에서 비트코인을 자동매매하는 코드입니다. 조코딩 유튜브 채널에서 자세한 강의 영상을 보실 수 있습니다.

Chinese Grammatical Error Diagnosis

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

🌐 Translation microservice powered by AI

Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.

Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"