A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

Last update: Oct 23, 2022


wav2vec-toolkit

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

Clone your fork to your local disk, and add the base repository as a remote:

```
git clone git@github.com:<your Github handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
```

Create a new branch to hold your development changes:
```
git checkout -b a-descriptive-name-for-my-changes
```
Do not work on the main branch.
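One way to catch accidental commits on the default branch is a small guard run before committing. This is only an illustrative sketch: the function names `is_protected_branch` and `check_branch` are hypothetical, and the list of protected branch names is an assumption.

```shell
# Hypothetical helper: succeeds if the given branch name is one you should
# not commit to directly. The protected names below are an assumption.
is_protected_branch() {
  case "$1" in
    main|master) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a warning and fail when the current checkout is on a protected branch.
check_branch() {
  # If git cannot report a branch (e.g. not inside a repository), do nothing.
  branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || return 0
  if is_protected_branch "$branch"; then
    echo "You are on '$branch'; create a feature branch first." >&2
    return 1
  fi
}
```

Calling `check_branch` from the repository root returns a non-zero status when you are still on the default branch.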
Set up a development environment by running the following command in a virtual environment:
```
pip install -e ".[dev]"
```
(If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)
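If you do not have a virtual environment yet, the setup can be sketched as follows. The helper name `setup_dev_env` and the default directory `.venv` are assumptions made for illustration, and the `bin/python` path assumes a POSIX layout (on Windows the interpreter lives under `Scripts`).

```shell
# Hypothetical helper: create a virtual environment and install the toolkit
# into it in editable mode with the development extras.
setup_dev_env() {
  venv_dir="${1:-.venv}"   # ".venv" is an assumed default directory name
  python3 -m venv "$venv_dir" || return 1
  # Call pip through the environment's own interpreter (POSIX layout assumed),
  # so the install cannot land in the system Python by accident.
  "$venv_dir/bin/python" -m pip install -e ".[dev]"
}
```

Run `setup_dev_env` from the repository root, then activate the environment with `source .venv/bin/activate` before continuing with the remaining steps.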
Develop the features on your branch.
Format your code. Run black and isort with the following commands so that your newly added files are formatted consistently with the rest of the code:
```
black --line-length 119 --target-version py36 src scripts
isort src scripts
```
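To see whether anything would be reformatted without touching the files, both tools offer a check mode. A minimal sketch, reusing the options from the commands above; the wrapper name `check_format` is hypothetical.

```shell
# Hypothetical helper: report formatting problems without modifying files.
# Returns non-zero if either tool would change something.
check_format() {
  black --check --line-length 119 --target-version py36 "$@" &&
    isort --check-only "$@"
}

# Usage (from the repository root): check_format src scripts
```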
Once you're happy with your implementation, add your changes and make a commit to record your changes locally:
```
git add .
git commit
```
It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
```
git fetch upstream
git rebase upstream/main
```
Push the changes to your account using:
```
git push -u origin a-descriptive-name-for-my-changes
```
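If you rebased onto upstream/main after the branch had already been pushed, a plain `git push` is rejected because the local and remote histories have diverged. The sketch below shows one way to handle that case, assuming the remote is named origin as above; the helper name `push_rebased_branch` is hypothetical. `--force-with-lease` overwrites the remote branch only if it still points where your last fetch saw it.

```shell
# Hypothetical helper: push the current branch after a rebase.
push_rebased_branch() {
  # Resolve the name of the branch that is currently checked out.
  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  # Overwrite the remote copy only if nobody else has updated it meanwhile.
  git push --force-with-lease -u origin "$branch"
}
```

Only use this on your own feature branch, never on a branch shared with other contributors.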
Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.


Owner

Anton Lozhkov
