Romanian Automatic Speech Recognition from the ROBIN project

Last update: Jan 01, 2023

Overview

RobinASR

This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, together with a KenLM language model to imporve the transcriptions.

The pretrained text-to-speech model can be downloaded from here and the pretrained KenLM can be downloaded from here.

Also, make sure to visit:

A demo of the ASR system available in the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
A post-processing web service allowing hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection

Installation

Docker

Download the pretrained text-to-speech model and the pretrained KenLM at the above links, and copy them in a models directory inside this repository.
Build the docker image using the Dockerfile. Make sure that deepspeech_pytorch/configs/inference_config.py has the desired configuration.

docker build --tag RobinASR .

Run the docker image.

docker run --gpus all -p 8888:8888 --net=host --ipc=host RobinASR

From Source

You must have Python 3.6+ and PyTorch 1.5.1+ installed in your system. Also. Cuda 10.1+ is required if you want to use the (recommended) GPU version.
Clone the repository and install its dependencies:

git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .

Install Nvidia Apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want to use Beam Search and the KenLM language model, you must install CTCDecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .

Inference Server

Firstly, take a look at the configuration file in deepspeech_pytorch/configs/inference_config.py and make sure that the configuration meets your requirements. Then, run the following command:

python3 server.py

Train a New Model

You must create 3 csv manifest files (train, valid and test) that contain on each line the the path to a wav file and the path to its corresponding transcription, separated by commas:

path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...

Then you must modify correspondingly with your configuration the file located at deepspeech_pytorch/configs/train_config.py and start training with:

python train.py

Acknowledgments

We would like to thank Sean Narnen for making his DeepSpeech2 implementation publicly-available. We used a lot of his code in our implementation.

Cite

If you are using this repository, please cite the following paper as a thank you to the authors:

Avram, A.M., Păiș, V. and Tufis, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proc. Rom. Acad. Ser. A (Vol. 21, pp. 395-402).

or in BibTeX format:

@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}

Romanian Automatic Speech Recognition from the ROBIN project

Related tags

Overview

RobinASR

Installation

Docker

From Source

Inference Server

Train a New Model

Acknowledgments

Cite

Owner

RACAI

Codes for TIM2021 paper "Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences"

Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES)

Perform zero-order Hankel Transform for an 1D array (float or real valued).

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

Single-Stage 6D Object Pose Estimation, CVPR 2020

Python-based Informatics Kit for Analysing Chemical Units

[CVPR 2021] Region-aware Adaptive Instance Normalization for Image Harmonization

DeepMReye: magnetic resonance-based eye tracking using deep neural networks

CHERRY is a python library for predicting the interactions between viral and prokaryotic genomes

Multi-agent reinforcement learning algorithm and environment

DIVeR: Deterministic Integration for Volume Rendering

An imperfect information game is a type of game with asymmetric information

Deep learning library featuring a higher-level API for TensorFlow.

Let's Git - Versionsverwaltung & Open Source Hausaufgabe

The MATH Dataset

Official Implementation of CVPR 2022 paper: "Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning"

Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

3D-Reconstruction 基于深度学习方法的单目多视图三维重建

Simulator for FRC 2022 challenge: Rapid React

Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'.