LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) from large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent work has also investigated SSL from speech, notably improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest that it is possible to reduce dependence on labeled data when building efficient speech systems, they were evaluated mostly on ASR, under multiple heterogeneous experimental settings (most of them for English). This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR tasks (high and low resource) but also spoken language understanding, speech translation, and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

The scripts for data preparation are available here.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark
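As a rough illustration of how one of these pre-trained models might be used as a feature extractor, here is a minimal sketch based on the HuggingFace transformers library. The model identifier below is an assumption and should be checked against the names actually listed on the HuggingFace page. The sketch also assumes that the checkpoint is a wav2vec 2.0 model that ships a preprocessor configuration and expects 16 kHz mono audio. The helper names are hypothetical and are not part of this repository.

```python
# Assumed model identifier: check https://huggingface.co/LeBenchmark for the
# names that are actually published; this one is only illustrative.
MODEL_ID = "LeBenchmark/wav2vec2-FR-7K-large"

# Assumption: wav2vec 2.0 checkpoints usually expect 16 kHz mono audio.
SAMPLE_RATE = 16_000


def make_dummy_waveform(seconds=1.0, sample_rate=SAMPLE_RATE):
    """Return a silent mono waveform (list of floats) standing in for real speech."""
    return [0.0] * int(seconds * sample_rate)


def extract_features(waveform, model_id=MODEL_ID):
    """Encode a waveform and return frame-level representations.

    Requires the torch and transformers packages, and network access the
    first time the checkpoint is downloaded.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout

    inputs = feature_extractor(
        waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Tensor of shape (batch, frames, hidden_size)
    return outputs.last_hidden_state
```

Calling extract_features(make_dummy_waveform()) would download the checkpoint and return one feature vector per audio frame. In practice the silent waveform would be replaced by real French speech resampled to the expected rate, and the resulting features would feed one of the downstream tasks.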

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: TBC
