Explore different ways to mix speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT)

Last update: Nov 07, 2022

SpeechMix

Explore different ways to mix speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT).

Introduction

All of the examples below use the same input, prepared as follows:

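from datasets import load_dataset
import soundfile as sf

# Assumption: both classes are exported from the top-level speechmix package;
# the snippets below use them without showing an import.
from speechmix import SpeechMixED, SpeechMixEED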


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

# take the first example as the shared transcript/audio pair
transcript = ds["text"][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
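
It can help to know how long the speech encoder's output sequence is compared with the raw audio. The sketch below estimates the number of frames wav2vec2 produces for a given number of samples. It assumes the checkpoint uses the default convolutional feature-encoder layout from the Hugging Face Wav2Vec2 configuration (the kernel sizes and strides listed in the code); `num_frames` is a hypothetical helper name, not part of SpeechMix.

```python
# Kernel sizes and strides of the wav2vec2 convolutional feature encoder.
# Assumption: these are the Hugging Face Wav2Vec2Config defaults and the
# checkpoint used above does not override them.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples):
    """Estimate how many encoder frames a waveform of num_samples yields."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # standard output-length formula for an unpadded 1-D convolution
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio maps to 49 frames under these assumptions.
print(num_frames(16000))
```

Under these assumptions the encoder emits roughly one frame per 20 ms of audio, so the sequence the NLP model attends over is far shorter than the raw waveform but usually much longer than the transcript's token sequence.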

Speech encoder NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
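
The three partial fine-tuning options above (ftl, lna, fne) each leave only a subset of parameters trainable. A common way to do this is to select parameters by name. The sketch below illustrates that idea in plain Python; it is not taken from the SpeechMix source. The parameter names, the substring presets, and the helper `trainable_names` are all hypothetical, chosen to resemble a wav2vec2 + BART stack.

```python
# Hypothetical parameter names in the style of a wav2vec2 + BART stack.
PARAM_NAMES = [
    "encoder_model.encoder.layers.0.attention.q_proj.weight",
    "encoder_model.encoder.layers.0.layer_norm.weight",
    "enc_to_dec_proj.weight",
    "decoder_model.model.encoder.layers.0.self_attn.q_proj.weight",
    "decoder_model.model.encoder.layers.0.self_attn_layer_norm.weight",
    "decoder_model.model.decoder.embed_tokens.weight",
    "decoder_model.model.decoder.layers.0.encoder_attn.q_proj.weight",
    "decoder_model.model.decoder.layers.0.fc1.weight",
]

# Assumed substring presets, one per option named in the headings above.
PRESETS = {
    # cross-attention, projection, decoder embedding
    "ftl": ("encoder_attn", "enc_to_dec_proj", "embed_tokens"),
    # layer norm and attention
    "lna": ("layer_norm", "attn", "attention"),
    # speech encoder only
    "fne": ("encoder_model.",),
}


def trainable_names(names, keep):
    """Return the names that contain at least one of the keep substrings."""
    return [name for name in names if any(key in name for key in keep)]


for option, keep in PRESETS.items():
    print(option, trainable_names(PARAM_NAMES, keep))
```

With a PyTorch model, the same selection could be applied by looping over `model.named_parameters()` and setting `requires_grad` to `True` only for the selected names.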

Installation

pip install

pip install speechmix

Build from source

Clone the repository with git, cd into the project directory, then run:

pip install -e .
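
After either install route, a quick check that the package is visible to the current Python environment can save some debugging. The sketch below uses only the standard library; `is_installed` is a hypothetical helper, and it assumes the import name matches the pip name (`speechmix`).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is importable under the same name used with pip.
print(is_installed("speechmix"))
```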

Owner

Eric Lam
