Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with several approaches to ST:

Cascaded ST, which consists of two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
Direct ST: models trained only on ST data
(Main contribution) End-to-end ST limiting the use of ST data: multi-modal models leveraging ASR and MT training data for the ST task

The Transformer architecture is used as the baseline for the implementation.
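
To make the cascaded setup concrete, here is a minimal sketch of how the two steps compose. It is an illustration only, not the repository's code: `cascaded_st`, `toy_asr` and `toy_mt` are hypothetical names, and the toy functions stand in for trained ASR and MT models.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration; they are not taken from the repository.
AsrModel = Callable[[Sequence[float]], str]  # audio samples -> source-language transcript
MtModel = Callable[[str], str]               # source-language text -> target-language text


def cascaded_st(audio: Sequence[float], asr: AsrModel, mt: MtModel) -> str:
    """Translate speech by chaining ASR and MT (the cascaded approach)."""
    transcript = asr(audio)  # step 1: speech -> text in the source language
    return mt(transcript)    # step 2: source text -> text in the target language


# Toy stand-ins so the sketch runs without trained models (assumptions, not real models).
def toy_asr(audio: Sequence[float]) -> str:
    return "hello world"


def toy_mt(text: str) -> str:
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in text.split())


print(cascaded_st([0.0, 0.1, -0.1], toy_asr, toy_mt))
```

Because the MT step only sees the transcript, any ASR error is passed on to the translation, which is one motivation for the direct and end-to-end approaches listed above.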

High-level instructions for using the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interest, changing the variables in the script if needed.
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh evaluates cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional opposite training data
- run_fine_tunning.sh and run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh and modality_similarity_classifier.sh for measuring the similarity between text and audio representations

See notebooks/Repo_Instruction.ipynb for more details.
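
For readers unfamiliar with SVCCA, the following is a generic NumPy sketch of the idea: reduce each set of representations with SVD, then average the canonical correlations between the reduced sets. It is not the code used by modality_similarity_svcca.sh or by the nlp-dke/svcca project; the function names, the `keep_variance` parameter and its 0.99 default are assumptions made for illustration.

```python
import numpy as np


def _svd_reduce(x: np.ndarray, keep_variance: float) -> np.ndarray:
    """Center x (n_samples, n_features) and keep the top singular directions."""
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = min(int(np.searchsorted(explained, keep_variance)) + 1, len(s))
    return u[:, :k] * s[:k]


def svcca(x: np.ndarray, y: np.ndarray, keep_variance: float = 0.99) -> float:
    """Mean canonical correlation between two sets of representations.

    x and y hold one row per example (same examples, same order); their
    feature dimensions may differ.
    """
    xr = _svd_reduce(x, keep_variance)
    yr = _svd_reduce(y, keep_variance)
    # CCA via QR: the canonical correlations are the singular values of Qx^T Qy.
    qx, _ = np.linalg.qr(xr)
    qy, _ = np.linalg.qr(yr)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))


rng = np.random.default_rng(0)
text_repr = rng.normal(size=(200, 10))   # stand-in for text encoder outputs
audio_repr = rng.normal(size=(200, 10))  # stand-in for audio encoder outputs
print(svcca(text_repr, text_repr))   # identical representations score close to 1
print(svcca(text_repr, audio_repr))  # unrelated representations score lower
```

A score near 1 means the two sets of representations span nearly the same subspace, while lower scores indicate that text and audio are encoded differently.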

Owner

Tu Anh Dinh
