Cross-Attention Transfer for Machine Translation

Overview

This repo hosts the code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021.

Setup

We provide our scripts and modifications to Fairseq. In this section, we describe how to run the code and, as a concrete example, reproduce Table 2 in the paper.

Data

The data, as we prepared and used it, is available on the main branch. However, we recommend cloning the code from this branch to avoid downloading a large amount of data at once; you can always fetch any data you need from the main branch later.
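For example, one way to fetch just the data for a single language pair from the main branch without fully checking it out (the De-En directory name follows the layout shown in the Experiments section; adjust it to the pair you need) is:

    git fetch origin main
    git checkout origin/main -- De-En/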

Installations

We worked in a conda environment with Python 3.8.
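A matching environment can be created, for instance, as follows (the environment name xattn-mt is an arbitrary choice here):

    conda create -n xattn-mt python=3.8
    conda activate xattn-mt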

  • First install the requirements.
      pip install -r requirements.txt
  • Then install Fairseq. To have the option to modify the package, install it in editable mode.
      cd fairseq-modified
      pip install -e .
  • Finally, set the following environment variable.
      export FAIRSEQ=$PWD
      cd ..
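As a quick sanity check that the editable install is picked up, you can run:

    python -c "import fairseq; print(fairseq.__version__)"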

Experiments

For the purpose of this walk-through, we assume we want to train a De–En model using the following data:

De-En
├── iwslt13.test.de
├── iwslt13.test.en
├── iwslt13.test.tok.de
├── iwslt13.test.tok.en
├── iwslt15.tune.de
├── iwslt15.tune.en
├── iwslt15.tune.tok.de
├── iwslt15.tune.tok.en
├── iwslt16.train.de
├── iwslt16.train.en
├── iwslt16.train.tok.de
└── iwslt16.train.tok.en

by transferring from a Fr–En parent model, the experiment files of which are stored under FrEn/checkpoints.

  • Start by making an experiment folder and preprocessing the data.
      mkdir test_exp
      ./xattn-transfer-for-mt/scripts/data_preprocessing/prepare_bi.sh \
          de en test_exp/ \
          De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok \
          8000
    Please note that prepare_bi.sh is written for the most general case, where vocabulary is learned for both the source and target sides. Modify it as necessary and reuse whatever vocabulary you want. In this case, since we are transferring from Fr–En to De–En, we reuse the parent model's target-side vocabulary: 8000 then refers only to the source vocabulary size, and instead of learning a target vocabulary in the script, we copy the parent's over (here $DATA should point to the preprocessed-data directory inside the experiment folder; see also the end-to-end sketch after this list).
      cp ./FrEn/data/tgt.sentencepiece.bpe.model $DATA
      cp ./FrEn/data/tgt.sentencepiece.bpe.vocab $DATA
  • Now you can run an experiment. Here we want to update only the source embeddings and the cross-attention, so we run the corresponding script (script names are self-explanatory). Set the correct path to the desired parent model checkpoint in the script, and:
      bash ./xattn-transfer-for-mt/scripts/training/reinit-src-embeddings-and-finetune-parent-model-on-translation_src+xattn.sh \
          test_exp/ de en
  • Finally, after training, evaluate your model. Set the correct path to the detokenizer that you use in the script, and:
      bash ./xattn-transfer-for-mt/scripts/evaluation/decode_and_score_valid_and_test.sh \
          test_exp/ de en \
          $PWD/De-En/iwslt15.tune.en $PWD/De-En/iwslt13.test.en
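For reference, the walk-through above condenses to the following end-to-end sketch; the parent checkpoint and detokenizer paths inside the scripts still need to be set as described, and $DATA is the preprocessed-data directory inside the experiment folder:

    # 1. Preprocess; reuse the Fr-En parent's target vocabulary
    mkdir test_exp
    ./xattn-transfer-for-mt/scripts/data_preprocessing/prepare_bi.sh \
        de en test_exp/ \
        De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok \
        8000
    cp ./FrEn/data/tgt.sentencepiece.bpe.model $DATA
    cp ./FrEn/data/tgt.sentencepiece.bpe.vocab $DATA

    # 2. Fine-tune only the source embeddings and cross-attention
    bash ./xattn-transfer-for-mt/scripts/training/reinit-src-embeddings-and-finetune-parent-model-on-translation_src+xattn.sh \
        test_exp/ de en

    # 3. Decode and score the tuning and test sets
    bash ./xattn-transfer-for-mt/scripts/evaluation/decode_and_score_valid_and_test.sh \
        test_exp/ de en \
        $PWD/De-En/iwslt15.tune.en $PWD/De-En/iwslt13.test.en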

Issues

Please report any problems you face through the issues tab of the repo. Thanks in advance for helping us improve it!

Credits

The main body of code is built upon Fairseq. We found it very easy to navigate and modify. Kudos to the developers!
The data preprocessing scripts are adapted from the FLORES scripts.
To have mBART fit on the GPUs that we worked with memory-wise, we used the trimming solution provided here.

Citation

@inproceedings{gheini-cross-attention,
  title = "Cross-Attention is All You Need: {A}dapting Pretrained {T}ransformers for Machine Translation",
  author = "Gheini, Mozhdeh and Ren, Xiang and May, Jonathan",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2021"
}