Implementation of ICLR 2020 paper "Revisiting Self-Training for Neural Sequence Generation"

Overview

Self-Training for Neural Sequence Generation

This repo includes instructions for running noisy self-training algorithms from the following paper:

Revisiting Self-Training for Neural Sequence Generation
Junxian He*, Jiatao Gu*, Jiajun Shen, Marc'Aurelio Ranzato
ICLR 2020

Requirements

  • fairseq (see the fairseq repo for the required Python and PyTorch versions)

fairseq can be installed with:

pip install fairseq

Data

Download and preprocess the WMT'14 En-De dataset:

# Download and prepare the data
wget https://raw.githubusercontent.com/pytorch/fairseq/master/examples/translation/prepare-wmt14en2de.sh
bash prepare-wmt14en2de.sh --icml17

TEXT=wmt14_en_de
fairseq-preprocess --source-lang en --target-lang de \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir wmt14_en_de_bin --thresholdtgt 0 --thresholdsrc 0 \
    --joined-dictionary --workers 16

Next, mimic a semi-supervised setting by randomly selecting 100K training samples as the parallel corpus and treating the remaining English training samples as an unannotated monolingual corpus:

bash extract_wmt100k.sh
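
The exact selection logic lives in extract_wmt100k.sh; a minimal sketch of such a random split, assuming the output file names train.100ken, train.100kde, and train.mono_en, could look like:

# Hypothetical sketch of the 100K split; the real logic is in extract_wmt100k.sh.
TEXT=wmt14_en_de
paste ${TEXT}/train.en ${TEXT}/train.de | shuf > ${TEXT}/train.shuf
head -n 100000 ${TEXT}/train.shuf | cut -f1 > ${TEXT}/train.100ken    # parallel source
head -n 100000 ${TEXT}/train.shuf | cut -f2 > ${TEXT}/train.100kde    # parallel target
tail -n +100001 ${TEXT}/train.shuf | cut -f1 > ${TEXT}/train.mono_en  # monolingual English
rm ${TEXT}/train.shuf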

Preprocess WMT100K:

bash preprocess.sh 100ken 100kde 

Add noise to the monolingual corpus for later use:

TEXT=wmt14_en_de
python paraphrase/paraphrase.py \
  --paraphraze-fn noise_bpe \
  --word-dropout 0.2 \
  --word-blank 0.2 \
  --word-shuffle 3 \
  --data-file ${TEXT}/train.mono_en \
  --output ${TEXT}/train.mono_en_noise \
  --bpe-type subword

Train the base supervised model

Train the translation model with 30K updates:

bash supervised_train.sh 100ken 100kde 30000
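
Under the hood this presumably wraps fairseq-train; a hedged sketch of a roughly equivalent invocation (the hyperparameters and the wmt100k_en_de_bin / checkpoints/supervised_100k paths are illustrative, not the script's exact settings):

# Illustrative fairseq-train call; see supervised_train.sh for the exact settings.
fairseq-train wmt100k_en_de_bin \
    --arch transformer --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 30000 \
    --save-dir checkpoints/supervised_100k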

Self-training as pseudo-training + fine-tuning

Translate the monolingual data to train.[suffix] to form a pseudo-parallel dataset:

bash translate.sh [model_path] [suffix]  
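
One way such a script can be realized is with fairseq-interactive; the sketch below is a hypothetical equivalent, keeping the [model_path] and [suffix] placeholders and assuming the monolingual file from the earlier split:

# Hypothetical sketch of what translate.sh might do; the real script may differ.
TEXT=wmt14_en_de
fairseq-interactive wmt14_en_de_bin \
    --path [model_path] \
    --input ${TEXT}/train.mono_en \
    --buffer-size 1024 --batch-size 64 --beam 5 \
    | grep '^H-' | cut -f3 > ${TEXT}/train.[suffix]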

Supposing the pseudo target-language suffix is mono_de_iter1 (the default), preprocess:

bash preprocess.sh mono_en_noise mono_de_iter1

Pseudo-training + fine-tuning:

bash self_train.sh mono_en_noise mono_de_iter1 

The above command trains the model on the pseudo-parallel corpus formed by train.mono_en_noise and train.mono_de_iter1, and then fine-tunes it on the real parallel data.
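
self_train.sh thus presumably runs two fairseq-train stages back to back; a hedged sketch, with illustrative binarized-data and checkpoint paths:

# Hypothetical two-stage sketch; the exact settings live in self_train.sh.
# Stage 1: pseudo-training on the noisy pseudo-parallel corpus.
fairseq-train pseudo_en_de_bin \
    --arch transformer --share-all-embeddings \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 30000 \
    --save-dir checkpoints/pseudo_train

# Stage 2: fine-tune on the real 100K parallel data, warm-started from stage 1.
fairseq-train wmt100k_en_de_bin \
    --restore-file checkpoints/pseudo_train/checkpoint_best.pt \
    --reset-optimizer --reset-lr-scheduler --reset-dataloader \
    --arch transformer --share-all-embeddings \
    --optimizer adam --lr 1e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 10000 \
    --save-dir checkpoints/fine_tune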

This self-training process can be repeated for multiple iterations to improve performance.
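
A hedged sketch of that outer loop, reusing the scripts above (the checkpoint paths are assumptions, not necessarily the scripts' actual outputs):

# Hypothetical iteration loop; adapt the checkpoint paths to your setup.
MODEL=checkpoints/supervised_100k/checkpoint_best.pt
for i in 1 2 3; do
    bash translate.sh ${MODEL} mono_de_iter${i}        # pseudo-label the monolingual data
    bash preprocess.sh mono_en_noise mono_de_iter${i}  # binarize the pseudo-parallel corpus
    bash self_train.sh mono_en_noise mono_de_iter${i}  # pseudo-train, then fine-tune
    MODEL=checkpoints/fine_tune/checkpoint_best.pt     # start the next round from it
done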

Reference

@inproceedings{He2020Revisiting,
    title={Revisiting Self-Training for Neural Sequence Generation},
    author={Junxian He and Jiatao Gu and Jiajun Shen and Marc'Aurelio Ranzato},
    booktitle={Proceedings of ICLR},
    year={2020},
    url={https://openreview.net/forum?id=SJgdnAVKDH}
}