Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Last update: Nov 25, 2022

Citation

Please cite as:

@inproceedings{liu2020understanding,
  title={Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning},
  author={Liu, Xuebo and Wang, Longyue and Wong, Derek F and Ding, Liang and Chao, Lidia S and Tu, Zhaopeng},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Requirements and Installation

This implementation is based on fairseq (v0.9.0).

PyTorch version >= 1.2.0
Python version >= 3.6

git clone https://github.com/SunbowLiu/SurfaceFusion
cd SurfaceFusion
pip install --editable .

Preprocess

Download the WMT16 En-Ro data (original), then extract and binarize it:

tar -zxvf wmt16.tar.gz
PATH_TO_RAW_DATA=wmt16/en-ro
PATH_TO_DATA=wmt16/en-ro/data-bin
python preprocess.py \
    --source-lang en --target-lang ro \
    --trainpref $PATH_TO_RAW_DATA/train/corpus.bpe \
    --validpref $PATH_TO_RAW_DATA/dev/dev.bpe \
    --testpref $PATH_TO_RAW_DATA/test/test.bpe \
    --destdir $PATH_TO_DATA \
    --joined-dictionary \
    --workers 20
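fairseq's preprocess.py reads one plain-text file per split and language, named `<prefix>.<lang>` (for example `$PATH_TO_RAW_DATA/train/corpus.bpe.en`). `--joined-dictionary` builds a single vocabulary shared by English and Romanian, which `--share-all-embeddings` at training time relies on. Before running the command above, a quick check like the following can catch a mis-extracted archive. This is a minimal sketch: the helper names `expected_inputs` and `missing_inputs` are hypothetical, and the prefixes are simply copied from the command.

```python
from pathlib import Path

# Prefixes copied from --trainpref / --validpref / --testpref above.
PREFIXES = {
    "train": "train/corpus.bpe",
    "valid": "dev/dev.bpe",
    "test": "test/test.bpe",
}

def expected_inputs(raw_dir, src="en", tgt="ro"):
    """Map each split to the '<prefix>.<lang>' files preprocess.py will read."""
    return {
        split: [Path(raw_dir) / f"{prefix}.{lang}" for lang in (src, tgt)]
        for split, prefix in PREFIXES.items()
    }

def missing_inputs(raw_dir, src="en", tgt="ro"):
    """Return the expected input files that do not exist on disk."""
    return [
        path
        for paths in expected_inputs(raw_dir, src, tgt).values()
        for path in paths
        if not path.is_file()
    ]

if __name__ == "__main__":
    for path in missing_inputs("wmt16/en-ro"):
        print("missing:", path)
```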

Train (8 GPUs)

OUTPUT=checkpoints
python train.py \
    $PATH_TO_DATA \
    --arch transformer_surface_fusion --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
    --lr 0.0005 --min-lr 1e-09 \
    --dropout 0.3 --weight-decay 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --save-dir $OUTPUT --seed 333 --ddp-backend=no_c10d --fp16 \
    --max-tokens 2048 --update-freq 1 --max-update 60000 --keep-last-epochs 1 \
    --surfacefusion att --sf-gate 0.8 --sf-mode hard
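The `--lr-scheduler inverse_sqrt` flags above describe a linear warmup from `--warmup-init-lr` to `--lr` over `--warmup-updates` steps, followed by decay proportional to the inverse square root of the update number. The sketch below mirrors our reading of fairseq's `inverse_square_root_schedule.py`, plugged with the values from this command; treat the fairseq source as authoritative, and note that the function name `inverse_sqrt_lr` is our own.

```python
import math

def inverse_sqrt_lr(num_updates, lr=5e-4, warmup_init_lr=1e-7, warmup_updates=4000):
    """Learning rate after `num_updates` optimizer steps (defaults match the command above)."""
    if num_updates < warmup_updates:
        # Linear warmup from warmup_init_lr up to the peak lr.
        step = (lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + num_updates * step
    # Decay proportional to 1/sqrt(num_updates), continuous with the end of warmup.
    return lr * math.sqrt(warmup_updates) / math.sqrt(num_updates)

if __name__ == "__main__":
    for n in (0, 2000, 4000, 16000, 60000):
        print(n, f"{inverse_sqrt_lr(n):.3e}")
```

With these settings the rate peaks at 5e-4 at update 4,000, halves to 2.5e-4 by update 16,000, and is roughly 1.29e-4 when `--max-update 60000` is reached.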

Note that we use a batch size of about 16k tokens, i.e., max-tokens * update-freq * num_of_gpus = 2048 * 1 * 8 = 16,384.
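The batch-size rule above is plain arithmetic, and the same product shows how one might keep roughly 16k tokens per update on fewer GPUs by raising `--update-freq` (gradient accumulation). This is a minimal sketch with hypothetical helper names. `--max-tokens` is an upper bound per GPU batch, so the actual token count per update is usually somewhat lower, and the source does not state whether the reported results reproduce with fewer GPUs.

```python
import math

def effective_batch_tokens(max_tokens, update_freq, num_gpus):
    # Upper bound on tokens per optimizer step: every GPU fills batches of up to
    # max_tokens, and gradients are accumulated over update_freq batches.
    return max_tokens * update_freq * num_gpus

def update_freq_for(target_tokens, max_tokens, num_gpus):
    # Smallest --update-freq that reaches target_tokens on the given number of GPUs.
    return math.ceil(target_tokens / (max_tokens * num_gpus))

TARGET = effective_batch_tokens(2048, 1, 8)  # setup of the training command above

if __name__ == "__main__":
    for gpus in (8, 4, 2, 1):
        freq = update_freq_for(TARGET, 2048, gpus)
        print(f"{gpus} GPU(s): --max-tokens 2048 --update-freq {freq}")
```

For 8, 4, 2 and 1 GPUs this prints `--update-freq` values of 1, 2, 4 and 8.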

Evaluation (1 GPU)

python generate.py \
    $PATH_TO_DATA \
    --path $OUTPUT/checkpoint_best.pt \
    --beam 4 --lenpen 1.0 --remove-bpe
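generate.py prints tab-separated lines for each test sentence (`S-<id>` source, `T-<id>` reference, `H-<id>` score and hypothesis) in batch order rather than corpus order, followed by a summary line containing the BLEU4 score. If you want the hypotheses back in corpus order, for instance to score them with another BLEU tool, a small parser like this one can help. It is a sketch that assumes the output follows that format; `parse_generate_output` and `parse_bleu` are hypothetical names.

```python
import re

def parse_generate_output(text):
    """Return (hypotheses, references) from generate.py output, sorted by sentence id."""
    hyps, refs = {}, {}
    for line in text.splitlines():
        if line.startswith("H-"):
            # H-<id>\t<score>\t<hypothesis>
            sid, _score, hyp = line.split("\t", 2)
            hyps[int(sid[2:])] = hyp
        elif line.startswith("T-"):
            # T-<id>\t<reference>
            sid, ref = line.split("\t", 1)
            refs[int(sid[2:])] = ref
    ids = sorted(hyps)
    return [hyps[i] for i in ids], [refs.get(i, "") for i in ids]

def parse_bleu(text):
    """Extract the corpus BLEU from the summary line, or None if it is absent."""
    match = re.search(r"BLEU4? = ([0-9.]+)", text)
    return float(match.group(1)) if match else None
```

Redirect the command's output to a file (e.g. `> gen.out`) and pass the file's contents to both functions.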

The model should reach a BLEU score of about 35.1.

Owner

Sunbow Liu