Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

Related tags

Text Data & NLPBPT
Overview

BP-Transformer

This repo contains the code for our paper

BP-Transformer: Modeling Long-Range Context via Binary Partition

Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang

The code is written in DGL with PyTorch as backend.

Requirements

  • torchtext 0.4
  • dgl 0.4 (the code on master branch is not compatible with dgl 0.5, please checkout develop branch for dgl 0.5 compatible version).
  • yaml
  • spacy
  • PyTorch 1.1+

Usage

For Multi-GPU training, please export NCCL_LL_THRESHOLD=0 before running scripts because of a PyTorch bug mentioned here.

The codebase has two dependencies: graph_kernel and graph_builder, the first one is for efficient graph attention on GPU with node parallel strategy written in CUDA, the second one is for efficient graph construction written in Cython. To install them:

cd graph_builder
python setup.py install
cd ..
cd graph_kernel
python setup.py install
cd ..

We support the following tasks with BPT as backbone:

  • Text Classification: text_classification.py
  • Language Modeling: lm.py
  • Machine Translation: mt.py
  • Natural Language Inference: nli.py

All experiment settings mentioned in our paper are available at configs/.

python *.py --config configs/*.yml --gpu [GPUs]

Note that this repo does not contain any data files, to get dataset required for experiments, run . get_*.sh and the corresponding dataset would be downloaded and preprocessed.

For machine translation, we have another script mt_infer.py for decoding:

python mt_infer.py --config configs/*.yml --gpu [GPU]

Before decoding, please make sure you have finished the training using mt.py with the same config file.

NOTE: Currently we do not support CPU training/inference.

Visualization

Following is the visualization of the sparse matrix of BPT underlying graph when sequence length is 8192 and k is 4. image

Results

  • Character-Level Language Modeling (enwik8, metric: bpc), 12 layers.
    • BPT(context length=8192): 1.02
    • Adaptive Transformer: 1.02
    • Transformer-XL: 1.06
    • To reproduce: python lm.py --config configs/enwik8-8192.yml --gpu 0,1,2,3,4,5,6,7
  • Document-Level Machine Translation (IWSLT 2015 Zh-En, metric: BLEU), base setting.
    • BPT(context length=64): 19.84
    • HAN-NMT: 17.68
    • To reproduce: python mt.py --config configs/iwslt-4-64.yml --gpu 0
  • Text Classification (IMDB, metric: accuracy), 5 layers.
    • BPT+GloVe: 92.12(±0.11)
    • LSTM+CoVe: 91.8
    • Transformer+Glove: 89.24(±0.20)
    • Star Transformer: 90.50
    • To reproduce: python text_classification.py --config configs/imdb-4.yml --gpu 0
      • Note that our CUDA kernel uses atomic operations which may result in non-determinism, we report the mean and std of accuracy in multiple(10) runs.
      • The IMDB dataset has not official train/dev split, we follow the setting of Bryan et al., 2017 and hold out 10% samples for validation. We report the test accuracy of model with best valid loss.

For sentence level modeling, we show that BPT models better inductive bias than vanilla transformer by attending fine-grained features of neighbors and coarse-grained features of far-away tokens.

  • Machine Translation(WMT14 En-De, metric: BLEU), base setting.
    • BPT(k=1): 26.9
    • BPT(k=2): 27.4
    • BPT(k=4): 27.6
    • BPT(k=8): 26.7
    • Transformer-base(our implementation): 27.2
    • To reproduce: python mt.py --config configs/wmt-*.yml --gpu 0,1,2,3,4,5,6,7
      • We report SacreBLEU result for reproducibility (setting: BLEU+c.mixed+l.en-de+#.1+s.exp+t.wmt14+tok.intl+v.1.4.1), the sacrebleu score is usually lower than that produced by get_ende_bleu.sh script in tensor2tensor as described here.
  • Natural Language Inference(SNLI, metric: accuracy), ESIM-like structure, 3 layers for self-attention and 3 layers for cross-sentence attention.
    • BPT(k=4): 88.25(±0.07)
    • Transformer: 87.89(±0.31)
    • To reproduce: python nli.py --config configs/snli.yml --gpu 0
      • Like Text Classification, the result on NLI is also not stable because of randomness in our CUDA kernel, we report the mean and std of accuracy in multiple(7) runs.
  • Text Classification(SST-5, metric: accuracy), 4 layers.
    • BPT+GloVe: 52.71(±0.32)
    • Transformer+GloVe: 50.40
    • Tree-LSTM+GloVe: 51.0
    • To reproduce: python text_classification.py --config configs/sst5-2.yml --gpu 0

TODOs

  • FP16 support (mixed-precision training/inference)
  • Integrate kernels with dgl 0.5
  • CPU support
Owner
Zihao Ye
Ph.D. [email protected] of Washington, focusing on Compilers and Computer Arch
Zihao Ye
A python gui program to generate reddit text to speech videos from the id of any post.

Reddit text to speech generator A python gui program to generate reddit text to speech videos from the id of any post. Current functionality Generate

Aadvik 17 Dec 19, 2022
Code for the project carried out fulfilling the course requirements for Fall 2021 NLP at NYU

Introduction Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization,

Sai Himal Allu 1 Apr 25, 2022
The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

speech-recognition-py Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to huma

Deepangshi 1 Apr 03, 2022
A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

List Of English Words A text file containing over 466k English words. While searching for a list of english words (for an auto-complete tutorial) I fo

dwyl 8.5k Jan 03, 2023
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 797 Dec 26, 2022
UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

Microsoft 281 Dec 15, 2022
An open-source NLP research library, built on PyTorch.

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic

AI2 11.4k Jan 01, 2023
Lattice methods in TensorFlow

TensorFlow Lattice TensorFlow Lattice is a library that implements constrained and interpretable lattice based models. It is an implementation of Mono

504 Dec 20, 2022
Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

Leon AI 11.7k Dec 30, 2022
Rethinking the Truly Unsupervised Image-to-Image Translation - Official PyTorch Implementation (ICCV 2021)

Rethinking the Truly Unsupervised Image-to-Image Translation (ICCV 2021) Each image is generated with the source image in the left and the average sty

Clova AI Research 436 Dec 27, 2022
Subtitle Workshop (subshop): tools to download and synchronize subtitles

SUBSHOP Tools to download, remove ads, and synchronize subtitles. SUBSHOP Purpose Limitations Required Web Credentials Installation, Configuration, an

Joe D 4 Feb 13, 2022
vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin 史上训练最简单,音质最好的语音合成系统

AmorTX 12 Dec 14, 2022
Code for "Finetuning Pretrained Transformers into Variational Autoencoders"

transformers-into-vaes Code for Finetuning Pretrained Transformers into Variational Autoencoders (our submission to NLP Insights Workshop 2021). Gathe

Seongmin Park 22 Nov 26, 2022
This is a NLP based project to extract effective date of the contract from their text files.

Date-Extraction-from-Contracts This is a NLP based project to extract effective date of the contract from their text files. Problem statement This is

Sambhav Garg 1 Jan 26, 2022
A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

We provide a notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository. The notebook also shows how to segment the corpus using BPE tokenizatio

Computation for Indian Language Technology (CFILT) 9 Oct 13, 2022
Pangu-Alpha for Transformers

Pangu-Alpha for Transformers Usage Download MindSpore FP32 weights for GPU from here to data/Pangu-alpha_2.6B.ckpt Activate MindSpore environment and

One 5 Oct 01, 2022
Training open neural machine translation models

Train Opus-MT models This package includes scripts for training NMT models using MarianNMT and OPUS data for OPUS-MT. More details are given in the Ma

Language Technology at the University of Helsinki 167 Jan 03, 2023
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP

TextAttack 🐙 Generating adversarial examples for NLP models [TextAttack Documentation on ReadTheDocs] About • Setup • Usage • Design About TextAttack

QData 2.2k Jan 03, 2023
Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

efficient-task-transfer This repository contains code for the experiments in our paper "What to Pre-Train on? Efficient Intermediate Task Selection".

AdapterHub 26 Dec 24, 2022