Complete the code of prefix-tuning in the low-data setting

Last update: Jul 11, 2022

Prefix Tuning

Note:

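The paper notes that initializing the prefix with activations of real words significantly improves generation. I ran into some problems when using the code released by the authors, so, following the structure of that code, I added support for initializing the prefix with real words.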
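The idea can be illustrated with a toy, dependency-free sketch. It is not this repository's implementation: `frozen_lm_key_values` is a hypothetical stand-in for running the frozen LM over the tokens of the real word and reading off its per-layer key/value activations, the sizes are made up, and the prefix length is assumed to equal the number of tokens in the word. The only point shown is that the prefix starts as a copy of real-word activations instead of small random values, and is then trained as usual.

```python
import random
from typing import List

# [layer][position][2 * HIDDEN]: key and value activations concatenated.
Prefix = List[List[List[float]]]

N_LAYER = 2  # toy sizes, not the sizes of BART or GPT-2
HIDDEN = 4


def frozen_lm_key_values(token_ids: List[int]) -> Prefix:
    """Hypothetical stand-in for running the frozen LM over the real word.

    The activation at position i depends only on token_ids[: i + 1],
    mimicking causal (left-to-right) attention.
    """
    out: Prefix = []
    for layer in range(N_LAYER):
        rows = []
        for i in range(len(token_ids)):
            rng = random.Random(hash((layer, tuple(token_ids[: i + 1]))))
            rows.append([rng.uniform(-1.0, 1.0) for _ in range(2 * HIDDEN)])
        out.append(rows)
    return out


def init_prefix_random(preseqlen: int, seed: int = 0) -> Prefix:
    """Default initialization: small random values at every layer and position."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(2 * HIDDEN)] for _ in range(preseqlen)]
        for _ in range(N_LAYER)
    ]


def init_prefix_from_real_word(token_ids: List[int]) -> Prefix:
    """Real-word initialization: a trainable copy of the word's activations."""
    return [[list(row) for row in layer] for layer in frozen_lm_key_values(token_ids)]


# The ids are made up; in practice they would come from tokenizing --lowdata_token.
word_ids = [101, 202]
prefix = init_prefix_from_real_word(word_ids)
print(len(prefix), len(prefix[0]), len(prefix[0][0]))  # 2 2 8
```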

You can run it as follows:

Train

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800 --use_lowdata_token 'yes' --lowdata_token 'summarize'

Here, use_lowdata_token specifies whether to initialize the prefix with a real word, and lowdata_token is the real word to use.
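Both options take yes/no-style strings rather than boolean switches (the shell strips the single quotes, so the script receives plain `yes` and `summarize`). The sketch below shows one way such options could be parsed and checked with the standard `argparse` module; it is an illustration, not the actual argument handling in `train_bart.py`, and rejecting `--use_lowdata_token yes` without a `--lowdata_token` is an assumption.

```python
import argparse
from typing import List, Optional


def yes_no(value: str) -> bool:
    """Turn the 'yes'/'no' strings used on the command line into booleans."""
    lowered = value.strip().lower()
    if lowered == "yes":
        return True
    if lowered == "no":
        return False
    raise argparse.ArgumentTypeError(f"expected 'yes' or 'no', got {value!r}")


def parse_lowdata_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the low-data options; exits with status 2 on invalid input."""
    parser = argparse.ArgumentParser(description="Sketch of the low-data prefix options")
    parser.add_argument("--use_lowdata_token", type=yes_no, default=False)
    parser.add_argument("--lowdata_token", type=str, default=None)
    args = parser.parse_args(argv)
    if args.use_lowdata_token and not args.lowdata_token:
        parser.error("--use_lowdata_token yes requires --lowdata_token")
    return args


args = parse_lowdata_args(["--use_lowdata_token", "yes", "--lowdata_token", "summarize"])
print(args.use_lowdata_token, args.lowdata_token)  # True summarize
```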

Decode

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training} --use_lowdata_token 'yes' --lowdata_token 'summarize'
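Because the decode command must repeat `--preseqlen`, `--mid_dim` and the low-data options exactly as they were used for training, it can help to generate it from the training settings. A minimal sketch in plain Python; the helper `decode_argv` and the checkpoint path are hypothetical placeholders, while the flag names are the ones used in the commands in this README.

```python
import shlex
from typing import Dict, List

# Options whose values must be identical at training and decoding time.
MUST_MATCH = ("preseqlen", "mid_dim", "use_lowdata_token", "lowdata_token")


def decode_argv(train_opts: Dict[str, str], checkpoint_path: str) -> List[str]:
    """Build the train_bart.py decode command from the options used for training."""
    argv = [
        "python", "train_bart.py",
        "--mode", train_opts["mode"],
        "--do_train", "no",
        "--prefix_model_path", checkpoint_path,
    ]
    for key in MUST_MATCH:
        if key in train_opts:
            argv += [f"--{key}", str(train_opts[key])]
    return argv


train_opts = {"mode": "xsum", "preseqlen": "200", "mid_dim": "800",
              "use_lowdata_token": "yes", "lowdata_token": "summarize"}
print(shlex.join(decode_argv(train_opts, "path/to/checkpoint")))  # hypothetical path
```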

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...
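`prefixTuning.py` (and `train_control.py` for GPT-2) implement prefix-tuning. The paper does not optimize the prefix directly: during training it reparameterizes the prefix through an MLP, and `--mid_dim` presumably sets that MLP's hidden width. The arithmetic sketch below counts trainable parameters for one prefix block, assuming an `Embedding(preseqlen, n_embd)` followed by `Linear(n_embd, mid_dim)`, `Tanh`, and `Linear(mid_dim, 2 * n_layer * n_embd)` (a key and a value for every layer). That layout is an assumption, and the encoder-decoder code may hold more than one such block.

```python
def reparam_param_count(preseqlen: int, n_embd: int, mid_dim: int, n_layer: int) -> int:
    """Trainable parameters of one prefix block under the assumed layout."""
    embedding = preseqlen * n_embd          # P': one n_embd vector per prefix position
    hidden = n_embd * mid_dim + mid_dim     # Linear(n_embd, mid_dim) with bias
    out_dim = 2 * n_layer * n_embd          # a key and a value for every layer
    output = mid_dim * out_dim + out_dim    # Linear(mid_dim, out_dim) with bias
    return embedding + hidden + output


def saved_prefix_size(preseqlen: int, n_embd: int, n_layer: int) -> int:
    """Values in the final prefix P once the MLP has been applied and dropped."""
    return preseqlen * 2 * n_layer * n_embd


# preseqlen/mid_dim from the seq2seq command; n_embd = 1024 and 12 layers are
# assumed BART-large sizes (the real dimensions come from the model config).
print(reparam_param_count(200, 1024, 800, 12))  # 20710176
print(saved_prefix_size(200, 1024, 12))         # 4915200
```

The MLP dominates the count, which fits the paper's remark that the reparameterization can be dropped after training so that only the prefix itself needs to be saved.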

The code for GPT2-style autoregressive LMs is in gpt2/. This corresponds to the table-to-text experiments in the paper.

The code for encoder-decoder architectures such as BART is in seq2seq/. This corresponds to the summarization experiments in the paper.
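In both cases the prefix reaches the frozen model as extra key/value activations at every layer (for BART, the paper prepends a prefix on both the encoder and the decoder side). The shape-only sketch below traces one plausible way to turn the MLP output into per-layer key/value pairs for a GPT-2-style cache: view, permute, then split into groups of two. Plain tuples stand in for tensors, and whether this repository reshapes in exactly this order is an assumption.

```python
import math
from typing import List, Tuple

Shape = Tuple[int, ...]


def prefix_kv_shapes(bsz: int, preseqlen: int, n_layer: int, n_head: int, n_embd: int) -> List[Shape]:
    """Trace the shapes from the MLP output to one (key, value) stack per layer."""
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    head_dim = n_embd // n_head
    mlp_out: Shape = (bsz, preseqlen, 2 * n_layer * n_embd)
    viewed: Shape = (bsz, preseqlen, 2 * n_layer, n_head, head_dim)
    assert math.prod(viewed) == math.prod(mlp_out)  # a view must keep the element count
    # Move the 2 * n_layer axis to the front: (2 * n_layer, bsz, n_head, preseqlen, head_dim).
    permuted: Shape = tuple(viewed[i] for i in (2, 0, 3, 1, 4))
    # Splitting the first axis into chunks of 2 gives n_layer stacks of (key, value).
    per_layer: Shape = (2,) + permuted[1:]
    return [per_layer] * (permuted[0] // 2)


# GPT-2 medium sizes (24 layers, 16 heads, width 1024) with --preseqlen 5 and --bsz 5.
shapes = prefix_kv_shapes(bsz=5, preseqlen=5, n_layer=24, n_head=16, n_embd=1024)
print(len(shapes), shapes[0])  # 24 (2, 5, 16, 5, 64)
```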

The two primary scripts I used to run my code are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py (for summarization). They default to good hyperparameters and can also be used to tune hyperparameters :)

Setup:

cd transformer; pip install -e .

Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800
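With gradient accumulation the optimizer steps once every `--gradient_accumulation_step` mini-batches, so the seq2seq command above trains with an effective batch of 16 × 3 = 48 examples per device. The arithmetic sketch below also estimates the number of optimizer steps; the dataset size is a made-up low-data example, and counting a final partial accumulation window as one step is an assumption about the trainer's behavior.

```python
import math


def effective_batch_size(bsz: int, grad_accum: int, n_devices: int = 1) -> int:
    """Examples contributing to each optimizer update."""
    return bsz * grad_accum * n_devices


def approx_optimizer_steps(n_examples: int, bsz: int, grad_accum: int, epochs: int,
                           n_devices: int = 1) -> int:
    """Optimizer updates over training, counting a partial final window as a step."""
    batches_per_epoch = math.ceil(n_examples / (bsz * n_devices))
    return math.ceil(batches_per_epoch / grad_accum) * epochs


# Values from the seq2seq command; 1,000 training examples is a hypothetical low-data split.
print(effective_batch_size(16, 3))              # 48
print(approx_optimizer_steps(1000, 16, 3, 30))  # 630
```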

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_e2e.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101
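The baselines differ mainly in which parameters receive gradients: fine-tuning updates the LM weights, adapter-tuning updates small modules inserted into each layer, and prefix-tuning updates only the prefix parameters while the LM stays frozen. A toy sketch that selects trainable parameters by name; the parameter names and the `prefixtune` label are hypothetical, and some adapter variants also train layer norms, which this sketch ignores.

```python
from typing import List

# Hypothetical parameter names; real names depend on the model and on this codebase.
PARAM_NAMES = [
    "prefix.embedding.weight",
    "prefix.mlp.0.weight",
    "lm.layers.0.attn.q_proj.weight",
    "lm.layers.0.adapter.down.weight",
    "lm.layers.0.adapter.up.weight",
]


def trainable_parameters(param_names: List[str], tuning_mode: str) -> List[str]:
    """Return the names of the parameters that would receive gradient updates."""
    if tuning_mode == "finetune":      # every original LM weight, no adapters or prefix
        return [n for n in param_names if n.startswith("lm.") and ".adapter." not in n]
    if tuning_mode == "adaptertune":   # only the inserted adapter modules
        return [n for n in param_names if ".adapter." in n]
    if tuning_mode == "prefixtune":    # hypothetical label: only the prefix, LM frozen
        return [n for n in param_names if n.startswith("prefix.")]
    raise ValueError(f"unknown tuning_mode: {tuning_mode!r}")


print(trainable_parameters(PARAM_NAMES, "adaptertune"))
```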

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Andrew Zeng
