Versatile Generative Language Model

Last update: Dec 02, 2022

Overview

Versatile Generative Language Model

This is the implementation of the paper:

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning. Zhaojiang Lin, Andrea Madotto, Pascale Fung Findings of EMNLP 2020 [PDF]

If you use any source codes or datasets included in this toolkit in your work, please cite the following paper. The bibtex is listed below:

@article{lin2020exploring,
  title={Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning},
  author={Lin, Zhaojiang and Madotto, Andrea and Fung, Pascale},
  journal={arXiv preprint arXiv:2004.03829},
  year={2020}
}

Abstract

Fine-tuning pre-trained generative language models to down-stream language generation tasks have shown promising results. However, it comes with the cost of having a single, large, model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this work, we propose an effective way for fine-tuning multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments in five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.

Versatile Generative Language Model (VLM):

Versatile Language Model (VLM) is composed of three components: a pre-trained language model back-bone (e.g., GPT-2), and two kinds of specialized parameters for each generation task such as low-rank residual adapters and task embeddings.

Dependency

Check the packages needed or simply run the command

❱❱❱ pip install -r requirements.txt

Experiments

Dataset

Download the preprocessed datasets

Reproducibility

We provide the trained checkpoint of our VLM.

Test model: choose one task from (mt, summarization, dialogue, qa, nlg].

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

Fine tune GPT-2

Train machine translation:

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json

Test machine translation:

❱❱❱ python ./evaluate.py --task mt --no_sample --max_history=2 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

VLM train Adapters and Task embeddings

Train machine translation without knowledge distillation

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005

Train machine translation using sentence level knowledge distillation:

❱❱❱ python ./sentence_distiller.py --task mt --max_history=2 --model_checkpoint runs/$fully_finetuned_gpt2_checkpoint --no_sample

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005 --distillation

Test machine traslation:

❱❱❱ python ./evaluate.py --task mt --no_sample --adapter_bottleneck 300 --model_checkpoint runs/$model_checkpoint

Check run.sh to run other tasks

Combine all the adapters and task embedding into single model

Line 68 of combine_all.py to provide the list of checkpoint

❱❱❱ python combine_all.py

Test to see if the result is same

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

The above scripts illustrate how to train VLM continuously when tasks arrive sequentially.

Multitask training VLM

When all the tasks available at the same time.

❱❱❱ python ./train_vlm.py --gradient_accumulation_steps=16 --train_batch_size=1 --valid_batch_size=1 --n_epochs 3

Acknowledgement

This repository is implemented base on Huggingface

Versatile Generative Language Model

Related tags

Overview

Versatile Generative Language Model

Abstract

Versatile Generative Language Model (VLM):

Dependency

Experiments

Acknowledgement

Owner

Zhaojiang Lin

Unofficial implementation of "Coordinate Attention for Efficient Mobile Network Design"

simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Robot Servers and Server Manager software for robo-gym

Pre-trained Deep Learning models and demos (high quality and extremely fast)

Lane assist for ETS2, built with the ultra-fast-lane-detection model.

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

Code for "Learning to Regrasp by Learning to Place"

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.

Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, as a standalone package for Pytorch

Music library streaming app written in Flask & VueJS

Self-supervised learning algorithms provide a way to train Deep Neural Networks in an unsupervised way using contrastive losses

StyleGAN2 Webtoon / Anime Style Toonify

In this project, two programs can help you take full agvantage of time on the model training with a remote server

Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

Implementation of DropLoss for Long-Tail Instance Segmentation in Pytorch

ObjDetApp deploys a pytorch model for object detection

Age Progression/Regression by Conditional Adversarial Autoencoder

Jaxtorch (a jax nn library)

Versatile Generative Language Model

Related tags

Overview

Versatile Generative Language Model

Abstract

Versatile Generative Language Model (VLM):

Dependency

Experiments

Acknowledgement

Owner

Zhaojiang Lin

Unofficial implementation of "Coordinate Attention for Efficient Mobile Network Design"

simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Robot Servers and Server Manager software for robo-gym

Pre-trained Deep Learning models and demos (high quality and extremely fast)

Lane assist for ETS2, built with the ultra-fast-lane-detection model.

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

An implementation of paper `Real-time Convolutional Neural Networks for Emotion and Gender Classification` with PaddlePaddle.

Code for "Learning to Regrasp by Learning to Place"

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.

Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, as a standalone package for Pytorch

Music library streaming app written in Flask & VueJS

Self-supervised learning algorithms provide a way to train Deep Neural Networks in an unsupervised way using contrastive losses

StyleGAN2 Webtoon / Anime Style Toonify

In this project, two programs can help you take full agvantage of time on the model training with a remote server

Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

Implementation of DropLoss for Long-Tail Instance Segmentation in Pytorch

*ObjDetApp* deploys a pytorch model for object detection

Age Progression/Regression by Conditional Adversarial Autoencoder

Jaxtorch (a jax nn library)

ObjDetApp deploys a pytorch model for object detection