Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Last update: Dec 27, 2022

Overview

DART

Implementation for the ICLR 2022 paper "Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners".

Environment

Python 3.
Use pip install -r requirements.txt to install dependencies.
A wandb account is required to search for the best hyper-parameter combinations.

Data source

16-shot GLUE dataset from LM-BFF.
For each task, the generated data consists of 5 random splits (seeds 13/21/42/87/100), each with 16 samples.
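
As a rough illustration of how the five splits can be addressed, the sketch below builds one directory path per seed. The `<root>/<task>/<k>-<seed>` layout and the `data/k-shot` root are assumptions based on the usual LM-BFF convention, and `split_dirs` is a hypothetical helper, not part of this repository; adjust both to wherever the generated data actually lives.

```python
from pathlib import Path

# Seeds of the 5 random splits listed above.
SPLIT_SEEDS = (13, 21, 42, 87, 100)


def split_dirs(task, k=16, root="data/k-shot"):
    """Return one directory per split (assumed layout: <root>/<task>/<k>-<seed>)."""
    return [Path(root) / task / f"{k}-{seed}" for seed in SPLIT_SEEDS]


if __name__ == "__main__":
    for d in split_dirs("SST-2"):
        print(d)
```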

How to run

To run across all 5 splits of a task, use run.py:
- In the arguments, encoder="inner" is the method proposed in the paper, where the verbalizers are additional trainable tokens; encoder="manual" uses manually selected, fixed verbalizer tokens; encoder="lstm" refers to the P-Tuning method.

$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]
              [--output_dir_suffix OUTPUT_DIR_SUFFIX]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
  --output_dir_suffix OUTPUT_DIR_SUFFIX, -o OUTPUT_DIR_SUFFIX
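
For scripted experiments, it can be handy to assemble the run.py command line programmatically. The sketch below uses only flags shown in the help text above; `build_run_command` is a hypothetical helper, and the task name `SST-2` is borrowed from the sweep.py choices on the assumption that run.py accepts the same names.

```python
import sys

VALID_ENCODERS = {"manual", "lstm", "inner", "inner2"}


def build_run_command(task, encoder="inner", num_splits=5, repeat=None,
                      output_dir_suffix=None):
    """Assemble an argv list for run.py from flags listed in its help text."""
    if encoder not in VALID_ENCODERS:
        raise ValueError(f"unknown encoder: {encoder}")
    cmd = [sys.executable, "run.py", "--encoder", encoder, "--task", task,
           "--num_splits", str(num_splits)]
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    if output_dir_suffix is not None:
        cmd += ["--output_dir_suffix", output_dir_suffix]
    return cmd


if __name__ == "__main__":
    # Only print the command here; to launch it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(" ".join(build_run_command("SST-2")))
```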

To train and evaluate on a single split with detailed records, use inference.py.
- Before running, configure task_name, label_list, and prompt_type in the code.
- prompt_type="none" refers to fixed-verbalizer training, while "inner" refers to the method proposed in the paper. ("inner2" is a deprecated 2-stage training variant.)
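
To make the difference between a fixed and a trainable verbalizer more concrete, here is a toy sketch in plain Python. It is not the repository's code: the vectors are made up, there is no language model, and only the label-token embeddings are updated so that the idea stays isolated. Each class is scored by the dot product between the hidden state at the [MASK] position and the embedding of that class's label token. With a fixed verbalizer those embeddings are left alone; with a trainable one they receive gradient updates.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(mask_hidden, label_embeddings):
    """Score each class as <hidden state at [MASK], label-token embedding>."""
    return [sum(h * e for h, e in zip(mask_hidden, emb))
            for emb in label_embeddings]


def sgd_step_on_label_embeddings(mask_hidden, label_embeddings, gold, lr=0.1):
    """One cross-entropy gradient step that updates only the label embeddings."""
    probs = softmax(label_scores(mask_hidden, label_embeddings))
    updated = []
    for c, emb in enumerate(label_embeddings):
        # d(loss)/d(emb_c) = (p_c - 1[c == gold]) * mask_hidden
        coeff = probs[c] - (1.0 if c == gold else 0.0)
        updated.append([e - lr * coeff * h for e, h in zip(emb, mask_hidden)])
    return updated


if __name__ == "__main__":
    mask_hidden = [0.5, -1.0, 2.0]        # toy hidden state at [MASK]
    label_embeddings = [[0.1, 0.0, 0.2],  # toy embedding, class-0 label token
                        [0.0, 0.3, 0.1]]  # toy embedding, class-1 label token
    before = softmax(label_scores(mask_hidden, label_embeddings))
    trained = sgd_step_on_label_embeddings(mask_hidden, label_embeddings, gold=1)
    after = softmax(label_scores(mask_hidden, trained))
    print("fixed verbalizer probabilities:    ", before)
    print("after one trainable-verbalizer step:", after)
```

After the update step, the probability of the gold class goes up, which is the behavior a trainable verbalizer relies on.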
To find the optimal hyper-parameters for each task-split and reproduce our results, use sweep.py:
- Please refer to the WandB documentation for more details.

$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --sweep_id SWEEP_ID
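
For readers new to WandB sweeps, the sketch below shows what a sweep configuration over the options above might look like. It is only an illustration: `make_sweep_config` is a hypothetical helper, the metric name `eval_accuracy` is invented, and sweep.py may define its search space differently. The split seeds and batch sizes are taken from the help text.

```python
import json


def make_sweep_config(task, encoder="inner"):
    """Build a W&B-style sweep configuration dict (grid search)."""
    return {
        "name": f"{task}-{encoder}",
        "method": "grid",
        # Hypothetical metric name; use whatever the training code logs.
        "metric": {"name": "eval_accuracy", "goal": "maximize"},
        "parameters": {
            "seed_split": {"values": [13, 21, 42, 87, 100]},
            "batch_size": {"values": [4, 8, 16, 24, 32]},
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_sweep_config("SST-2"), indent=2))
    # With wandb installed and logged in, a dict like this can be registered
    # with wandb.sweep(config, project="<your-project>") and executed with
    # wandb.agent(sweep_id, function=<your training function>).
```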

To train and evaluate with more customized configurations, use cli.py.
To analyze and visualize the results produced by inference.py, use visualize.py and visualize_word_emb.py.

How to Cite

@article{DBLP:journals/corr/abs-2108-13161,
  author    = {Ningyu Zhang and
               Luoqiu Li and
               Xiang Chen and
               Shumin Deng and
               Zhen Bi and
               Chuanqi Tan and
               Fei Huang and
               Huajun Chen},
  title     = {Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
               Learners},
  journal   = {CoRR},
  volume    = {abs/2108.13161},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.13161},
  eprinttype = {arXiv},
  eprint    = {2108.13161},
  timestamp = {Thu, 13 Jan 2022 17:33:17 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-13161.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Owner

ZJUNLP
