Few-shot Learning of GPT-3

Overview

Few-shot Learning With Language Models

This is a codebase to perform few-shot "in-context" learning using language models similar to the GPT-3 paper. In particular, a few training examples are placed into a natural language "prompt" and predictions are made by generating from the language model. See the GPT-3 paper and Calibrate Before Use for more information.

You can run this codebase with GPT-3 (if you have a key from OpenAI), GPT-2, and any other language model available in HuggingFace Transformers. If you have a GPT-3 key, you should place your API key into a file named openai_key.txt. The underlying model you use is abstracted away using a common API.

Running this codebase will report results with and without contextual calibration.

Dependencies

This code is written using PyTorch and HuggingFace's Transformer repo. If you are running a model locally (e.g., GPT-2), the code requires a single GPU. Running these experiments is relatively lightweight (there is no training), so a single GPU is sufficient. It is technically possible to run the experiments without a GPU, but the runtime will be slow.

Installation

The easiest way to install the code is to create a fresh anaconda environment:

conda create -n fewshot python=3.6
source activate fewshot
pip install -r requirements.txt

Now you should be ready to go!

Replicating Our Results

Here is how to replicate the results from our paper for GPT-2. To replicate the results for classification tasks:

CUDA_VISIBLE_DEVICES=0 python run_classification.py \
--model="gpt2-xl" \
--dataset="sst2, trec, cb, agnews, dbpedia" \
--num_seeds=5 \
--all_shots="0, 1, 4, 8" \
--subsample_test_set=300 \
--approx

To replicate the results for extraction tasks:

CUDA_VISIBLE_DEVICES=0 python run_extraction.py \
--model="gpt2-xl" \
--dataset="mit_movie_Genre, mit_movie_Director, atis_airline_name, atis_depart_date.day_name" \
--num_seeds=5 \
--all_shots="0, 1, 4, 8" \
--subsample_test_set=300

To replicate the results for LAMA:

CUDA_VISIBLE_DEVICES=0 python run_lama.py

Note that after we refactored our code, the training sets are not the same ones used in our results table. We expect the results to differ slightly but they should match the same trends seen in our results.

Overview of Codebase

Data

The data folder contains the raw data for numerous tasks. If you'd like to add your own task, add the data into that folder. The code for loading a dataset, as well as defining the prompt format for a task, is in utils/data_utils.py. We have loaders for a wide range of existing datasets. If you want to add a new dataset that is similar in structure to any of the existing datasets (e.g., its text classification) adding it should be very simple---you can use an existing dataset as a guide.

Utils

The utils folder contains all of the code for calling the underlying models, getting the probabilities of each label token, possibly applying contextual calibration, and more. If you just want to evaluate few-shot learning on your task, you should not need to modify this code. If you want to extend our code (e.g., modify how decisions are made) this is the place to look.

Run Scripts

The run scripts, e.g., run_classification.py, contain the code for randomly sampling the examples to use in the prompt, calling the models, the necessary evaluation metrics, and more. If you are adding a new task format (one that is not classification, QA) then you will need to write your own run script. Inside the run script, you can set the parameters for the experiments using the command line arguments.

For all experiments, we save and pickle the outputs of the model. This makes doing a post-hoc analysis of the accuracy / plotting results / etc. very fast. You can also use the saved outputs to evaluate how the accuracy would have changed if a different decision making function was used (e.g., accuracy with and without contextual calibration).

References

Please consider citing our work if you found this code or our paper beneficial to your research.

@article{Zhao2021Calibrate,	
  Author = {Tony Z. Zhao and Eric Wallace and Shi Feng and Dan Klein and Sameer Singh},	
  Journal={arXiv preprint arXiv:2102.09690},	
  Year = {2021},	
  Title = {Calibrate Before Use: Improving Few-shot Performance of Language Models}	
}    	

Contributions and Contact

This code was developed by Tony Z. Zhao and Eric Wallace, contact available at [email protected] and [email protected].

If you'd like to contribute code, feel free to open a pull request. If you find an issue, please open an issue.

Owner
Tony Z. Zhao
UC Berkeley EECS, working on robotics, NLP and ML
Tony Z. Zhao
Object Tracking and Detection Using OpenCV

Object tracking is one such application of computer vision where an object is detected in a video, otherwise interpreted as a set of frames, and the object’s trajectory is estimated. For instance, yo

Happy N. Monday 4 Aug 21, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 463 Dec 09, 2022
Semi-supervised Transfer Learning for Image Rain Removal. In CVPR 2019.

Semi-supervised Transfer Learning for Image Rain Removal This package contains the Python implementation of "Semi-supervised Transfer Learning for Ima

Wei Wei 59 Dec 26, 2022
TART - A PyTorch implementation for Transition Matrix Representation of Trees with Transposed Convolutions

TART This project is a PyTorch implementation for Transition Matrix Representati

Lee Sael 2 Jan 19, 2022
Transformers are Graph Neural Networks!

🚀 Gated Graph Transformers Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression. Associated article

Chaitanya Joshi 46 Jun 30, 2022
A spherical CNN for weather forecasting

DeepSphere-Weather - Deep Learning on the sphere for weather/climate applications. The code in this repository provides a scalable and flexible framew

DeepSphere 47 Dec 25, 2022
Extreme Dynamic Classifier Chains - XGBoost for Multi-label Classification

Extreme Dynamic Classifier Chains Classifier chains is a key technique in multi-label classification, sinceit allows to consider label dependencies ef

6 Oct 08, 2022
Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks

This is an implementation of Volodymyr Mnih's dissertation methods on his Massachusetts road & building dataset and my original methods that are publi

Shunta Saito 255 Sep 07, 2022
Unofficial PyTorch implementation of SimCLR by Google Brain

Unofficial PyTorch implementation of SimCLR by Google Brain

Rishabh Anand 2 Oct 13, 2021
MADT: Offline Pre-trained Multi-Agent Decision Transformer

MADT: Offline Pre-trained Multi-Agent Decision Transformer A link to our paper can be found on Arxiv. Overview Official codebase for Offline Pre-train

Linghui Meng 51 Dec 21, 2022
Moment-DETR code and QVHighlights dataset

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Gauge equivariant mesh cnn

Geometric Mesh CNN The code in this repository is an implementation of the Gauge Equivariant Mesh CNN introduced in the paper Gauge Equivariant Mesh C

50 Dec 18, 2022
Gesture-controlled Video Game. Just swing your finger and play the game without touching your PC

Gesture Controlled Video Game Detailed Blog : https://www.analyticsvidhya.com/blog/2021/06/gesture-controlled-video-game/ Introduction This project is

Devbrat Anuragi 35 Jan 06, 2023
Huawei Hackathon 2021 - Sweden (Stockholm)

huawei-hackathon-2021 Contributors DrakeAxelrod Challenge Requirements: python=3.8.10 Standard libraries (no importing) Important factors: Data depend

Drake Axelrod 32 Nov 08, 2022
Training DiffWave using variational method from Variational Diffusion Models.

Variational DiffWave Training DiffWave using variational method from Variational Diffusion Models. Quick Start python train_distributed.py discrete_10

Chin-Yun Yu 26 Dec 13, 2022
Resources complimenting the Machine Learning Course led in the Faculty of mathematics and informatics part of Sofia University.

Machine Learning and Data Mining, Summer 2021-2022 How to learn data science and machine learning? Programming. Learn Python. Basic Statistics. Take a

Simeon Hristov 8 Oct 04, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
The implementation of DeBERTa

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 06, 2023
Wordle-solver - Wordle answer generation program in python

🟨 Wordle Solver 🟩 Wordle answer generation program in python ✔️ Requirements U

Dahyun Kang 4 May 28, 2022
Bootstrapped Representation Learning on Graphs

Bootstrapped Representation Learning on Graphs This is the PyTorch implementation of BGRL Bootstrapped Representation Learning on Graphs The main scri

NerDS Lab :: Neural Data Science Lab 55 Jan 07, 2023