CoFiPruning: Structured Pruning Learns Compact and Accurate Models

This repository contains the code and pruned models for our ACL'22 paper Structured Pruning Learns Compact and Accurate Models.

**************************** Updates ****************************

  • 05/09/2022: We release the pruned model checkpoints on RTE, MRPC and CoLA!
  • 04/01/2022: We released our paper along with pruned model checkpoints on SQuAD, SST-2, QNLI and MNLI. Check it out!

Overview

We propose CoFiPruning, a task-specific, structured pruning approach (Coarse- and Fine-grained Pruning), and show that structured pruning can produce highly compact subnetworks with large speedups and accuracy competitive with distillation approaches, while requiring much less computation. Our key insight is to jointly prune coarse-grained units (e.g., entire self-attention or feed-forward layers) and fine-grained units (e.g., heads, hidden dimensions). Unlike existing works, our approach controls the pruning decision for every single parameter with multiple masks of different granularity. This is the key to large compression: it allows the greatest flexibility in pruned structures and eases optimization compared to pruning only small units. We also devise a layerwise distillation strategy to transfer knowledge from the unpruned to the pruned model during optimization.
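To make the multi-granularity masking concrete, below is a minimal sketch (ours, not the repository's implementation; all names are illustrative) of how a coarse layer mask, per-head masks, and hidden-dimension masks jointly gate every entry of an attention projection:

import torch

hidden_size, num_heads, head_dim = 768, 12, 64

z_layer = torch.ones(1)             # coarse mask: keep or drop the whole attention layer
z_head = torch.ones(num_heads)      # fine mask: keep or drop individual heads
z_hidden = torch.ones(hidden_size)  # fine mask: keep or drop hidden dimensions

# Effective mask on a query projection W_q of shape (num_heads * head_dim, hidden_size):
# a parameter is pruned as soon as any one of its covering masks is zero.
mask = z_layer * z_head.repeat_interleave(head_dim).unsqueeze(1) * z_hidden.unsqueeze(0)
W_q = torch.randn(num_heads * head_dim, hidden_size)
W_q_pruned = W_q * mask

Because every weight sits under several masks at once, the optimizer can remove a whole layer with one coarse decision or carve out individual heads and dimensions with fine-grained ones, which is exactly the flexibility described above.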

Main Results

We show the main results of CoFiPruning along with results of popular pruning and distillation methods, including Block Pruning, DynaBERT, DistilBERT and TinyBERT. Please see our paper for more detailed results.

Model List

Our released models are listed below; you can download them from the following links. We use a batch size of 128 and V100 32GB GPUs for speedup evaluation. We report the F1 score for SQuAD and accuracy for the GLUE datasets. s60 denotes that the sparsity of the model is roughly 60%.

| model name | task | sparsity | speedup | score |
|------------|------|----------|---------|-------|
| princeton-nlp/CoFi-MNLI-s60 | MNLI | 60.2% | 2.1× | 85.3 |
| princeton-nlp/CoFi-MNLI-s95 | MNLI | 94.3% | 12.1× | 80.6 |
| princeton-nlp/CoFi-QNLI-s60 | QNLI | 60.3% | 2.1× | 91.8 |
| princeton-nlp/CoFi-QNLI-s95 | QNLI | 94.5% | 12.1× | 86.1 |
| princeton-nlp/CoFi-SST2-s60 | SST-2 | 60.1% | 2.1× | 93.0 |
| princeton-nlp/CoFi-SST2-s95 | SST-2 | 94.5% | 12.2× | 90.4 |
| princeton-nlp/CoFi-SQuAD-s60 | SQuAD | 59.8% | 2.0× | 89.1 |
| princeton-nlp/CoFi-SQuAD-s93 | SQuAD | 92.4% | 8.7× | 82.6 |
| princeton-nlp/CoFi-RTE-s60 | RTE | 60.2% | 2.0× | 72.6 |
| princeton-nlp/CoFi-RTE-s96 | RTE | 96.2% | 12.8× | 66.1 |
| princeton-nlp/CoFi-CoLA-s60 | CoLA | 60.4% | 2.0× | 60.4 |
| princeton-nlp/CoFi-CoLA-s95 | CoLA | 95.1% | 12.3× | 38.9 |
| princeton-nlp/CoFi-MRPC-s60 | MRPC | 61.5% | 2.0× | 86.8 |
| princeton-nlp/CoFi-MRPC-s95 | MRPC | 94.9% | 12.2× | 83.6 |

You can use these models with the HuggingFace interface:

from CoFiPruning.models import CoFiBertForSequenceClassification
from transformers import AutoTokenizer
model = CoFiBertForSequenceClassification.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/CoFi-MNLI-s95")  # we assume the checkpoint ships a tokenizer
inputs = tokenizer("A soccer game with multiple males playing.", "Some men are playing a sport.", return_tensors="pt")
output = model(**inputs)
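Assuming the model returns a standard HuggingFace SequenceClassifierOutput (if your version returns a tuple instead, the logits are its first element), the predicted class can then be read off the logits:

predicted_label = output.logits.argmax(dim=-1)  # index of the highest-scoring class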

Train CoFiPruning

In the following sections, we provide instructions for training CoFiPruning with our code.

Requirements

Try running the following script to install the dependencies.

pip install -r requirements.txt

Training

Training scripts

We provide an example training script for CoFiPruning with different combinations of pruning units and objectives in scripts/run_CoFi.sh. The script only supports single-GPU training; we explain the arguments below:

  • --task_name: we support sequence classification tasks and extractive question answering tasks. You can pass a GLUE task name, e.g., MNLI, or use the --train_file and --validation_file arguments for other tasks (supported by HuggingFace).
  • --ex_name_suffix: experiment name (for output dir)
  • --ex_cate: experiment category name (for output dir)
  • --pruning_type: we support all combinations of the following four types of pruning units. The default pruning type is structured_heads+structured_mlp+hidden+layer. Setting it to None falls back to standard fine-tuning.
    • structured_heads: head pruning
    • structured_mlp: mlp intermediate dimension pruning
    • hidden: hidden states pruning
    • layer: layer pruning
  • --target_sparsity: target sparsity of the pruned model
  • --distillation_path: the directory of the teacher model
  • --distillation_layer_loss_alpha: weight for layer distillation
  • --distillation_ce_loss_alpha: weight for cross entropy distillation
  • --layer_distill_version: we recommend version 4 for small datasets, as it imposes an explicit restriction on layer orders; for relatively larger datasets, versions 3 and 4 do not make much difference.

After pruning the model, the same script can be used to further fine-tune the pruned model with the following arguments:

  • --pretrained_pruned_model: directory of the pruned model
  • --learning_rate: learning rate for the fine-tuning stage. Note that during fine-tuning, pruning_type should be set to None.

An example for training (pruning) is as follows:

TASK=MNLI
SUFFIX=sparsity0.95
EX_CATE=CoFi
PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer
SPARSITY=0.95
DISTILL_LAYER_LOSS_ALPHA=0.9
DISTILL_CE_LOSS_ALPHA=0.1
LAYER_DISTILL_VERSION=4

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION

An example of fine-tuning after pruning is as follows:

PRUNED_MODEL_PATH=$proj_dir/$TASK/$EX_CATE/${TASK}_${SUFFIX}/best
PRUNING_TYPE=None # Setting the pruning type to be None for standard fine-tuning.
LEARNING_RATE=3e-5

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION [PRUNED_MODEL_PATH] $LEARNING_RATE

The training process will save the model with the best validation accuracy under $PRUNED_MODEL_PATH/best, and you can use the evaluation.py script for evaluation.

Evaluation

Our pruned models are hosted on HuggingFace's model hub. You can use the script evaluation.py to get the sparsity, inference time and development-set results of a pruned model.

python evaluation.py [TASK] [MODEL_NAME_OR_DIR]

An example of evaluating a sentence classification model is as follows:

python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 

The expected output of the script is as follows:

Task: MNLI
Model path: princeton-nlp/CoFi-MNLI-s95
Model size: 4920106
Sparsity: 0.943
mnli/acc: 0.8055
seconds/example: 0.010151
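The speedup numbers in the model list are the ratio of per-example latency between the full model and the pruned model. As an illustration (the baseline latency below is hypothetical, back-computed from the 12.1× table entry):

base_sec_per_example = 0.1228    # hypothetical unpruned-BERT latency on the same hardware
cofi_sec_per_example = 0.010151  # from the evaluation output above
print(f"speedup: {base_sec_per_example / cofi_sec_per_example:.1f}x")  # -> speedup: 12.1x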

Hyperparameters

We use the following hyperparameters for training CoFiPruning:

| | GLUE (small) | GLUE (large) | SQuAD |
|---|---|---|---|
| Batch size | 32 | 32 | 16 |
| Pruning learning rate | 2e-5 | 2e-5 | 3e-5 |
| Fine-tuning learning rate | 1e-5, 2e-5, 3e-5 | 1e-5, 2e-5, 3e-5 | 1e-5, 2e-5, 3e-5 |
| Layer distill. alpha | 0.9, 0.7, 0.5 | 0.9, 0.7, 0.5 | 0.9, 0.7, 0.5 |
| Cross-entropy distill. alpha | 0.1, 0.3, 0.5 | 0.1, 0.3, 0.5 | 0.1, 0.3, 0.5 |
| Pruning epochs | 100 | 20 | 20 |
| Pre-finetuning epochs | 4 | 1 | 1 |
| Sparsity warmup epochs | 20 | 2 | 2 |
| Finetuning epochs | 20 | 20 | 20 |

GLUE (small) denotes the GLUE tasks with relatively small training sets, including CoLA, STS-B, MRPC and RTE; GLUE (large) denotes the rest of the GLUE tasks, including SST-2, MNLI, QQP and QNLI. Note that hyperparameter search is essential for small datasets but less important for large ones.
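Since hyperparameter search matters most for the small datasets, one option is to sweep the fine-tuning learning rates from the table using the same script interface as in the fine-tuning example above (a sketch; the bracketed paths are placeholders for your own teacher and pruned-model directories):

TASK=RTE
for LR in 1e-5 2e-5 3e-5; do
    bash scripts/run_CoFi.sh $TASK lr${LR} CoFi None 0.95 [DISTILLATION_PATH] 0.9 0.1 4 [PRUNED_MODEL_PATH] $LR
done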

Bugs or Questions?

If you have any questions related to the code or the paper, feel free to email Mengzhou ([email protected]) and Zexuan ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and faster!

Citation

Please cite our paper if you use CoFiPruning in your work:

@inproceedings{xia2022structured,
   title={Structured Pruning Learns Compact and Accurate Models},
   author={Xia, Mengzhou and Zhong, Zexuan and Chen, Danqi},
   booktitle={Association for Computational Linguistics (ACL)},
   year={2022}
}