CoFiPruning: Structured Pruning Learns Compact and Accurate Models

This repository contains the code and pruned models for our ACL'22 paper Structured Pruning Learns Compact and Accurate Models.

**************************** Updates ****************************

  • 05/09/2022: We release the pruned model checkpoints on RTE, MRPC and CoLA!
  • 04/01/2022: We released our paper along with pruned model checkpoints on SQuAD, SST-2, QNLI and MNLI. Check it out!

Overview

We propose CoFiPruning, a task-specific, structured pruning approach (Coarse- and Fine-grained Pruning), and show that structured pruning can achieve highly compact subnetworks with large speedups and accuracy competitive with distillation approaches, while requiring much less computation. Our key insight is to jointly prune coarse-grained units (e.g., entire self-attention or feed-forward layers) and fine-grained units (e.g., heads, hidden dimensions). Unlike existing works, our approach controls the pruning decision for every single parameter with multiple masks of different granularity. This is the key to large compression, as it allows the greatest flexibility of pruned structures and eases the optimization compared to pruning only small units. We also devise a layerwise distillation strategy to transfer knowledge from the unpruned to the pruned model during optimization.
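To make the multi-granularity masking concrete, here is a minimal PyTorch sketch (not the released implementation; all names and shapes are illustrative) of how a coarse layer-level mask and fine-grained head and hidden-dimension masks can jointly gate the same parameters:

import torch

num_layers, num_heads, hidden_dim = 12, 12, 768

# coarse-grained mask: one gate per multi-head attention layer
z_mha_layer = torch.ones(num_layers)
# fine-grained masks: one gate per attention head and per hidden dimension
z_head = torch.ones(num_layers, num_heads)
z_hidden = torch.ones(hidden_dim)

# example pruning decisions: drop layer 3 entirely, drop head 5 of layer 0,
# and drop the last 100 hidden dimensions
z_mha_layer[3] = 0.0
z_head[0, 5] = 0.0
z_hidden[-100:] = 0.0

def gate_attention_heads(layer, head_outputs):
    """head_outputs: (batch, num_heads, seq_len, head_dim) for one layer.
    A head survives only if both its own gate and its layer's gate are on."""
    gate = z_mha_layer[layer] * z_head[layer]
    return head_outputs * gate.view(1, -1, 1, 1)

def gate_hidden(states):
    """states: (batch, seq_len, hidden_dim); hidden dimensions are gated everywhere."""
    return states * z_hidden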

Main Results

We show the main results of CoFiPruning along with results of popular pruning and distillation methods, including Block Pruning, DynaBERT, DistilBERT and TinyBERT. Please see our paper for more detailed results.

Model List

Our released models are listed as follows. You can download these models from the Hugging Face model hub using the model names below. We use a batch size of 128 and V100 32GB GPUs for speedup evaluation. We report the F1 score for SQuAD and accuracy for the GLUE datasets. s60 denotes that the sparsity of the model is roughly 60%.

| Model name | Task | Sparsity | Speedup | Score |
|---|---|---|---|---|
| princeton-nlp/CoFi-MNLI-s60 | MNLI | 60.2% | 2.1× | 85.3 |
| princeton-nlp/CoFi-MNLI-s95 | MNLI | 94.3% | 12.1× | 80.6 |
| princeton-nlp/CoFi-QNLI-s60 | QNLI | 60.3% | 2.1× | 91.8 |
| princeton-nlp/CoFi-QNLI-s95 | QNLI | 94.5% | 12.1× | 86.1 |
| princeton-nlp/CoFi-SST2-s60 | SST-2 | 60.1% | 2.1× | 93.0 |
| princeton-nlp/CoFi-SST2-s95 | SST-2 | 94.5% | 12.2× | 90.4 |
| princeton-nlp/CoFi-SQuAD-s60 | SQuAD | 59.8% | 2.0× | 89.1 |
| princeton-nlp/CoFi-SQuAD-s93 | SQuAD | 92.4% | 8.7× | 82.6 |
| princeton-nlp/CoFi-RTE-s60 | RTE | 60.2% | 2.0× | 72.6 |
| princeton-nlp/CoFi-RTE-s96 | RTE | 96.2% | 12.8× | 66.1 |
| princeton-nlp/CoFi-CoLA-s60 | CoLA | 60.4% | 2.0× | 60.4 |
| princeton-nlp/CoFi-CoLA-s95 | CoLA | 95.1% | 12.3× | 38.9 |
| princeton-nlp/CoFi-MRPC-s60 | MRPC | 61.5% | 2.0× | 86.8 |
| princeton-nlp/CoFi-MRPC-s95 | MRPC | 94.9% | 12.2× | 83.6 |
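The speedup numbers above are produced with our evaluation script (see Evaluation below). As a rough, hedged sketch of how such a measurement could be reproduced (assuming a CUDA GPU, a batch size of 128, and that the pruned MNLI checkpoint shares bert-base-uncased's tokenizer), one could time the pruned model against an unpruned BERT-base baseline:

import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from CoFiPruning.models import CoFiBertForSequenceClassification

def seconds_per_batch(model, tokenizer, batch_size=128, n_batches=20, device="cuda"):
    model.to(device).eval()
    # a fixed synthetic batch; the paper's evaluation uses the task's development set
    inputs = tokenizer(["a premise sentence."] * batch_size,
                       ["a hypothesis sentence."] * batch_size,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        for _ in range(3):  # warm-up
            model(**inputs)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_batches):
            model(**inputs)
        torch.cuda.synchronize()
    return (time.time() - start) / n_batches

# assumes the pruned checkpoint uses the same tokenizer as bert-base-uncased
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
dense = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
pruned = CoFiBertForSequenceClassification.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
print("estimated speedup:", seconds_per_batch(dense, tok) / seconds_per_batch(pruned, tok))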

You can use these models with the Hugging Face interface:

from transformers import AutoTokenizer
from CoFiPruning.models import CoFiBertForSequenceClassification

model = CoFiBertForSequenceClassification.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
# build example `inputs` with the checkpoint's tokenizer (a premise/hypothesis pair for MNLI)
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/CoFi-MNLI-s95")
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.", return_tensors="pt")
output = model(**inputs)
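If the loaded checkpoint returns the standard Hugging Face sequence-classification output, the prediction can be read from its logits. A minimal usage sketch (the label names come from the checkpoint's config and may be generic, e.g. LABEL_0):

# hedged usage sketch: assumes `output` exposes standard classification logits
pred_id = output.logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])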

Train CoFiPruning

In this section, we provide instructions on training CoFiPruning with our code.

Requirements

Try running the following script to install the dependencies.

pip install -r requirements.txt

Training

Training scripts

We provide example training scripts for CoFiPruning with different combinations of pruning units and objectives in scripts/run_CoFi.sh. The script only supports single-GPU training, and we explain the arguments below:

  • --task_name: we support sequence classification tasks and extractive question answering tasks. You can input a GLUE task name, e.g., MNLI, or use the --train_file and --validation_file arguments for other tasks (supported by HuggingFace).
  • --ex_name_suffix: experiment name (for output dir)
  • --ex_cate: experiment category name (for output dir)
  • --pruning_type: we support all combinations of the following four types of pruning units. The default pruning type is structured_heads+structured_mlp+hidden+layer. Setting it to None falls back to standard fine-tuning.
    • structured_heads: head pruning
    • structured_mlp: mlp intermediate dimension pruning
    • hidden: hidden states pruning
    • layer: layer pruning
  • --target_sparsity: target sparsity of the pruned model
  • --distillation_path: the directory of the teacher model
  • --distillation_layer_loss_alpha: weight for layer distillation
  • --distillation_ce_loss_alpha: weight for cross entropy distillation
  • --layer_distill_version: we recommend using version 4 for small-sized datasets to impose an explicit restriction on layer orders but for relatively larger datasets, version 3 and version 4 do not make much difference.

After pruning the model, the same script can be used to further fine-tune the pruned model with the following arguments:

  • --pretrained_pruned_model: directory of the pruned model
  • --learning_rate: learning rate for the fine-tuning stage. Note that during the fine-tuning stage, pruning_type should be set to None.

An example for training (pruning) is as follows:

TASK=MNLI
SUFFIX=sparsity0.95
EX_CATE=CoFi
PRUNING_TYPE=structured_heads+structured_mlp+hidden+layer
SPARSITY=0.95
DISTILL_LAYER_LOSS_ALPHA=0.9
DISTILL_CE_LOSS_ALPHA=0.1
LAYER_DISTILL_VERSION=4

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION

An example for fine-tuning after pruning is as follows:

PRUNED_MODEL_PATH=$proj_dir/$TASK/$EX_CATE/${TASK}_${SUFFIX}/best
PRUNING_TYPE=None # Setting the pruning type to be None for standard fine-tuning.
LEARNING_RATE=3e-5

bash scripts/run_CoFi.sh $TASK $SUFFIX $EX_CATE $PRUNING_TYPE $SPARSITY [DISTILLATION_PATH] $DISTILL_LAYER_LOSS_ALPHA $DISTILL_CE_LOSS_ALPHA $LAYER_DISTILL_VERSION $PRUNED_MODEL_PATH $LEARNING_RATE

The training process will save the model with the best validation accuracy under $PRUNED_MODEL_PATH/best, and you can use the evaluation.py script for evaluation.

Evaluation

Our pruned models are hosted on Hugging Face's model hub. You can use the script evaluation.py to get the sparsity, inference time and development set results of a pruned model.

python evaluation.py [TASK] [MODEL_NAME_OR_DIR]

An example of evaluating a sentence classification model is as follows:

python evaluation.py MNLI princeton-nlp/CoFi-MNLI-s95 

The expected output is as follows:

Task: MNLI
Model path: princeton-nlp/CoFi-MNLI-s95
Model size: 4920106
Sparsity: 0.943
mnli/acc: 0.8055
seconds/example: 0.010151

Hyperparameters

We use the following hyperparameters for training CoFiPruning:

| Hyperparameter | GLUE (small) | GLUE (large) | SQuAD |
|---|---|---|---|
| Batch size | 32 | 32 | 16 |
| Pruning learning rate | 2e-5 | 2e-5 | 3e-5 |
| Fine-tuning learning rate | 1e-5, 2e-5, 3e-5 | 1e-5, 2e-5, 3e-5 | 1e-5, 2e-5, 3e-5 |
| Layer distill. alpha | 0.9, 0.7, 0.5 | 0.9, 0.7, 0.5 | 0.9, 0.7, 0.5 |
| Cross-entropy distill. alpha | 0.1, 0.3, 0.5 | 0.1, 0.3, 0.5 | 0.1, 0.3, 0.5 |
| Pruning epochs | 100 | 20 | 20 |
| Pre-finetuning epochs | 4 | 1 | 1 |
| Sparsity warmup epochs | 20 | 2 | 2 |
| Finetuning epochs | 20 | 20 | 20 |

GLUE (small) denotes the GLUE tasks with relatively small datasets, including CoLA, STS-B, MRPC and RTE; GLUE (large) denotes the rest of the GLUE tasks, including SST-2, MNLI, QQP and QNLI. Note that hyperparameter search is essential for small datasets but is less important for large datasets.
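As a hedged illustration (not one of the released scripts), the distillation alphas above can be swept with a small driver around the same positional call to scripts/run_CoFi.sh shown earlier; the task and sparsity values are only examples, and [DISTILLATION_PATH] remains a placeholder for your teacher model directory:

import subprocess

# hypothetical sweep over the layer / cross-entropy distillation alpha pairs for a small GLUE task
task, ex_cate, sparsity, layer_distill_version = "RTE", "CoFi", "0.95", "4"
pruning_type = "structured_heads+structured_mlp+hidden+layer"

for layer_alpha, ce_alpha in [("0.9", "0.1"), ("0.7", "0.3"), ("0.5", "0.5")]:
    suffix = f"sparsity{sparsity}_la{layer_alpha}"
    subprocess.run(
        ["bash", "scripts/run_CoFi.sh", task, suffix, ex_cate, pruning_type,
         sparsity, "[DISTILLATION_PATH]", layer_alpha, ce_alpha, layer_distill_version],
        check=True,
    )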

Bugs or Questions?

If you have any questions related to the code or the paper, feel free to email Mengzhou ([email protected]) and Zexuan ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem in detail so we can help you better and more quickly!

Citation

Please cite our paper if you use CoFiPruning in your work:

@inproceedings{xia2022structured,
   title={Structured Pruning Learns Compact and Accurate Models},
   author={Xia, Mengzhou and Zhong, Zexuan and Chen, Danqi},
   booktitle={Association for Computational Linguistics (ACL)},
   year={2022}
}