This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Last update: Nov 15, 2022

Overview

This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

It includes /bert, which is the original BERT repository modified to support weight pruning and gradient checkpointing. Gradient checkpointing can be disabled by setting the Unix environment variable DISABLE_GRAD_CHECKPOINT=True. This only works during fine-tuning, not during pre-training.
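As a minimal sketch, the environment variable can be set for a single run from Python instead of being exported in the shell. The helper name env_without_grad_checkpoint is hypothetical and not part of this repository. How the modified BERT code parses the value is not documented here, so the sketch uses the literal string True given above.

```python
import os


def env_without_grad_checkpoint(base_env=None):
    """Return a copy of the environment with DISABLE_GRAD_CHECKPOINT set.

    Hypothetical helper: the variable name comes from this README; the
    value "True" is copied from it verbatim.
    """
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["DISABLE_GRAD_CHECKPOINT"] = "True"
    return env


if __name__ == "__main__":
    env = env_without_grad_checkpoint()
    print(env["DISABLE_GRAD_CHECKPOINT"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching a fine-tuning run, so the setting does not leak into other processes.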

I am currently in the process of converting these experiments into a ducttape workflow, so things are a little unstable right now.

Things that have not been converted to ducttape:

Anything in tables/
Anything in graphs/

If you need all the experiments from the paper, check out this commit. It's very messy, so be prepared to read the code. I will not be releasing a guide to running that code, since it will be made obsolete by the ducttape workflow.

Configuration

pip install -r requirements.txt

To pre-train, you will need a GPU with at least 12 GB of memory. I've been using Titan RTXs via Univa Grid Engine. If this setup doesn't suit you, you will need to modify tapes/submitters.tape and/or main.tconf.

You'll also need the Wikipedia corpus and BookCorpus, which can be retrieved with scripts/download_wiki.sh and scripts/download_bookcorpus.sh, respectively. GLUE data can be retrieved by running scripts/get_glue.py.

You will need to update tapes/link_data.tape to point to dataset locations.

You will also need to update main.tconf to point to the location of your repository on disk (so ducttape knows where to find packages).
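Before launching a workflow, it may help to confirm that the files mentioned above are present in your checkout. The following is a minimal sketch, not part of this repository: the function name missing_files and the EXPECTED_FILES list are hypothetical, and the paths are simply the ones named in this README, assumed to be relative to the repository root.

```python
from pathlib import Path

# Paths named in this README, assumed relative to the repository root.
EXPECTED_FILES = [
    "requirements.txt",
    "main.tape",
    "main.tconf",
    "tapes/submitters.tape",
    "tapes/link_data.tape",
]


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the entries of `expected` that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files found.")
```

This only checks that the files exist. It does not check that tapes/link_data.tape or main.tconf point to valid locations on your machine.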

AFAIK, no one besides me has used this code. If you have trouble, please open an issue and I'll do what I can to help out.

Most experiments are run using

ducttape main.tape -C main.tconf -p main

Owner

Mitchell Gordon
