Explore extreme compression for pre-trained language models

Overview

Code for the ICLR 2022 paper "Exploring Extreme Parameter Compression for Pre-trained Language Models".

Before Training

Install the required libraries:

 pip install tensorly==0.5.0

PyTorch is also needed; versions 1.0-1.4 are preferred.

Install Horovod for distributed training. For GPU support, follow Horovod's GPU installation guide:

pip install horovod[pytorch]
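
To verify that Horovod was built with PyTorch support, here is a minimal smoke test (the script name is arbitrary):

# Minimal Horovod smoke test: each process reports its rank.
import horovod.torch as hvd

hvd.init()
print(f"rank {hvd.rank()} of {hvd.size()}, local rank {hvd.local_rank()}")

Run it with, e.g., horovodrun -np 2 python hvd_check.py (or under mpirun as shown further below); each process should print a distinct rank.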

Download the pre-trained model and copy it for the student:

wget https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin -P  models/bert-base-uncased
wget https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt -P  models/bert-base-uncased
cp models/bert-base-uncased/pytorch_model.bin models/bert-td-72-384/pytorch_model.bin 
cp models/bert-base-uncased/vocab.txt models/bert-td-72-384/vocab.txt
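
To confirm the download succeeded, you can inspect the checkpoint's state dict directly with torch; this is just a quick sanity check, not part of the pipeline:

# Quick sanity check on the downloaded checkpoint.
import torch

state = torch.load("models/bert-base-uncased/pytorch_model.bin", map_location="cpu")
print(len(state), "tensors in the checkpoint")
for name in list(state)[:5]:
    print(name, tuple(state[name].shape))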

Generate training data for the given corpora (e.g., saved in the path "corpora"):

python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \
                  --bert_model ${BERT_BASE_DIR} \
                  --reduce_memory --do_lower_case \
                  --epochs_to_generate 3 \
                  --output_dir ${CORPUS_JSON_DIR}
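
As a quick check after generation, you can list what was written to ${CORPUS_JSON_DIR}. The per-epoch file name below is an assumption based on similar TinyBERT-style pregeneration scripts; adjust it to whatever your run actually produces:

# Hypothetical sanity check on the pregenerated data directory.
from pathlib import Path

out_dir = Path("data/pregenerated_data")  # your ${CORPUS_JSON_DIR}
for f in sorted(out_dir.iterdir()):
    print(f.name, f.stat().st_size, "bytes")

epoch_file = out_dir / "epoch_0.json"  # assumed naming
if epoch_file.exists():
    with epoch_file.open() as fh:
        print("instances in epoch 0:", sum(1 for _ in fh))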

Task data augmentation:

python data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR} \
                            --glove_embs ${GLOVE_EMB} \
                            --glue_dir ${GLUE_DIR} \
                            --task_name ${TASK_NAME}

Decomposing BERT

Decomposition and general distillation
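
Very roughly, the approach stacks per-layer weight matrices into a single 3-D tensor and factorizes it so that layers share low-rank factors. Below is a minimal, illustrative sketch using tensorly's Tucker decomposition on random weights; the shapes, ranks, and stacking scheme are assumptions for illustration, not the repository's exact procedure:

# Illustrative only: Tucker-decompose a stack of per-layer weight
# matrices so all layers share the same low-rank factor matrices.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend("numpy")

num_layers, d_in, d_out = 12, 768, 768  # BERT-base-like shapes (assumed)
weights = np.random.randn(num_layers, d_in, d_out).astype(np.float32)

# Keep the layer mode, compress the two matrix modes to rank 384.
core, factors = tucker(tl.tensor(weights), rank=[num_layers, 384, 384])

approx = tl.tucker_to_tensor((core, factors))
err = np.linalg.norm(weights - approx) / np.linalg.norm(weights)
print("relative reconstruction error:", err)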

Run with Horovod:

mpirun -np 8 -bind-to none -map-by slot \
       -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
       -mca pml ob1 -mca btl ^openib \
       python3 general_distill.py \
           --teacher_model models/bert-base-uncased \
           --student_model models/bert-gd-72-384 \
           --pregenerated_data data/pregenerated_data \
           --num_train_epochs 2.0 \
           --train_batch_size 32 \
           --output_dir output/bert-gd-72-384 \
           --use_swap --do_lower_case

To restrict sharing to the self-attention networks (SAN) or the feed-forward networks (FFN), add an "ops" field to bert-gd-72-384/config.json and set it to "san" or "ffn":

"ops": "san"

Evaluation

Task distillation with data augmentation in the fine-tuning phase

Rename a trained checkpoint to "pytorch_model.bin" (for instance, rename step_0_pytorch_model.bin to pytorch_model.bin), and change load_compressed_model from false to true in output/config.json.
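
For example, a small helper that performs both steps (a minimal sketch; the paths are illustrative and should point at your actual output directory):

# Illustrative helper: rename a saved checkpoint and flip the
# load_compressed_model flag. All paths are assumptions.
import json
import shutil
from pathlib import Path

out_dir = Path("output/bert-gd-72-384")  # assumed output directory
shutil.copy(out_dir / "step_0_pytorch_model.bin", out_dir / "pytorch_model.bin")

cfg_path = out_dir / "config.json"
cfg = json.loads(cfg_path.read_text())
cfg["load_compressed_model"] = True
cfg_path.write_text(json.dumps(cfg, indent=2))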

Task distillation with distributed training:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 task_distill.py \
    --teacher_model models/bert-base-uncased/STS-B \
    --student_model models/bert-gd-72-384 \
    --task_name STS-B \
    --data_dir data/glue_data/STS-B \
    --aug_train \
    --max_seq_length 128 \
    --train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir ./output/36-256-STS-B

Task distillation on a single GPU:

python3 task_distill.py \
    --teacher_model models/bert-base-uncased \
    --student_model models/bert-td-72-384 \
    --output output_demo \
    --data_dir data/glue_data/SST-2 \
    --task_name SST-2 \
    --do_lower_case --aug_train

To train on the augmented data, add --aug_train.

Get test results for the model:

python run_glue.py \
    --model_name_or_path models/bert-td-72-384/SST-2 \
    --task_name SST-2 \
    --do_eval --do_predict \
    --data_dir data/glue_data/SST-2 \
    --max_seq_length 128 \
    --save_steps 500 \
    --save_total_limit 2 \
    --output_dir ./output/SST-2