DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Overview

DSEE

Codes for [Preprint] DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallahp

License: MIT

Overview

TBD

Requirements

We use conda to create virtual environments.

conda create -f environment.yml
conda activate dsee

Command

Unstructured DSEE

Step 0.

cd non-GPT-2
pip install -e .
cd ..

Step 1. Pre-training

Take SST-2 as example:

OUTPUT_DIR='./sst2_rank16_s1_64'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12345 non-GPT-2/examples/pytorch/text-classification/run_glue.py \
    --save_total_limit 10 \
    --model_name_or_path bert-base-uncased \ 
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64  \
    --learning_rate 2e-4 \
    --evaluation_strategy steps 

Step 2. Pruning & Fine-tuning

OUTPUT_DIR='./sst2_rank16_s1_64_prune_0.5'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12335 \
    non-GPT-2/examples/pytorch/text-classification/run_glue_prune_tune.py \
    --save_total_limit 10 \
    --model_name_or_path sst2_rank16_s1_64 \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --pruning_ratio 0.5 \
    --evaluation_strategy steps

TODO

  • Codes for Unstructured DSEE on GPT-2
  • Codes for Structured DSEE

Acknowledgement

  1. The Huggingface's Transformers (https://github.com/huggingface/transformers)
Owner
VITA
Visual Informatics Group @ University of Texas at Austin
VITA
Image Fusion Transformer

Image-Fusion-Transformer Platform Python 3.7 Pytorch =1.0 Training Dataset MS-COCO 2014 (T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ram

Vibashan VS 68 Dec 23, 2022
Deduplicating Training Data Makes Language Models Better

Deduplicating Training Data Makes Language Models Better This repository contains code to deduplicate language model datasets as descrbed in the paper

Google Research 431 Dec 27, 2022
Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021] Paper: https://arxiv.org/abs/2104.11208 Introduction Despite the significa

76 Dec 07, 2022
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •

Pytorch Lightning 21.1k Dec 29, 2022
This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Dataquest Project Solutions This repository is a series of notebooks that show solutions for the projects at Dataquest.io. Of course, there are always

Dataquest 1.1k Dec 30, 2022
Parametric Contrastive Learning (ICCV2021)

Parametric-Contrastive-Learning This repository contains the implementation code for ICCV2021 paper: Parametric Contrastive Learning (https://arxiv.or

DV Lab 156 Dec 21, 2022
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

this is a simple artificial neural network model using deep learning and torch-audio to classify cats and dog sounds.

crispengari 3 Dec 05, 2022
Official implementation of "GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators" (NeurIPS 2020)

GS-WGAN This repository contains the implementation for GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators (NeurIPS

46 Nov 09, 2022
Supervised Sliding Window Smoothing Loss Function Based on MS-TCN for Video Segmentation

SSWS-loss_function_based_on_MS-TCN Supervised Sliding Window Smoothing Loss Function Based on MS-TCN for Video Segmentation Supervised Sliding Window

3 Aug 03, 2022
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

倪仕文 (Shiwen Ni) 58 Nov 10, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

12 Oct 28, 2022
Object Depth via Motion and Detection Dataset

ODMD Dataset ODMD is the first dataset for learning Object Depth via Motion and Detection. ODMD training data are configurable and extensible, with ea

Brent Griffin 172 Dec 21, 2022
Baseline for the Spoofing-aware Speaker Verification Challenge 2022

Introduction This repository contains several materials that supplements the Spoofing-Aware Speaker Verification (SASV) Challenge 2022 including: calc

40 Dec 28, 2022
BBScan py3 - BBScan py3 With Python

BBScan_py3 This repository is forked from lijiejie/BBScan 1.5. I migrated the fo

baiyunfei 12 Dec 30, 2022
g2o: A General Framework for Graph Optimization

g2o - General Graph Optimization Linux: Windows: g2o is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has bee

Rainer Kümmerle 2.5k Dec 30, 2022
An AI Assistant More Than a Toolkit

tymon An AI Assistant More Than a Toolkit The reason for creating framework tymon is simple. making AI more like an assistant, helping us to complete

TymonXie 46 Oct 24, 2022
Code for CVPR2021 paper "Robust Reflection Removal with Reflection-free Flash-only Cues"

Robust Reflection Removal with Reflection-free Flash-only Cues (RFC) Paper | To be released: Project Page | Video | Data Tensorflow implementation for

Chenyang LEI 162 Jan 05, 2023
Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process

Aiming at the common training datsets split, spectrum preprocessing, wavelength select and calibration models algorithm involved in the spectral analysis process, a complete algorithm library is esta

Fu Pengyou 50 Jan 07, 2023
Modified prey-predator system - Modified prey–predator model describes the rate of change for each species by adding coupling terms.

Modified prey-predator system We aim to study the behaviors of the modified prey–predator model and establish the effects of several parameters that p

Seoyoung Oh 1 Jan 02, 2022