Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Last update: Dec 26, 2022

Expediting Vision Transformers via Token Reorganizations

This repository contains PyTorch evaluation code, training code and pretrained EViT models for the ICLR 2022 Spotlight paper:

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

The proposed EViT models obtain competitive trade-offs between speed and accuracy.

If you use this code for a paper, please cite:

@inproceedings{liang2022evit,
title={Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations},
author={Youwei Liang and Chongjian Ge and Zhan Tong and Yibing Song and Jue Wang and Pengtao Xie},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=BjyvwnXXVn_}
}

Model Zoo

We provide EViT-DeiT-S models pretrained on ImageNet 2012.

Token fusion	Keep rate	Acc@1	Acc@5	#Params	URL
✓	0.9	79.8	95.0	22.1M	model
✓	0.8	79.8	94.9	22.1M	model
✓	0.7	79.5	94.8	22.1M	model
✓	0.6	78.9	94.5	22.1M	model
✓	0.5	78.5	94.2	22.1M	model
✗	0.9	79.9	94.9	22.1M	model
✗	0.8	79.7	94.8	22.1M	model
✗	0.7	79.4	94.7	22.1M	model
✗	0.6	79.1	94.5	22.1M	model
✗	0.5	78.4	94.1	22.1M	model

Preparation

The results reported in the paper were obtained with models trained on 16 NVIDIA A100 GPUs using Python 3.6 and the following packages:

torch==1.9.0
torchvision==0.10.0
timm==0.4.12
tensorboardX==2.4
torchprofile==0.0.4
lmdb==1.2.1
pyarrow==5.0.0

These packages can be installed by running pip install -r requirements.txt.

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder, and the training and validation data are expected to be in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg

We use the same datasets as in DeiT. You can optionally use an LMDB dataset for ImageNet by building it using folder2lmdb.py and passing --use-lmdb to main.py, which may speed up data loading.
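Before launching a long run, it can help to confirm that the dataset root follows the layout above. The following is a minimal sketch using only the standard library; the function names (list_classes, check_layout, make_demo_tree) are hypothetical helpers and are not part of this repository, and the demo tree contains empty placeholder files rather than real images.

```python
import os
import tempfile


def list_classes(root, split):
    """Return the sorted class-folder names found under root/split."""
    split_dir = os.path.join(root, split)
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_layout(root):
    """Check that train/ and val/ exist and hold the same class folders."""
    train_classes = list_classes(root, "train")
    val_classes = list_classes(root, "val")
    if not train_classes:
        raise ValueError("no class folders found under train/")
    if train_classes != val_classes:
        raise ValueError("train/ and val/ contain different class folders")
    return train_classes


def make_demo_tree(root):
    """Create a tiny placeholder tree that mirrors the layout shown above."""
    files = [
        ("train", "class1", "img1.jpeg"),
        ("train", "class2", "img2.jpeg"),
        ("val", "class1", "img3.jpeg"),
        ("val", "class2", "img4.jpeg"),
    ]
    for split, cls, img in files:
        folder = os.path.join(root, split, cls)
        os.makedirs(folder, exist_ok=True)
        # Empty placeholder file; a real dataset holds JPEG images here.
        open(os.path.join(folder, img), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_root:
        make_demo_tree(demo_root)
        print(check_layout(demo_root))
```

For a real dataset, call check_layout("/path/to/imagenet") directly. The check only inspects folder names; it does not open or validate the images.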

Usage

First, clone the repository locally:

git clone https://github.com/youweiliang/evit.git

Change directory to the cloned repository by running cd evit, install necessary packages, and prepare the datasets.

Training

To train EViT/0.7-DeiT-S on ImageNet, set the datapath (path to dataset) and logdir (logging directory) in run_code.sh properly and run bash ./run_code.sh (--nproc_per_node should be modified if necessary). Note that the batch size in the paper is 16x128=2048.
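The batch size arithmetic above (16 GPUs x 128 images per GPU = 2048) can be redone for a different GPU count. The sketch below is plain arithmetic under the assumption that you want to keep the same effective batch size; per_gpu_batch_size is a hypothetical helper, not something provided by this repository, and it says nothing about whether the resulting per-GPU batch fits in memory.

```python
def per_gpu_batch_size(total_batch_size, num_gpus):
    """Per-GPU batch size that keeps the effective batch size unchanged."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size is not divisible by the GPU count")
    return total_batch_size // num_gpus


if __name__ == "__main__":
    # Paper setting from the text above: 16 GPUs, effective batch size 2048.
    print(per_gpu_batch_size(2048, 16))
    # The same effective batch size on a hypothetical 8-GPU machine.
    print(per_gpu_batch_size(2048, 8))
```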

Set --base_keep_rate in run_code.sh to use a different keep rate, and set --fuse_token to configure whether to use inattentive token fusion.
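To get a feel for what a given keep rate means, the sketch below estimates how many image tokens remain after each token reorganization step. It rests on several assumptions that are not stated in this README and should be checked against the paper and the model code: a 224x224 input, 16x16 patches (suggested by the model name deit_small_patch16_shrink_base), three reorganization steps, rounding up when selecting tokens, and one extra fused token per step when token fusion is enabled. The class token is not counted, and tokens_per_stage is a hypothetical helper.

```python
import math


def tokens_per_stage(keep_rate, input_size=224, patch_size=16,
                     num_stages=3, fuse_token=False):
    """Estimate image-token counts before and after each reorganization step.

    Assumptions (not taken from this README): tokens kept per step are
    ceil(keep_rate * current_tokens), and fusion appends one fused token.
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    n = (input_size // patch_size) ** 2  # image tokens, class token excluded
    if keep_rate == 1.0:
        # Nothing is dropped, so there is nothing to fuse either.
        return [n] * (num_stages + 1)
    counts = [n]
    for _ in range(num_stages):
        kept = math.ceil(keep_rate * n)
        n = kept + 1 if fuse_token else kept
        counts.append(n)
    return counts


if __name__ == "__main__":
    for rate in (0.9, 0.8, 0.7, 0.6, 0.5):
        print(rate, tokens_per_stage(rate), tokens_per_stage(rate, fuse_token=True))
```

Under these assumptions, a keep rate of 0.7 shrinks 196 image tokens to 138, then 97, then 68 when fusion is off. Passing a larger input_size shows how the token count grows with resolution.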

Training/Finetuning on higher resolution images

To train on images with a (higher) resolution h, set --input-size h in run_code.sh.

Multinode training

Please refer to DeiT for multinode training.

Finetuning

First set the datapath, logdir, and ckpt (the model checkpoint for finetuning) in run_code.sh, and then run bash ./finetune.sh.

Evaluation

To evaluate a pre-trained EViT/0.7-DeiT-S model on the ImageNet val set with a single GPU, run the following (replacing checkpoint with the actual file):

python3 main.py --model deit_small_patch16_shrink_base --fuse_token --base_keep_rate 0.7 --eval --resume checkpoint --data-path /path/to/imagenet

You can also pass --dist-eval to use multiple GPUs for evaluation.
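When evaluating several checkpoints from the Model Zoo, it may be convenient to assemble the command programmatically. The sketch below only uses flags that appear in the evaluation command above; build_eval_command is a hypothetical helper, and the checkpoint file names in the demo are placeholders rather than the actual download names. It does not handle the distributed launcher that multi-GPU evaluation would need.

```python
import shlex


def build_eval_command(checkpoint, data_path, keep_rate,
                       fuse_token=True, dist_eval=False):
    """Build the argument list for evaluating one checkpoint with main.py."""
    cmd = ["python3", "main.py", "--model", "deit_small_patch16_shrink_base"]
    if fuse_token:
        cmd.append("--fuse_token")
    cmd += [
        "--base_keep_rate", str(keep_rate),
        "--eval",
        "--resume", checkpoint,
        "--data-path", data_path,
    ]
    if dist_eval:
        cmd.append("--dist-eval")
    return cmd


if __name__ == "__main__":
    # Placeholder checkpoint names; substitute the files you downloaded.
    for rate in (0.9, 0.7, 0.5):
        cmd = build_eval_command(f"evit-{rate}.pth", "/path/to/imagenet", rate)
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from inside the cloned repository. Set fuse_token=False for the checkpoints trained without token fusion.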

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

We would like to thank the authors of DeiT, on which this project is built.

