This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Last update: Dec 18, 2022

Overview

Dynamic-Vision-Transformer (Pytorch)

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length

Update on 2021/06/01: Release Pre-trained Models and the Inference Code on ImageNet.

Introduction

We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically.

Results

Top-1 accuracy on ImageNet v.s. GFLOPs

Top-1 accuracy on CIFAR v.s. GFLOPs

Top-1 accuracy on ImageNet v.s. Throughput

Visualization

Pre-trained Models

Backbone	# of Exits	# of Tokens	Links
T2T-ViT-12	3	7x7-10x10-14x14	Tsinghua Cloud / Google Drive

What are contained in the checkpoints:

**.pth.tar
├── model_state_dict: state dictionaries of the model
├── flops: a list containing the GFLOPs corresponding to exiting at each exit
├── anytime_classification: Top-1 accuracy of each exit
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
├── budgeted_batch_classification: results of budgeted batch classification (a two-item list, [0] and [1] correspond to the two coordinates of a curve)

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in pre-trained models and infer the model on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1

Determine confidence thresholds on the training set and infer the model on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 1)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 1)
│   ├── ...

Contact

If you have any question, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our code of T2T-ViT from here.

To Do

Update the code for training.

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Related tags

Overview

Dynamic-Vision-Transformer (Pytorch)

Introduction

Results

Pre-trained Models

Requirements

Evaluate Pre-trained Models

Contact

Acknowledgment

To Do

Owner

Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

Official code of "Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection"

PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.

gACSON software for visualization, processing and analysis of three-dimensional electron microscopy images

Fast convergence of detr with spatially modulated co-attention

🇰🇷 Text to Image in Korean

Codebase for Image Classification Research, written in PyTorch.

PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

This is the official github repository of the Met dataset

Implementation of Feedback Transformer in Pytorch

Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.

TEDSummary is a speech summary corpus. It includes TED talks subtitle (Document), Title-Detail (Summary), speaker name (Meta info), MP4 URL, and utterance id

Data reduction pipeline for KOALA on the AAT.

SuMa++: Efficient LiDAR-based Semantic SLAM (Chen et al IROS 2019)

Training DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo)

Implementation of the Swin Transformer in PyTorch.

(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Mapping Conditional Distributions for Domain Adaptation Under Generalized Target Shift