DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: For detailed instructions and information, see classification/README.md.

Object Detection: For detailed instructions and information, see detection/README.md.

The paper has been released on [Arxiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split images into patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. This allows the method to better preserve the semantics within each patch.
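
The idea of a patch with its own position and scale can be illustrated with a minimal pure-Python sketch. It is not the code in this repository: the function names, the layout of the k x k sampling grid, and the border clamping are assumptions made for illustration, and the offset and scale are passed in as fixed numbers, whereas DePatch learns to predict them from the data.

```python
import math


def bilinear(img, y, x):
    """Bilinearly interpolate a 2D grid (list of rows) at real-valued (y, x).

    Coordinates are clamped to the image border (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_deformable_patch(img, center, offset, scale, k=2):
    """Sample a k x k grid of values from a shifted, rescaled patch region.

    center: (cy, cx) centre of the original fixed patch
    offset: (dy, dx) shift of the patch centre (hypothetical, fixed here)
    scale:  (sh, sw) height and width of the region (hypothetical, fixed here)
    """
    cy, cx = center[0] + offset[0], center[1] + offset[1]
    sh, sw = scale
    top, left = cy - sh / 2, cx - sw / 2
    patch = []
    for i in range(k):
        row = []
        for j in range(k):
            # Sample at the centre of each cell of a k x k subdivision of the box.
            y = top + (i + 0.5) * sh / k
            x = left + (j + 0.5) * sw / k
            row.append(bilinear(img, y, x))
        patch.append(row)
    return patch


# A 4 x 4 toy "image" whose value at (row, col) is 4 * row + col.
image = [[4 * r + c for c in range(4)] for r in range(4)]

# Zero offset and the original 2 x 2 size reproduce the fixed top-left patch.
fixed = sample_deformable_patch(image, (0.5, 0.5), (0.0, 0.0), (2.0, 2.0))

# A shifted and enlarged region yields a patch of the same k x k size.
deformed = sample_deformable_patch(image, (0.5, 0.5), (1.0, 1.0), (3.0, 3.0))
```

Because every region is resampled to the same k x k grid regardless of its size, patches with different scales still produce inputs of a fixed shape for the embedding layer that follows.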

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we look forward to seeing DePatch applied to other recent architectures and hope it promotes further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.

Owner

CASIA-IVA-Lab
