Vision Transformer with Deformable Attention

This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv].

Introduction

Deformable attention is proposed to model the relations among tokens effectively under the guidance of the important regions in the feature maps. This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer (DAT), a general backbone model with deformable attention for both image classification and other dense prediction tasks.

Dependencies

NVIDIA GPU + CUDA 11.1
Python 3.8 (Recommend to use Anaconda)
PyTorch == 1.8.0
timm
einops
yacs
termcolor

TODO

Classification pretrained models.
Object Detection codebase & models.
Semantic Segmentation codebase & models.
CUDA operators to accelerate sampling operations.

Acknowledgement

This code is developed on the top of Swin Transformer, we thank to their efficient and neat codebase.

Citation

If you find our work is useful in your research, please consider citing:

@misc{xia2022vision,
      title={Vision Transformer with Deformable Attention}, 
      author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
      year={2022},
      eprint={2201.00520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

[email protected]

Repository of Vision Transformer with Deformable Attention

Related tags

Overview

Vision Transformer with Deformable Attention

Introduction

Dependencies

TODO

Acknowledgement

Citation

Contact

Owner

An implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks in PyTorch.

Facial expression detector

Learning-Augmented Dynamic Power Management

Deep Learning Visuals contains 215 unique images divided in 23 categories

Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

CondenseNet: Light weighted CNN for mobile devices

The project is associated with the recently-launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) to provide participants with baseline systems for speech recognition and speaker diarization in conference scenario.

Goal of the project : Detecting Temporal Boundaries in Sign Language videos

[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets"

A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

NER for Indian languages

OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion.

An Open Source Machine Learning Framework for Everyone

Magic tool for managing internet connection in local network by @zalexdev

A list of all named GANs!

ONNX Command-Line Toolbox

git《Tangent Space Backpropogation for 3D Transformation Groups》(CVPR 2021) GitHub:1]

This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

An executor that performs image segmentation on fashion items