An official implementation of Anchor DETR.

Last update: Dec 28, 2022

Anchor DETR: Query Design for Transformer-Based Detector

Introduction

This repository is an official implementation of Anchor DETR. We encode anchor points as the object queries in DETR, so each query has an explicit position to focus on. Multiple patterns are attached to each anchor point to address the "one region, multiple objects" difficulty, where several objects appear around the same position. We also propose an attention variant, Row-Column Decoupled Attention (RCDA), to reduce the memory cost for high-resolution features.
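The query design described above can be sketched in a few lines. This is an illustrative sketch only, not the repository's code: the helper names `grid_anchor_points` and `build_queries` are hypothetical, the anchors are assumed to lie on a uniform grid, and each query is represented as a plain (pattern index, anchor point) pair rather than as the embeddings a real model would use.

```python
def grid_anchor_points(rows, cols):
    """Return rows * cols anchor points (x, y) spread uniformly inside (0, 1)."""
    return [((c + 0.5) / cols, (r + 0.5) / rows)
            for r in range(rows)
            for c in range(cols)]


def build_queries(anchors, num_patterns):
    """Pair every anchor point with every pattern index.

    Each (pattern_id, anchor) pair stands for one object query, so a single
    anchor position can produce up to num_patterns predictions. This is how
    several objects around the same position can each get their own query.
    """
    return [(pattern_id, anchor)
            for pattern_id in range(num_patterns)
            for anchor in anchors]


# Hypothetical sizes chosen only to keep the output small.
anchors = grid_anchor_points(rows=2, cols=2)
queries = build_queries(anchors, num_patterns=3)
print(len(anchors), "anchors x 3 patterns =", len(queries), "queries")
```

The total number of queries is the number of anchor points multiplied by the number of patterns, which is why adding patterns increases the predictions available per position without adding new positions.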

Main Results

Model	Feature	Epochs	AP	GFLOPs	Infer Speed (FPS)
DETR	DC5	500	43.3	187	10 (12)
SMCA	multi-level	50	43.7	152	10
Deformable DETR	multi-level	50	43.8	173	15
Conditional DETR	DC5	50	43.8	195	10
Anchor DETR	DC5	50	44.3	151	16 (19)

Note:

The results are based on a ResNet-50 backbone.
Inference speeds are measured on an NVIDIA Tesla V100 GPU.
The values in parentheses in the Infer Speed column indicate the speed with TorchScript optimization.

Model

name	backbone	AP	URL
AnchorDETR-C5	R50	42.1	model / log
AnchorDETR-DC5	R50	44.3	model / log
AnchorDETR-C5	R101	43.5	model / log
AnchorDETR-DC5	R101	45.1	model / log

Note: the models and logs are also available at Baidu Netdisk with code hh13.

Usage

Installation

First, clone the repository locally:

git clone https://github.com/megvii-research/AnchorDETR.git

Then, install dependencies:

pip install -r requirements.txt

Training

To train AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco
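With `--use_env`, the PyTorch launcher passes each worker's local rank through the `LOCAL_RANK` environment variable instead of a `--local_rank` command-line argument, alongside `RANK` and `WORLD_SIZE`. The sketch below shows one way a script could read these values. The function name `read_distributed_env` is hypothetical, and whether `main.py` reads them in exactly this way is an assumption.

```python
import os


def read_distributed_env(environ=os.environ):
    """Read the process layout that the launcher exports as environment variables.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is started with plain `python main.py`.
    """
    return {
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
        "local_rank": int(environ.get("LOCAL_RANK", 0)),
    }


# Simulated environment for one of 8 worker processes on a single node.
fake_env = {"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"}
info = read_distributed_env(fake_env)
print(info)
```

On a single node, `rank` and `local_rank` coincide. They differ only in multi-node runs, where `local_rank` selects the GPU within a node.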

Evaluation

To evaluate AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth

To evaluate AnchorDETR with a single GPU:

python main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth
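Before launching an evaluation, it can help to check that the directory passed as `--coco_path` looks as expected. The sketch below assumes the COCO 2017 layout commonly used by DETR-style data loaders (a `val2017` image folder plus `annotations/instances_val2017.json`). This README does not state the layout, so treat it as an assumption; the helper names are hypothetical.

```python
from pathlib import Path


def expected_entries(splits):
    """List the paths assumed to exist for the given splits (e.g. "train", "val")."""
    entries = []
    for split in splits:
        entries.append(f"{split}2017")  # image folder (assumed name)
        entries.append(f"annotations/instances_{split}2017.json")  # annotation file (assumed name)
    return entries


def missing_coco_entries(coco_path, splits=("val",)):
    """Return the assumed entries that are not present under coco_path."""
    root = Path(coco_path)
    return [entry for entry in expected_entries(splits)
            if not (root / entry).exists()]


# Placeholder path from the commands above; replace it with a real location.
missing = missing_coco_entries("/path/to/coco", splits=("val",))
print("missing entries:", missing)
```

An empty list means every assumed entry was found. For training, passing `splits=("train", "val")` checks both splits.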

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{wang2021anchor,
      title={Anchor DETR: Query Design for Transformer-Based Detector},
      author={Yingming Wang and Xiangyu Zhang and Tong Yang and Jian Sun},
      year={2021},
      eprint={2109.07107},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Owner

MEGVII Research
