Official Implementation of DE-DETR and DELA-DETR in "Towards Data-Efficient Detection Transformers"

Last update: Dec 12, 2022

Overview

DE-DETRs

By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao

This repository is an official implementation of DE-DETR and DELA-DETR in the paper Towards Data-Efficient Detection Transformers.

For the implementation of DE-CondDETR and DELA-CondDETR, please refer to DE-CondDETR.

Introduction

TL;DR. We identify the data-hungry issue of existing detection transformers and alleviate it by simply altering how key and value sequences are constructed in the cross-attention layer, with minimal modifications to the original models. In addition, we introduce a simple yet effective label augmentation method that provides richer supervision and improves data efficiency.

Abstract. Detection Transformers have achieved competitive performance on the sample-rich COCO dataset. However, we show most of them suffer from significant performance drops on small-size datasets, like Cityscapes. In other words, the detection transformers are generally data-hungry. To tackle this problem, we empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR. The empirical results suggest that sparse feature sampling from local image areas holds the key. Based on this observation, we alleviate the data-hungry issue of existing detection transformers by simply altering how key and value sequences are constructed in the cross-attention layer, with minimal modifications to the original models. Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency. Experiments show that our method can be readily applied to different detection transformers and improve their performance on both small-size and sample-rich datasets.
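
As a rough illustration of the label augmentation idea, the sketch below repeats each ground-truth object a fixed number of times before matching, so that more queries can receive positive supervision. It assumes this is what the `--repeat_label` flag in the training commands controls. The function name `repeat_labels` and the target layout (parallel `labels` and `boxes` lists) are hypothetical and are not taken from this repository.

```python
from typing import Dict, List


def repeat_labels(target: Dict[str, List], repeat: int) -> Dict[str, List]:
    """Return a copy of `target` with every ground-truth object repeated `repeat` times.

    Hypothetical helper: `target` holds parallel lists under "labels" and "boxes".
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    return {
        "labels": [label for label in target["labels"] for _ in range(repeat)],
        "boxes": [list(box) for box in target["boxes"] for _ in range(repeat)],
    }


# One image with two objects; boxes are (cx, cy, w, h) in normalized coordinates.
target = {"labels": [1, 3], "boxes": [[0.5, 0.5, 0.2, 0.2], [0.3, 0.7, 0.1, 0.4]]}
augmented = repeat_labels(target, repeat=2)
print(augmented["labels"])  # [1, 1, 3, 3]
```

Since several queries may then be matched to copies of the same object, duplicate predictions become likely, which is presumably why the DELA-DETR training commands also pass `--nms`.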

Main Results

The experimental results and model weights trained on Cityscapes are shown below.

Model	Epochs	mAP	mAP@50	mAP@75	mAP@S	mAP@M	mAP@L	Log & Model
DETR	300	11.7	26.5	9.3	2.6	9.2	25.6	Google Drive
DE-DETR	50	22.2	41.7	20.5	4.9	19.7	40.8	Google Drive
DELA-DETR	50	25.2	46.8	22.8	6.5	23.8	44.3	Google Drive

The experimental results and model weights trained on COCO 2017 are shown below.

Model	Epochs	mAP	mAP@50	mAP@75	mAP@S	mAP@M	mAP@L	Log & Model
DETR	50	33.6	54.6	34.2	13.2	35.7	53.5	Google Drive
DE-DETR	50	40.2	60.4	43.2	23.3	42.1	56.4	Google Drive
DELA-DETR	50	41.9	62.6	44.8	24.9	44.9	56.8	Google Drive

Note:

The number of queries is increased from 100 to 300 in DELA-DETR.
The performance of the model weights on Cityscapes is slightly different from that reported in the paper, because the results in the paper are the average of five repeated runs with different random seeds.
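
The absolute mAP gains over the DETR baseline follow directly from the two tables above. The snippet below is only a convenience calculation on the reported numbers; the helper name `gains_over_baseline` is made up for this example.

```python
# mAP values copied from the Cityscapes and COCO 2017 tables above.
cityscapes_map = {"DETR": 11.7, "DE-DETR": 22.2, "DELA-DETR": 25.2}
coco_map = {"DETR": 33.6, "DE-DETR": 40.2, "DELA-DETR": 41.9}


def gains_over_baseline(results, baseline="DETR"):
    """Absolute mAP gain of each model over the baseline, rounded to one decimal."""
    base = results[baseline]
    return {
        name: round(value - base, 1)
        for name, value in results.items()
        if name != baseline
    }


print(gains_over_baseline(cityscapes_map))  # {'DE-DETR': 10.5, 'DELA-DETR': 13.5}
print(gains_over_baseline(coco_map))        # {'DE-DETR': 6.6, 'DELA-DETR': 8.3}
```

Note that the Cityscapes DETR baseline was trained for 300 epochs, while DE-DETR and DELA-DETR were trained for 50.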

Installation

Requirements

Linux, CUDA>=9.2, GCC>=5.4
Python>=3.7
PyTorch>=1.5.0, torchvision>=0.6.0 (following instructions here)
Detectron2>=0.5 for RoIAlign (following instructions here)
Other requirements
```
pip install -r requirements.txt
```
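
Before training, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helpers, and pre-release or local suffixes such as `+cu117` are simply ignored.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric components."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.6' and '0.6.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.5.0", "torchvision": "0.6.0", "detectron2": "0.5"}

print(meets_minimum("1.13.1+cu117", MINIMUMS["torch"]))  # True
print(meets_minimum("0.5.0", MINIMUMS["torchvision"]))   # False
```

On Python 3.8 or newer, the installed version string can be obtained with `importlib.metadata.version("torch")` and passed to `meets_minimum`.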

Usage

Dataset preparation

The COCO 2017 dataset can be downloaded from here and the Cityscapes dataset can be downloaded from here. The annotations in COCO format can be obtained from here. Afterward, please organize the datasets and annotations as follows:

data
├─ cityscapes
│  └─ leftImg8bit
│     ├─ train
│     └─ val
├─ coco
│  ├─ annotations
│  ├─ train2017
│  └─ val2017
└─ CocoFormatAnnos
   ├─ cityscapes_train_cocostyle.json
   ├─ cityscapes_val_cocostyle.json
   ├─ instances_train2017_sample11828.json
   ├─ instances_train2017_sample5914.json
   ├─ instances_train2017_sample2365.json
   └─ instances_train2017_sample1182.json
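
A quick way to check that the layout above is in place is to test each expected path. This is a small sketch, not part of the repository; the helper name `missing_entries` is hypothetical, and the data root is assumed to be a `data` directory next to `main.py`.

```python
from pathlib import Path
from typing import List

# Paths copied from the directory tree above, relative to the data root.
EXPECTED = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "coco/annotations",
    "coco/train2017",
    "coco/val2017",
    "CocoFormatAnnos/cityscapes_train_cocostyle.json",
    "CocoFormatAnnos/cityscapes_val_cocostyle.json",
    "CocoFormatAnnos/instances_train2017_sample11828.json",
    "CocoFormatAnnos/instances_train2017_sample5914.json",
    "CocoFormatAnnos/instances_train2017_sample2365.json",
    "CocoFormatAnnos/instances_train2017_sample1182.json",
]


def missing_entries(data_root: str) -> List[str]:
    """Return the expected relative paths that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Assumed data root; prints every entry that still needs to be created.
for rel in missing_entries("data"):
    print("missing:", rel)
```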

The annotations for the down-sampled COCO 2017 dataset are generated using utils/downsample_coco.py.
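
For readers who want to see what such down-sampling might involve, here is a minimal sketch. It is not the code in `utils/downsample_coco.py`: the function `downsample_coco`, its `seed` argument, and the truncation rule are assumptions. The truncation rule is at least consistent with the file names above, since COCO train2017 has 118,287 images and `int(118287 * 0.01)` is 1182.

```python
import random
from typing import Any, Dict


def downsample_coco(coco: Dict[str, Any], sample_rate: float, seed: int = 0) -> Dict[str, Any]:
    """Keep a random subset of images and only the annotations that belong to them."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    images = coco["images"]
    keep = int(len(images) * sample_rate)  # assumed rule: truncate toward zero
    rng = random.Random(seed)              # fixed seed for a reproducible subset
    kept_images = rng.sample(images, keep)
    kept_ids = {img["id"] for img in kept_images}
    result = dict(coco)                    # shallow copy keeps categories, info, etc.
    result["images"] = kept_images
    result["annotations"] = [a for a in coco["annotations"] if a["image_id"] in kept_ids]
    return result


# Toy COCO-style dictionary: 200 images with two annotations each.
toy = {
    "images": [{"id": i} for i in range(200)],
    "annotations": [{"id": 2 * i + k, "image_id": i} for i in range(200) for k in range(2)],
    "categories": [{"id": 1, "name": "object"}],
}
subset = downsample_coco(toy, sample_rate=0.25)
print(len(subset["images"]), len(subset["annotations"]))  # 50 100
```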

Training

Training DELA-DETR on Cityscapes

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cityscapes --coco_path data/cityscapes --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DELA-DETR on down-sampled COCO 2017, with e.g. sample_rate=0.01

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cocodown --coco_path data/coco --sample_rate 0.01 --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DELA-DETR on COCO 2017

python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 --use_env main.py --dataset_file coco --coco_path data/coco --batch_size 4 --model dela-detr --repeat_label 2 --nms --num_queries 300 --wandb

Training DE-DETR on Cityscapes

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 --use_env main.py --dataset_file cityscapes --coco_path data/cityscapes --batch_size 4 --model de-detr --wandb

Training DETR baseline

Please refer to the detr branch.
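
The training commands above differ only in a few arguments, so they can be assembled programmatically. The sketch below rebuilds them from parameters; `build_train_command` and `effective_batch_size` are hypothetical helpers, and the effective batch size assumes that `--batch_size` is counted per process, as in the original DETR code base.

```python
from typing import List, Optional


def build_train_command(
    dataset_file: str,
    coco_path: str,
    model: str,
    gpus: int,
    batch_size: int = 4,
    sample_rate: Optional[float] = None,
    master_port: int = 29501,
    wandb: bool = True,
) -> List[str]:
    """Assemble a launch command using only the flags shown in this README."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", f"--master_port={master_port}", "--use_env",
        "main.py", "--dataset_file", dataset_file, "--coco_path", coco_path,
    ]
    if sample_rate is not None:
        cmd += ["--sample_rate", str(sample_rate)]
    cmd += ["--batch_size", str(batch_size), "--model", model]
    if model == "dela-detr":
        # Extra flags that the DELA-DETR commands above pass.
        cmd += ["--repeat_label", "2", "--nms", "--num_queries", "300"]
    if wandb:
        cmd.append("--wandb")
    return cmd


def effective_batch_size(gpus: int, batch_size: int) -> int:
    """Total images per step, assuming `--batch_size` is per process (an assumption)."""
    return gpus * batch_size


print(" ".join(build_train_command("cityscapes", "data/cityscapes", "de-detr", gpus=2)))
print(effective_batch_size(gpus=8, batch_size=4))  # 32
```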

Evaluation

You can download a pretrained model (the links are in the "Main Results" section), then run the following command to evaluate it on the validation set:

<training command> --resume <path to pre-trained model> --eval
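
As a concrete reading of the template above, the evaluation command is the training command with `--resume` and `--eval` appended. The helper `to_eval_command` and the checkpoint path below are hypothetical and serve only as an illustration.

```python
import shlex


def to_eval_command(train_command: str, checkpoint: str) -> str:
    """Append `--resume <checkpoint> --eval` to an existing training command."""
    return f"{train_command} --resume {shlex.quote(checkpoint)} --eval"


train_command = (
    "python -m torch.distributed.launch --nproc_per_node=2 --master_port=29501 "
    "--use_env main.py --dataset_file cityscapes --coco_path data/cityscapes "
    "--batch_size 4 --model de-detr --wandb"
)
# Hypothetical checkpoint location; substitute the path of the downloaded weights.
print(to_eval_command(train_command, "checkpoints/de_detr_cityscapes.pth"))
```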

Acknowledgement

This project is based on DETR and Deformable DETR. We thank the authors for their wonderful work. See LICENSE for more details.

Citing DE-DETRs

If you find DE-DETRs useful in your research, please consider citing:

@misc{wang2022towards,
      title={Towards Data-Efficient Detection Transformers}, 
      author={Wen Wang and Jing Zhang and Yang Cao and Yongliang Shen and Dacheng Tao},
      year={2022},
      eprint={2203.09507},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
