TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

Last update: Jan 09, 2023

Overview

Introduction

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between learning task-interactive and task-specific features, as well as a greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing. This surpasses the recent one-stage detectors by a large margin, such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP), with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD for better aligning the tasks of object classification and localization.
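As a rough illustration of how TAL can rank anchors, the sketch below scores each candidate anchor by combining its classification score and its IoU with the ground-truth box, then keeps the top-ranked anchors as positives. The text above does not spell out the metric; the form t = s^alpha * u^beta and the default values alpha=1, beta=6, topk=13 are assumptions taken from the TOOD paper and should be checked against the config files. The function names are hypothetical and are not part of MMDetection.

```python
def alignment_metric(cls_score, iou, alpha=1.0, beta=6.0):
    """Assumed task-alignment metric: t = s^alpha * u^beta.

    cls_score: predicted classification score s for the ground-truth class.
    iou: IoU u between the predicted box and the ground-truth box.
    """
    return (cls_score ** alpha) * (iou ** beta)


def rank_anchors(cls_scores, ious, topk=13, alpha=1.0, beta=6.0):
    """Return indices of the top-k anchors by alignment metric, best first."""
    metrics = [alignment_metric(s, u, alpha, beta)
               for s, u in zip(cls_scores, ious)]
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return order[:topk]


# Three hypothetical anchors for one ground-truth object.
scores = [0.9, 0.7, 0.2]   # anchor 0 is confident but poorly localized
ious = [0.4, 0.8, 0.9]     # anchor 2 is well localized but not confident
print(rank_anchors(scores, ious, topk=3))
```

In this toy example the anchor that is reasonably good at both tasks (index 1) ranks first, which is the behaviour the alignment idea aims for: an anchor is only preferred when classification and localization agree.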

Method overview

Parallel head vs. T-head

Prerequisites

MMDetection version 2.14.0.
Please see get_started.md for installation and the basic usage of MMDetection.

Train

# Assumes that you are in the root directory of this project,
# that you have activated your virtual environment (if any),
# and that the COCO dataset is in 'data/coco/'.

./tools/dist_train.sh configs/tood/tood_r50_fpn_1x_coco.py 4
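The training command above uses 4 GPUs, and the Models section states that all models are trained with 16 images per mini-batch. The sketch below shows the arithmetic for the per-GPU batch size, plus a linear learning-rate scaling helper for the case where you train with a different total batch size. Linear scaling is a common heuristic, not something this README prescribes, and the base learning rate used in the example is a hypothetical placeholder; take the real value from the config file.

```python
def samples_per_gpu(total_batch, num_gpus):
    """Split a total mini-batch evenly across GPUs."""
    if num_gpus <= 0 or total_batch % num_gpus != 0:
        raise ValueError("total_batch must be a positive multiple of num_gpus")
    return total_batch // num_gpus


def scale_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: learning rate proportional to batch size."""
    return base_lr * new_batch / base_batch


# 16 images per mini-batch on 4 GPUs.
print(samples_per_gpu(16, 4))

# Hypothetical: halve the total batch size (e.g. 2 GPUs x 4 images).
# base_lr=0.01 is a placeholder, not a value taken from this README.
print(scale_lr(base_lr=0.01, base_batch=16, new_batch=8))
```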

Inference

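# Arguments: config file, checkpoint file, number of GPUs.
# '--eval bbox' reports bounding-box AP on the COCO validation set.
./tools/dist_test.sh configs/tood/tood_r50_fpn_1x_coco.py work_dirs/tood_r50_fpn_1x_coco/epoch_12.pth 4 --eval bbox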

Models

For your convenience, we provide the following trained models (TOOD). All models are trained with 16 images in a mini-batch.

Model	Anchor	MS train	DCN	Lr schd	AP (minival)	AP (test-dev)	Config	Download
TOOD_R_50_FPN_1x	Anchor-free	No	N	1x	42.5	42.7	config	google / baidu
TOOD_R_50_FPN_anchor_based_1x	Anchor-based	No	N	1x	42.4	42.8	config	google / baidu
TOOD_R_101_FPN_2x	Anchor-free	Yes	N	2x	46.2	46.7	config	google / baidu
TOOD_X_101_FPN_2x	Anchor-free	Yes	N	2x	47.6	48.5	config	google / baidu
TOOD_R_101_dcnv2_FPN_2x	Anchor-free	Yes	Y	2x	49.2	49.6	config	google / baidu
TOOD_X_101_dcnv2_FPN_2x	Anchor-free	Yes	Y	2x	50.5	51.1	config	google / baidu

[0] All results are obtained with a single model and without any test-time augmentation such as multi-scale testing or flipping.
[1] dcnv2 denotes deformable convolutional networks v2.
[2] See the config files in configs/tood/ for more details.
[3] Extraction code for Baidu Netdisk: tood.

Acknowledgement

Thanks to the MMDetection team for the wonderful open-source project!

Citation

If you find TOOD useful in your research, please consider citing:

@inproceedings{feng2021tood,
    title={TOOD: Task-aligned One-stage Object Detection},
    author={Feng, Chengjian and Zhong, Yujie and Gao, Yu and Scott, Matthew R and Huang, Weilin},
    booktitle={ICCV},
    year={2021}
}
