Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Last update: Sep 13, 2022

Overview

To make better use of limited labels, we propose a novel object detection approach that combines multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection.

Here is a guide to the source code.

Reference

If you use this code or cite the paper, please refer to the following:

@inproceedings{lee2019multi,
 author = {Wonhee Lee and Joonil Na and Gunhee Kim},
 title = {Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations},
 booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
 year = {2019}
}

CVPR Poster [PPT][PDF]

Introduction [PPT][PDF]

Multi-task Learning

Multi-task learning (MTL) aims at jointly training multiple related tasks with fewer annotations to improve the performance of each task.

[1] An Overview of Multi-Task Learning in Deep Neural Networks

[2] Mask R-CNN

Self-supervised Learning

Self-supervised learning (SSL) aims at training a model on annotations that it generates itself, with no additional human effort.

[3] Learning Representations for Automatic Colorization

[4] Unsupervised learning of visual representations by solving jigsaw puzzles

Annotation Reuse

Reusing the labels of one task not only helps create new tasks and their labels but can also improve the performance of the main task through pretraining. Our work focuses on recycling bounding box labels for object detection.

[5] Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing

[6] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Our approach

The key to our approach is a set of auxiliary tasks that are relevant but not identical to object detection. They create their own labels by recycling the bounding box labels (i.e. the annotations of the main task) in an SSL manner, treating the bounding boxes as metadata. These auxiliary tasks are then trained jointly with the object detection model in an MTL manner.
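As a rough illustration of joint training, the sketch below combines a main detection loss with auxiliary task losses through a weighted sum. This is an assumption for illustration only: the function name `joint_loss` and the per-task weights are hypothetical, and this README does not state how the losses are actually combined.

```python
def joint_loss(main_loss, aux_losses, aux_weights):
    """Weighted sum of a main-task loss and auxiliary-task losses.

    Hypothetical helper: the weighting scheme is an assumption,
    not something stated in this README.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    # Each auxiliary loss contributes in proportion to its weight.
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Example: one main loss and three auxiliary losses (made-up numbers).
total = joint_loss(1.0, [0.5, 0.25, 0.25], [1.0, 1.0, 2.0])
```

In an actual training loop the same expression would be applied to framework tensors, so that gradients from every task reach the shared backbone.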

Approach

Overall architecture

The figure shows how an object detector (i.e. the main task model) such as Faster R-CNN makes a prediction for a given proposal box (red) with the assistance of three auxiliary tasks at inference. The auxiliary task models (bottom right) are almost identical to the main task predictor, except that they have no box regressor. The refinement of the detection prediction (right) is also done jointly by the main and auxiliary task models. K is the number of categories.

3 auxiliary tasks

This is an example of how to generate labels of auxiliary tasks via recycling of GT bounding boxes.

The multi-object soft label assigns the area portions occupied by each class’s GT boxes within a window.
The closeness label scores the distances from the center of the GT box to those of other GT boxes.
The foreground label is a binary mask between foreground and background.
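The three label types above can be sketched with NumPy as follows. This is a minimal sketch, not the repository's implementation. It assumes boxes are integer `(x0, y0, x1, y1)` pixel coordinates with exclusive right and bottom edges, class ids run from 1 to K with index 0 reserved for background, and pixels covered by several classes are shared equally among them. The function names are hypothetical. The step that turns center distances into closeness scores is not specified here, so the sketch stops at raw distances.

```python
import numpy as np


def multi_object_soft_label(window, gt_boxes, gt_classes, num_classes):
    """Area portion of each class (index 0 = background) inside a window."""
    x0, y0, x1, y1 = window
    cover = np.zeros((num_classes + 1, y1 - y0, x1 - x0), dtype=np.float64)
    for (bx0, by0, bx1, by1), c in zip(gt_boxes, gt_classes):
        # Clip the GT box to the window and shift to window coordinates.
        ix0, iy0 = max(bx0, x0) - x0, max(by0, y0) - y0
        ix1, iy1 = min(bx1, x1) - x0, min(by1, y1) - y0
        if ix1 > ix0 and iy1 > iy0:
            cover[c, iy0:iy1, ix0:ix1] = 1.0
    # Pixels not covered by any class count as background.
    cover[0] = cover[1:].sum(axis=0) == 0
    # Assumption: overlapping classes split a pixel equally.
    cover /= cover.sum(axis=0, keepdims=True)
    return cover.mean(axis=(1, 2))


def center_distances(gt_boxes):
    """Pairwise Euclidean distances between GT box centers."""
    b = np.asarray(gt_boxes, dtype=np.float64)
    centers = np.stack([(b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2], axis=1)
    return np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)


def foreground_mask(image_hw, gt_boxes):
    """Binary mask: 1 inside any GT box, 0 elsewhere."""
    h, w = image_hw
    mask = np.zeros((h, w), dtype=np.uint8)
    for x0, y0, x1, y1 in gt_boxes:
        mask[y0:y1, x0:x1] = 1
    return mask


# Toy example: two 4x4 boxes of different classes in a 4x10 image.
boxes = [(0, 0, 4, 4), (6, 0, 10, 4)]
classes = [1, 2]
soft = multi_object_soft_label((0, 0, 10, 4), boxes, classes, num_classes=2)
dist = center_distances(boxes)
mask = foreground_mask((4, 10), boxes)
```

In the toy example each class covers 16 of the 40 window pixels and the remaining 8 are background, so `soft` sums to 1 across the background and the two classes.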

Results

We empirically validate that our approach improves detection performance across various architectures and datasets. We test two state-of-the-art region proposal-based object detectors, Faster R-CNN and R-FCN, with three CNN backbones (ResNet-101, InceptionResNet-v2, and MobileNet) on two benchmark datasets, PASCAL VOC and COCO.

Qualitative results

Qualitative comparison of detection results between the baseline (left) and our approach (right) in each set. We divide the errors into five categories (Localization, Classification, Redundancy, Background, False Negative). Our approach often improves on the baseline's detections by correcting false negatives and false positives such as background, similar-object, and redundant detections.
