Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Last update: Nov 21, 2022

Overview

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021)

Introduction

The conventional detectors tend to make imbalanced classification and suffer performance drop, when the distribution of the training data is severely skewed. In this paper, we propose to use the mean classification score to indicate the classification accuracy for each category during training. Based on this indicator, we balance the classification via an Equilibrium Loss (EBL) and a Memory-augmented Feature Sampling (MFS) method. Specifically, EBL increases the intensity of the adjustment of the decision boundary for the weak classes by a designed score-guided loss margin between any two classes. On the other hand, MFS improves the frequency and accuracy of the adjustments of the decision boundary for the weak classes through over-sampling the instance features of those classes. Therefore, EBL and MFS work collaboratively for finding the classification equilibrium in long-tailed detection, and dramatically improve the performance of tail classes while maintaining or even improving the performance of head classes. We conduct experiments on LVIS using Mask R-CNN with various backbones including ResNet-50-FPN and ResNet-101-FPN to show the superiority of the proposed method. It improves the detection performance of tail classes by 15.6 AP, and outperforms the most recent long-tailed object detectors by more than 1 AP.

Method overview

Memory-augmented Feature Sampling (MFS)

Prerequisites

MMDetection version 2.8.0.
Please see get_started.md for installation and the basic usage of MMDetection.

Train

# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed.
# and with LVIS v1.0 dataset in 'data/lvis_v1/'.
# use decoupled training pipeline:

# 1. train the model with Mask R-CNN
./tools/dist_train.sh configs/loce/mask_rcnn_r50_fpn_normed_mask_mstrain_2x_lvis_v1.py 8

# 2. fine-tune the model with LOCE
./tools/dist_train.sh configs/loce/loce_mask_rcnn_r50_fpn_normed_mask_mstrain_2x_lvis_v1.py 8

Inference

./tools/dist_test.sh configs/loce/loce_mask_rcnn_r50_fpn_normed_mask_mstrain_2x_lvis_v1.py work_dirs/loce_mask_rcnn_r50_fpn_normed_mask_mstrain_2x_lvis_v1/epoch_6.pth 8 --eval bbox segm

Models

For your convenience, we provide the following trained models (LOCE). All models are trained with 16 images in a mini-batch.

Model	Dataset	MS train	box AP	mask AP	Pretrained Model	LOCE
LOCE_R_50_FPN_2x	LVIS v0.5	Yes	28.2	28.4	config / model	config / model
LOCE_R_50_FPN_2x	LVIS v1.0	Yes	27.4	26.6	config / model	config / model
LOCE_R_101_FPN_2x	LVIS v1.0	Yes	29.0	28.0	config / model	config / model

[0] All results are obtained with a single model and without any test time data augmentation such as multi-scale, flipping and etc..
[1] Refer to more details in config files in config/loce/.

Acknowledgement

Thanks MMDetection team for the wonderful open source project!

Citation

If you find LOCE useful in your research, please consider citing:

@inproceedings{feng2021exploring,
    title={Exploring Classification Equilibrium in Long-Tailed Object Detection},
    author={Feng, Chengjian and Zhong, Yujie and Huang, Weilin},
    booktitle={ICCV},
    year={2021}
}

Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Related tags

Overview

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021)

Introduction

Method overview

Memory-augmented Feature Sampling (MFS)

Prerequisites

Train

Inference

Models

Acknowledgement

Citation

Owner

Bilinear attention networks for visual question answering

Unity Propagation in Bayesian Networks Handling Inconsistency via Unity Smoothing

A library of multi-agent reinforcement learning components and systems

Aerial Imagery dataset for fire detection: classification and segmentation (Unmanned Aerial Vehicle (UAV))

The software associated with a paper accepted at EMNLP 2021 titled "Open Knowledge Graphs Canonicalization using Variational Autoencoders".

Key information extraction from invoice document with Graph Convolution Network

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)

Fully Convolutional Refined Auto Encoding Generative Adversarial Networks for 3D Multi Object Scenes

Beyond imagenet attack (accepted by ICLR 2022) towards crafting adversarial examples for black-box domains.

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA)

Code for Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing(ICCV21)

Time Delayed NN implemented in pytorch

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

This is the code related to "Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation" (ICCV 2021).

NudeNet: Neural Nets for Nudity Classification, Detection and selective censoring

Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt

Supplementary materials for ISMIR 2021 LBD paper "Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes"