Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax

⚠️ Latest: The current repo is a complete version. We have removed a lot of redundant code, and the result is still being tested.

This repo is the official implementation for CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax. [Paper] [Supp] [Slides] [Video] [Code and models]

Note: The code has not been fully cleaned up yet. We are still working on it and will update it soon.

Requirements

1. Environment:

The requirements are exactly the same as for mmdetection v1.0.rc0. We tested with the following settings:

python 3.7
cuda 9.2
pytorch 1.3.1+cu92
torchvision 0.4.2+cu92
mmcv 0.2.14

HH=`pwd`
conda create -n mmdet python=3.7 -y
conda activate mmdet

pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install mmcv
pip install matplotlib
pip install terminaltables

cd lvis-api/
python setup.py develop

cd $HH
python setup.py develop

2. Data:

a. For dataset images:

# Make sure you are in dir BalancedGroupSoftmax

mkdir data
cd data
mkdir lvis
mkdir pretrained_models

If you already have the COCO2017 dataset, link its train2017 and val2017 folders under the lvis folder.
If you do not have the COCO2017 dataset, download the COCO train set and COCO val set, unzip them, and move them under the lvis folder.

b. For dataset annotations:

Download the LVIS annotations: lvis train ann and lvis val ann.
Unzip all the files and put them under lvis.

To train HTC models, download the COCO stuff annotations and rename the folder stuffthingmaps_trainval2017 to stuffthingmaps.

c. For pretrained models:

Download the corresponding pre-trained models below.

To train baseline models, we initialize from models trained on COCO. Please download the corresponding COCO models from the mmdetection model zoo.
To train balanced group softmax models (abbreviated as gs models), we initialize from the corresponding baseline models trained on LVIS and fix all parameters except for the last FC layer.
Move these model files to ./data/pretrained_models/
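The "fix all parameters except for the last FC layer" step can be pictured as a filter over parameter names. The sketch below is only an illustration: the prefix `bbox_head.fc_cls` and the sample parameter names are assumptions made for this example, and how the configs in this repo actually implement the freezing is not shown here.

```python
def trainable_flags(param_names, trainable_prefixes=("bbox_head.fc_cls",)):
    """Map each parameter name to True if it should stay trainable."""
    prefixes = tuple(trainable_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.conv1.weight",
    "rpn_head.rpn_cls.weight",
    "bbox_head.fc_reg.weight",
    "bbox_head.fc_cls.weight",
    "bbox_head.fc_cls.bias",
]

flags = trainable_flags(names)
for name, keep in flags.items():
    print(("train   " if keep else "frozen  ") + name)

# With a PyTorch model, the same flags could be applied as:
#   flags = trainable_flags(n for n, _ in model.named_parameters())
#   for n, p in model.named_parameters():
#       p.requires_grad = flags[n]
```

Only the parameters whose names match the prefix stay trainable; everything else would keep the weights loaded from the baseline model.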

d. For intermediate files (for BAGS, re-weight and RFS models only):

You can either download or generate them before training and testing. Put them under ./data/lvis/.

BAGS models: label2binlabel.pt, pred_slice_with0.pt, valsplit.pkl
Re-weight models: cls_weight.pt, cls_weight_bours.pt
RFS models: class_to_imageid_and_inscount.pt

After all these operations, the folder data should be like this:

    data
    ├── lvis
    │   ├── lvis_v0.5_train.json
    │   ├── lvis_v0.5_val.json
    │   ├── label2binlabel.pt (Optional, for BAGS models only)
    │   ├── ...... (Other intermediate files)
    │   ├── stuffthingmaps (Optional, for HTC models only)
    │   │   ├── train2017
    │   │   │   ├── 000000004134.png
    │   │   │   ├── 000000031817.png
    │   │   │   ├── ......
    │   │   └── val2017
    │   │       ├── 000000424162.png
    │   │       ├── 000000445999.png
    │   │       ├── ......
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......
    └── pretrained_models
        ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
        ├── ......
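Before training, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of this repo: the function name `missing_entries` is made up for the example, and it checks only the non-optional entries shown in the tree.

```python
from pathlib import Path

# Paths taken from the directory tree above; optional entries are omitted.
REQUIRED = [
    "lvis/lvis_v0.5_train.json",
    "lvis/lvis_v0.5_val.json",
    "lvis/train2017",
    "lvis/val2017",
    "pretrained_models",
]


def missing_entries(data_root="data", required=REQUIRED):
    """Return the required paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under ./data:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

Run it from the BalancedGroupSoftmax directory so that the relative path `data` resolves correctly.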

Training

Note: Please make sure that you have prepared the pre-trained models and intermediate files and placed them at the paths specified in ${CONFIG_FILE}.

Use the following commands to train a model.

# Single GPU
python tools/train.py ${CONFIG_FILE}

# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

All config files are under ./configs/.

./configs/bags: all models for Balanced Group Softmax.
./configs/baselines: all baseline models.
./configs/transferred: transferred models from long-tail image classification.
./configs/ablations: models for ablation study.

For example, to train a BAGS model with Faster R-CNN R50-FPN:

# Single GPU
python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py

# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py 8

Important: The default learning rate in the config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use a different number of GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)
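The rule above amounts to a one-line calculation. In this sketch the base learning rate of 0.02 for batch size 16 is inferred from the two examples in the note rather than read from a config file, so check the value in the config you actually use; `scaled_lr` is a name chosen for illustration.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate linearly with the total batch size.

    base_lr=0.02 is an assumption inferred from the examples in the note
    (0.01 for batch size 8, 0.08 for batch size 64).
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


print(scaled_lr(4, 2))   # 0.01
print(scaled_lr(16, 4))  # 0.08
```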

Testing

Note: Please make sure that you have prepared the intermediate files and placed them at the paths specified in ${CONFIG_FILE}.

Use the following commands to test a trained model.

# single gpu test
python tools/test_lvis.py \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test_lvis.sh \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

$RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.

$EVAL_METRICS: Items to be evaluated on the results. bbox for bounding box evaluation only. bbox segm for bounding box and mask evaluation.
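Since the file written by --out is a pickle, it can be reloaded later for inspection. The sketch below deliberately assumes nothing about the structure of the stored object; `load_results` and `describe` are illustrative helper names, not functions from this repo.

```python
import pickle


def load_results(path):
    """Load the object that was written with --out."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a one-line summary without assuming a specific structure."""
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"type={type(obj).__name__}, len={size}"


# Example usage, assuming a result file produced by a previous test run:
#   results = load_results("gs_box_result.pkl")
#   print(describe(results))
```

Unpickling needs the same libraries that were available when the file was written, and pickle files should only be loaded from sources you trust.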

For example (assume that you have downloaded the corresponding model file to ./data/downloaded_models):

To evaluate the trained BAGS model with Faster R-CNN R50-FPN for object detection:

# single-gpu testing
python tools/test_lvis.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
  ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth \
  --out gs_box_result.pkl --eval bbox

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
  ./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth 8 \
  --out gs_box_result.pkl --eval bbox

To evaluate the trained BAGS model with Mask R-CNN R50-FPN for instance segmentation:

# single-gpu testing
python tools/test_lvis.py configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
  ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth \
  --out gs_mask_result.pkl --eval bbox segm

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
  ./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth 8 \
  --out gs_mask_result.pkl --eval bbox segm

The evaluation results will be shown in markdown table format:

| Type | IoU | Area | MaxDets | CatIds | Result |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  (AP)  | 0.50:0.95 |    all | 300 |          all | 25.96% |
|  (AP)  | 0.50      |    all | 300 |          all | 43.58% |
|  (AP)  | 0.75      |    all | 300 |          all | 27.15% |
|  (AP)  | 0.50:0.95 |      s | 300 |          all | 20.26% |
|  (AP)  | 0.50:0.95 |      m | 300 |          all | 32.81% |
|  (AP)  | 0.50:0.95 |      l | 300 |          all | 40.10% |
|  (AP)  | 0.50:0.95 |    all | 300 |            r | 17.66% |
|  (AP)  | 0.50:0.95 |    all | 300 |            c | 25.75% |
|  (AP)  | 0.50:0.95 |    all | 300 |            f | 29.55% |
|  (AR)  | 0.50:0.95 |    all | 300 |          all | 34.76% |
|  (AR)  | 0.50:0.95 |      s | 300 |          all | 24.77% |
|  (AR)  | 0.50:0.95 |      m | 300 |          all | 41.50% |
|  (AR)  | 0.50:0.95 |      l | 300 |          all | 51.64% |

Results and models

The main results on LVIS val set:

Models:

Please refer to our paper and supp for more details.

| ID | Models | bbox mAP / mask mAP | Train | Test | Config file | Pretrained Model | Train part | Model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| (1) | Faster R50-FPN | 20.98 | √ | √ | file | COCO R50 | All | Google drive |
| (2) | x2 | 21.93 | √ | √ | file | Model (1) | All | Google drive |
| (3) | Finetune tail | 22.28 | × | √ | file | Model (1) | All | Google drive |
| (4) | RFS | 23.41 | √ | √ | file | COCO R50 | All | Google drive |
| (5) | RFS-finetune | 22.66 | √ | √ | file | Model (1) | All | Google drive |
| (6) | Re-weight | 23.48 | √ | √ | file | Model (1) | All | Google drive |
| (7) | Re-weight-cls | 24.66 | √ | √ | file | Model (1) | Cls | Google drive |
| (8) | Focal loss | 11.12 | × | √ | file | Model (1) | All | Google drive |
| (9) | Focal loss-cls | 19.29 | × | √ | file | Model (1) | Cls | Google drive |
| (10) | NCM-fc | 16.02 | × | × | | Model (1) | | |
| (11) | NCM-conv | 12.56 | × | × | | Model (1) | | |
| (12) | $\tau$-norm | 11.01 | × | × | | Model (1) | Cls | |
| (13) | $\tau$-norm-select | 21.61 | × | × | | Model (1) | Cls | |
| (14) | Ours (Faster R50-FPN) | 25.96 | √ | √ | file | Model (1) | Cls | Google drive |
| (15) | Faster X101-64x4d | 24.63 | √ | √ | file | COCO x101 | All | Google drive |
| (16) | Ours (Faster X101-64x4d) | 27.83 | √ | √ | file | Model (15) | Cls | Google drive |
| (17) | Cascade X101-64x4d | 27.16 | √ | √ | file | COCO cascade x101 | All | Google drive |
| (18) | Ours (Cascade X101-64x4d) | 32.77 | √ | √ | file | Model (17) | Cls | Google drive |
| (19) | Mask R50-FPN | 20.78/20.68 | √ | √ | file | COCO mask r50 | All | Google drive |
| (20) | Ours (Mask R50-FPN) | 25.76/26.25 | √ | √ | file | Model (19) | Cls | Google drive |
| (21) | HTC X101-64x4d | 31.28/29.28 | √ | √ | file | COCO HTC x101 | All | Google drive |
| (22) | Ours (HTC X101-64x4d) | 33.68/31.20 | √ | √ | file | Model (21) | Cls | Google drive |
| (23) | HTC X101-64x4d-MS-DCN | 34.61/31.94 | √ | √ | file | COCO HTC x101-ms-dcn | All | Google drive |
| (24) | Ours (HTC X101-64x4d-MS-DCN) | 37.71/34.39 | √ | √ | file | Model (23) | Cls | Google drive |

PS: in column Pretrained Model, the file of Model (n) is the same as the Google drive file in column Model in row (n).

Citation

@inproceedings{li2020overcoming,
  title={Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax},
  author={Li, Yu and Wang, Tao and Kang, Bingyi and Tang, Sheng and Wang, Chunfeng and Li, Jintao and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10991--11000},
  year={2020}
}

Credit

This code is largely based on mmdetection v1.0.rc0 and LVIS API.

Owner

FishYuLi
