The official implementation of Equalization Loss v1 & v2 (CVPR 2020, 2021) based on MMDetection.

Overview

The Equalization Losses for Long-tailed Object Detection and Instance Segmentation

This repo is the official implementation of the CVPR 2021 paper Equalization Loss v2: A New Gradient Balance Approach for Long-tailed Object Detection and the CVPR 2020 paper Equalization Loss for Long-tailed Object Recognition.
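For context, the core idea of EQLv2 is to re-weight each class's independent sigmoid cross-entropy according to its accumulated positive-to-negative gradient ratio: rare classes, whose negatives would otherwise dominate, get their positive gradients boosted and negative gradients suppressed. The snippet below is only a simplified sketch of that idea under the paper's notation (gamma, mu, alpha); it is not the code in this repo.

import torch
import torch.nn.functional as F


class EQLv2Sketch:
    """Simplified sketch of the gradient-balance idea behind EQLv2.

    Per-class positive/negative gradient magnitudes are accumulated across
    iterations; their ratio decides how strongly each class's positive and
    negative samples are weighted. NOT the repo's implementation.
    """

    def __init__(self, num_classes, gamma=12.0, mu=0.8, alpha=4.0):
        self.gamma, self.mu, self.alpha = gamma, mu, alpha
        self.pos_grad = torch.zeros(num_classes)   # keep on the same device as logits in practice
        self.neg_grad = torch.zeros(num_classes)

    def __call__(self, logits, targets):
        # logits: (N, C) raw class scores; targets: (N, C) one-hot labels as float
        ratio = self.pos_grad / (self.neg_grad + 1e-10)
        q = torch.sigmoid(self.gamma * (ratio - self.mu))   # ~1 for balanced classes
        pos_w = 1 + self.alpha * (1 - q)                     # boost positives of rare classes
        neg_w = q                                            # suppress negatives of rare classes
        weight = targets * pos_w + (1 - targets) * neg_w
        loss = F.binary_cross_entropy_with_logits(logits, targets, reduction='none') * weight

        # accumulate per-class gradient magnitudes for the next iteration
        with torch.no_grad():
            grad = weight * (torch.sigmoid(logits) - targets).abs()
            self.pos_grad += (grad * targets).sum(dim=0)
            self.neg_grad += (grad * (1 - targets)).sum(dim=0)
        return loss.sum() / max(logits.size(0), 1)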

Besides the equalization losses, this repo also includes some other algorithms:

  • BAGS (Balanced Group Softmax)
  • cRT (classifier re-training)
  • LWS (Learnable Weight Scaling)

Requirements

We tested our code on MMDetection V2.3; other versions should also work.

Prepare LVIS Dataset

for images

LVIS uses the same images as COCO, so you need to download the COCO dataset to a folder ($COCO) and link the train, val, and test image folders under the LVIS folder ($LVIS).

mkdir -p data/lvis
ln -s $COCO/train $LVIS
ln -s $COCO/val $LVIS
ln -s $COCO/test $LVIS

for annotations

Download the annotations from the LVIS website

cd $LVIS
mkdir annotations

then place the annotations in the folder ($LVIS/annotations)

Finally, you will have a file structure like the one below:

data
  ├── lvis
  │   ├── annotations
  │   │   ├── lvis_v1_val.json
  │   │   ├── lvis_v1_train.json
  │   ├── train2017
  │   │   ├── 000000004134.png
  │   │   ├── 000000031817.png
  │   │   ├── ......
  │   ├── val2017
  │   ├── test2017
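A quick way to confirm the layout before training is to check that the expected paths exist. This is just a minimal sketch using the paths shown above:

# Sanity-check the LVIS data layout described above.
import os

paths = [
    'data/lvis/annotations/lvis_v1_train.json',
    'data/lvis/annotations/lvis_v1_val.json',
    'data/lvis/train2017',
    'data/lvis/val2017',
]
for p in paths:
    print(p, 'OK' if os.path.exists(p) else 'MISSING')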

for API

The official lvis-api and mmlvis can lead to some multiprocessing bugs. See the issue.

So you can install the LVIS API from my modified repo instead.

pip install git+https://github.com/tztztztztz/lvis-api.git
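After installing, you can quickly verify the API by loading the validation annotations. A minimal sketch, assuming the data layout prepared above:

# Quick smoke test for the installed lvis-api.
from lvis import LVIS

lvis_val = LVIS('data/lvis/annotations/lvis_v1_val.json')
print(len(lvis_val.get_img_ids()), 'validation images')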

Testing with pretrained models

# ./tools/dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
./tools/dist_test.sh configs/eqlv2/eql_r50_8x2_1x.py data/pretrain_models/eql_r50_8x2_1x.pth 8 --out results.pkl --eval bbox segm
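If you dump results with --out, the pickle file can be inspected with mmcv. A minimal sketch; the exact structure of each entry depends on the model and config (e.g. (bbox, segm) pairs for Mask R-CNN-style models):

# Peek at the dumped results file produced by --out results.pkl.
import mmcv

results = mmcv.load('results.pkl')   # one entry per test image
print(len(results), 'images in results')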

Training

# ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
./tools/dist_train.sh ./configs/end2end/eql_r50_8x2_1x.py 8 

Once training finishes, you will get evaluation metrics like this:

bbox AP

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.242
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.401
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.254
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.181
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.317
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.367
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.135
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.225
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  f] = 0.308
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.331
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.223
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.308
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  f] = 0.415

mask AP

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.237
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=300 catIds=all] = 0.372
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=300 catIds=all] = 0.251
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.169
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.316
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.370
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.149
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.228
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  f] = 0.286
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=all] = 0.326
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     s | maxDets=300 catIds=all] = 0.210
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     m | maxDets=300 catIds=all] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=     l | maxDets=300 catIds=all] = 0.495
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  r] = 0.213
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  c] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 catIds=  f] = 0.389

We place our config files in ./configs/

  • ./configs/end2end: eqlv2 and other end-to-end methods
  • ./configs/decouple: decoupling-based methods

How to train decoupling-based methods

  1. Train the baseline model (or EQLv2).
  2. Prepare the pretrained checkpoint (a conceptual sketch of this step follows the list)
  # suppose you've trained the baseline model
  cd r50_1x
  python ../tools/ckpt_surgery.py --ckpt-path epoch_12.pth --method remove
  # if you want to train LWS, you should choose method 'reset'
  3. Start training with the decouple configs
  # ./tools/dist_train.sh ./configs/decouple/bags_r50_8x2_1x.py 8
  # ./tools/dist_train.sh ./configs/decouple/lws_r50_8x2_1x.py 8
  ./tools/dist_train.sh ./configs/decouple/crt_r50_8x2_1x.py 8
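Conceptually, a 'remove'-style checkpoint surgery drops the classification-head parameters so they are re-trained from scratch in the decoupled stage, while 'reset' would re-initialize them in place. The following is only an illustrative sketch with assumed state-dict key names and output file name, not the repo's tools/ckpt_surgery.py:

# Illustrative sketch of a "remove"-style checkpoint surgery.
# The key pattern 'bbox_head.fc_cls' and the output name are assumptions;
# inspect your own checkpoint's keys before relying on this.
import torch

ckpt = torch.load('epoch_12.pth', map_location='cpu')
state = ckpt['state_dict']
for key in [k for k in state if 'bbox_head.fc_cls' in k]:
    del state[key]                     # drop classifier weight/bias
torch.save(ckpt, 'epoch_12_remove.pth')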

Pretrained Models on LVIS

Methods    end2end   AP     APr    APc    APf    pretrained_model
Baseline   ✓         16.1   0.0    12.0   27.4   model
EQL        ✓         18.6   2.1    17.4   27.2   model
RFS        ✓         22.2   11.5   21.2   28.0   model
LWS        ×         17.0   2.0    13.5   27.4   model
cRT        ×         22.1   11.9   20.2   29.0   model
BAGS       ×         23.1   13.1   22.5   28.2   model
EQLv2      ✓         23.7   14.9   22.8   28.6   model

How to train EQLv2 on OpenImages

1. Download the data

Download the OpenImages v5 images from the link; the folder structure will be

openimages
    ├── train
    ├── validation
    ├── test

Download the annotations for Challenge 2019 from the link; the folder structure will be

annotations
    ├── challenge-2019-classes-description-500.csv
    ├── challenge-2019-train-detection-human-imagelabels.csv
    ├── challenge-2019-train-detection-bbox.csv
    ├── challenge-2019-validation-detection-bbox.csv
    ├── challenge-2019-validation-detection-human-imagelabels.csv
    ├── ...

2. Convert the .csv annotations to coco-like .json files

cd tools/openimages2coco/
python convert_annotations.py -p PATH_TO_OPENIMAGES --version challenge_2019 --task bbox 

You may need to download the data directory from https://github.com/bethgelab/openimages2coco/tree/master/data and place it at $project_dir/tools/openimages2coco/
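Once the conversion finishes, the output can be sanity-checked like any coco-like annotation file. The file name below is purely a placeholder; the converter decides the actual output name:

# Inspect a converted coco-like annotation file.
# NOTE: 'converted_train_bbox.json' is a placeholder name, not the real output.
import json

with open('converted_train_bbox.json') as f:
    ann = json.load(f)
print(len(ann['images']), 'images,',
      len(ann['annotations']), 'boxes,',
      len(ann['categories']), 'categories')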

3. Train models

  ./tools/dist_train.sh ./configs/openimages/eqlv2_r50_fpn_8x2_2x.py 8

Other configs can be found at ./configs/openimages/

4. Inference and output the json results file

./tools/dist_test.sh ./configs/openimages/eqlv2_r50_fpn_8x2_2x.py openimage_eqlv2_2x/epoch_1.pth 8 --format-only --options "jsonfile_prefix=openimage_eqlv2_2x/results"

Then you will get results.bbox.json under the folder openimage_eqlv2_2x

5. Convert coco-like json result file to openimage-like csv results file

cd $project_dir/tools/openimages2coco/
python convert_predictions.py -p ../../openimage_eqlv2_2x/results.bbox.json --subset validation

Then you will get results.bbox.csv under the folder openimage_eqlv2_2x

6. Evaluate the results file using the official API

Please refer to this link

After that, you will see output like this:

OpenImagesDetectionChallenge_Precision/mAP@0.5IOU,0.5263230244227198
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/061hd_',0.4198356678732905
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/06m11',0.40262261023434986
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/03120',0.5694096972722996
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/01kb5b',0.20532245532245533
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/0120dh',0.7934685035604202
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/0dv5r',0.7029194449221794
OpenImagesDetectionChallenge_PerformanceByCategory/AP@0.5IOU/b'/m/0jbk',0.5959245714028935

7. Parse the AP file and output the grouped AP

cd $project_dir

PYTHONPATH=./:$PYTHONPATH python tools/parse_openimage_metric.py --file openimage_eqlv2_2x/metric

And you will get:

mAP 0.5263230244227198
mAP0: 0.4857693606436219
mAP1: 0.52047262478471
mAP2: 0.5304580597832517
mAP3: 0.5348747991854581
mAP4: 0.5588236678031849
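Since each line of the metric file is just a <name>,<value> pair (as in the output of step 6), a few lines of Python can read it back. This is only an illustrative sketch; the repo's tools/parse_openimage_metric.py additionally groups the per-category APs (presumably by training-instance frequency) to produce the mAP0–mAP4 numbers, which is omitted here:

# Read the evaluation output back; each line is "<metric name>,<value>".
mAP = None
per_class = {}
with open('openimage_eqlv2_2x/metric') as f:
    for line in f:
        if ',' not in line:
            continue
        name, value = line.strip().rsplit(',', 1)
        if name.endswith('Precision/mAP@0.5IOU'):
            mAP = float(value)
        elif 'PerformanceByCategory' in name:
            cls = name.split('@0.5IOU/', 1)[1]   # e.g. "b'/m/061hd_'"
            per_class[cls] = float(value)
print('mAP:', mAP, '| categories parsed:', len(per_class))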

Main Results on OpenImages

Methods       AP     AP1    AP2    AP3    AP4    AP5
Faster-R50    43.1   26.3   42.5   45.2   48.2   52.6
EQL           45.3   32.7   44.6   47.3   48.3   53.1
EQLv2         52.6   48.6   52.0   53.0   53.4   55.8
Faster-R101   46.0   29.2   45.5   49.3   50.9   54.7
EQL           48.0   36.1   47.2   50.5   51.0   55.0
EQLv2         55.1   51.0   55.2   56.6   55.6   57.5

Citation

If you use the equalization losses, please cite our papers.

@article{tan2020eqlv2,
  title={Equalization Loss v2: A New Gradient Balance Approach for Long-tailed Object Detection},
  author={Tan, Jingru and Lu, Xin and Zhang, Gang and Yin, Changqing and Li, Quanquan},
  journal={arXiv preprint arXiv:2012.08548},
  year={2020}
}
@inproceedings{tan2020equalization,
  title={Equalization loss for long-tailed object recognition},
  author={Tan, Jingru and Wang, Changbao and Li, Buyu and Li, Quanquan and Ouyang, Wanli and Yin, Changqing and Yan, Junjie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11662--11671},
  year={2020}
}

Credits

The code for converting OpenImages annotations to COCO format is from this repo.
