Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

Overview

SA-AutoAug

Scale-aware Automatic Augmentation for Object Detection

Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia

[Paper] [BibTeX]


This project provides the implementation for the CVPR 2021 paper "Scale-aware Automatic Augmentation for Object Detection". Scale-aware AutoAug provides a new search space and search metric to find effective data agumentation policies for object detection. It is implemented on maskrcnn-benchmark and FCOS. Both search and training codes have been released. To facilitate more use, we re-implement the training code based on Detectron2.

Installation

For maskrcnn-benchmark code, please follow INSTALL.md for instruction.

For FCOS code, please follow INSTALL.md for instruction.

For Detectron2 code, please follow INSTALL.md for instruction.

Search

(You can skip this step and directly train on our searched policies.)

To search with 8 GPUs, run:

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/search.py --config-file configs/SA_AutoAug/retinanet_R-50-FPN_search.yaml OURPUT_DIR /path/to/searchlog_dir

Since we finetune on an existing baseline model during search, a baseline model is needed. You can download this model for search, or you can use other Retinanet baseline model trained by yourself.

Training

To train the searched policies on maskrcnn-benchmark (FCOS)

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file configs/SA_AutoAug/CONFIG_FILE  OUTPUT_DIR /path/to/traininglog_dir

For example, to train the retinanet ResNet-50 model with our searched data augmentation policies in 6x schedule:

cd /path/to/SA-AutoAug/maskrcnn-benchmark
export NGPUS=8
python3 -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file configs/SA_AutoAug/retinanet_R-50-FPN_6x.yaml  OUTPUT_DIR models/retinanet_R-50-FPN_6x_SAAutoAug

To train the searched policies on detectron2

cd /path/to/SA-AutoAug/detectron2
python3 ./tools/train_net.py --num-gpus 8 --config-file ./configs/COCO-Detection/SA_AutoAug/CONFIG_FILE OUTPUT_DIR /path/to/traininglog_dir

For example, to train the retinanet ResNet-50 model with our searched data augmentation policies in 6x schedule:

cd /path/to/SA-AutoAug/detectron2
python3 ./tools/train_net.py --num-gpus 8 --config-file ./configs/COCO-Detection/SA_AutoAug/retinanet_R_50_FPN_6x.yaml OUTPUT_DIR output_retinanet_R_50_FPN_6x_SAAutoAug

Results

We provide the results on COCO val2017 set with pretrained models.

Based on maskrcnn-benchmark

Method Backbone APbbox Download
Faster R-CNN ResNet-50 41.8 Model
Faster R-CNN ResNet-101 44.2 Model
RetinaNet ResNet-50 41.4 Model
RetinaNet ResNet-101 42.8 Model
Mask R-CNN ResNet-50 42.8 Model
Mask R-CNN ResNet-101 45.3 Model

Based on FCOS

Method Backbone APbbox Download
FCOS ResNet-50 42.6 Model
FCOS ResNet-101 44.0 Model
ATSS ResNext-101-32x8d-dcnv2 48.5 Model
ATSS ResNext-101-32x8d-dcnv2 (1200 size) 49.6 Model

Based on Detectron2

Method Backbone APbbox Download
Faster R-CNN ResNet-50 41.9 Model - Metrics
Faster R-CNN ResNet-101 44.2 Model - Metrics
RetinaNet ResNet-50 40.8 Model - Metrics
RetinaNet ResNet-101 43.1 Model - Metrics
Mask R-CNN ResNet-50 42.9 Model - Metrics
Mask R-CNN ResNet-101 45.6 Model - Metrics

Citing SA-AutoAug

Consider cite SA-Autoaug in your publications if it helps your research.

@inproceedings{saautoaug,
  title={Scale-aware Automatic Augmentation for Object Detection},
  author={Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Acknowledgments

This training code of this project is built on maskrcnn-benchmark, Detectron2, FCOS, and ATSS. The search code of this project is modified from DetNAS. Some augmentation code and settings follow AutoAug-Det. We thanks a lot for the authors of these projects.

Note that:

(1) We also provides script files for search and training in maskrcnn-benchmark, FCOS, and, detectron2.

(2) Any issues or pull requests on this project are welcome. In addition, if you meet problems when applying the augmentations to other datasets or codebase, feel free to contact Yukang Chen ([email protected]).

Owner
DV Lab
Deep Vision Lab
DV Lab
City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces

City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces Paper Temporary GitHub page for City Surfaces paper. More soon! While designing s

14 Nov 10, 2022
Neural Tangent Generalization Attacks (NTGA)

Neural Tangent Generalization Attacks (NTGA) ICML 2021 Video | Paper | Quickstart | Results | Unlearnable Datasets | Competitions | Citation Overview

Chia-Hung Yuan 34 Nov 25, 2022
Centroid-UNet is deep neural network model to detect centroids from satellite images.

Centroid UNet - Locating Object Centroids in Aerial/Serial Images Introduction Centroid-UNet is deep neural network model to detect centroids from Aer

GIC-AIT 19 Dec 08, 2022
Codes and scripts for "Explainable Semantic Space by Grounding Languageto Vision with Cross-Modal Contrastive Learning"

Visually Grounded Bert Language Model This repository is the official implementation of Explainable Semantic Space by Grounding Language to Vision wit

17 Dec 17, 2022
Code accompanying the NeurIPS 2021 paper "Generating High-Quality Explanations for Navigation in Partially-Revealed Environments"

Generating High-Quality Explanations for Navigation in Partially-Revealed Environments This work presents an approach to explainable navigation under

RAIL Group @ George Mason University 1 Oct 28, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The SpeechBrain Toolkit SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch. The goal is to create a single, flexible, and us

SpeechBrain 5.1k Jan 02, 2023
[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)

Are Transformers More Robust Than CNNs? Pytorch implementation for NeurIPS 2021 Paper: Are Transformers More Robust Than CNNs? Our implementation is b

Yutong Bai 145 Dec 01, 2022
Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

🍐 quince Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding 🍐 Installation $ git clone

Andrew Jesson 19 Jun 23, 2022
PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time

PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time The implementation is based on SIGGRAPH Aisa'20. Dependencies Python 3.7 Ubuntu

soratobtai 124 Dec 08, 2022
Toontown House CT Edition

Toontown House: Classic Toontown House Classic source that should just work. ❓ W

Open Source Toontown Servers 5 Jan 09, 2022
Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video Project Page | Paper NeuralRecon: Real-Time Coherent 3D Reconstruction from Mon

ZJU3DV 1.4k Dec 30, 2022
SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss This repository implements the SAFL in pytorch. Installation conda env create -f environm

6 Aug 24, 2022
Data pipelines for both TensorFlow and PyTorch!

rapidnlp-datasets Data pipelines for both TensorFlow and PyTorch ! If you want to load public datasets, try: tensorflow/datasets huggingface/datasets

1 Dec 08, 2021
Semantic segmentation task for ADE20k & cityscapse dataset, based on several models.

semantic-segmentation-tensorflow This is a Tensorflow implementation of semantic segmentation models on MIT ADE20K scene parsing dataset and Cityscape

HsuanKung Yang 83 Oct 13, 2022
Tree-based Search Graph for Approximate Nearest Neighbor Search

TBSG: Tree-based Search Graph for Approximate Nearest Neighbor Search. TBSG is a graph-based algorithm for ANNS based on Cover Tree, which is also an

Fanxbin 2 Dec 27, 2022
Code for the Convolutional Vision Transformer (ConViT)

ConViT : Vision Transformers with Convolutional Inductive Biases This repository contains PyTorch code for ConViT. It builds on code from the Data-Eff

Facebook Research 418 Jan 06, 2023
Text to image synthesis using thought vectors

Text To Image Synthesis Using Thought Vectors This is an experimental tensorflow implementation of synthesizing images from captions using Skip Though

Paarth Neekhara 2.1k Jan 05, 2023
Diverse Branch Block: Building a Convolution as an Inception-like Unit

Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021) DBB is a powerful ConvNet building block to replace regul

253 Dec 24, 2022
PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

PASTRIE Official release of the corpus described in the paper: Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, and Nathan Schn

NERT @ Georgetown 4 Dec 02, 2021
PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods.

PEPit: Performance Estimation in Python This open source Python library provides a generic way to use PEP framework in Python. Performance estimation

Baptiste 53 Nov 16, 2022