A faster pytorch implementation of faster r-cnn

Overview

A Faster Pytorch Implementation of Faster R-CNN

Write at the beginning

[05/29/2020] This repo was initaited about two years ago, developed as the first open-sourced object detection code which supports multi-gpu training. It has been integrating tremendous efforts from many people. However, we have seen many high-quality repos emerged in the last years, such as:

At this point, I think this repo is out-of-data in terms of the pipeline and coding style, and will not maintain actively. Though you can still use this repo as a playground, I highly recommend you move to the above repos to delve into west world of object detection!

Introduction

💥 Good news! This repo supports pytorch-1.0 now!!! We borrowed some code and techniques from maskrcnn-benchmark. Just go to pytorch-1.0 branch!

This project is a faster pytorch implementation of faster R-CNN, aimed to accelerating the training of faster R-CNN object detection models. Recently, there are a number of good implementations:

During our implementing, we referred the above implementations, especailly longcw/faster_rcnn_pytorch. However, our implementation has several unique and new features compared with the above implementations:

  • It is pure Pytorch code. We convert all the numpy implementations to pytorch!

  • It supports multi-image batch training. We revise all the layers, including dataloader, rpn, roi-pooling, etc., to support multiple images in each minibatch.

  • It supports multiple GPUs training. We use a multiple GPU wrapper (nn.DataParallel here) to make it flexible to use one or more GPUs, as a merit of the above two features.

  • It supports three pooling methods. We integrate three pooling methods: roi pooing, roi align and roi crop. More importantly, we modify all of them to support multi-image batch training.

  • It is memory efficient. We limit the image aspect ratio, and group images with similar aspect ratios into a minibatch. As such, we can train resnet101 and VGG16 with batchsize = 4 (4 images) on a single Titan X (12 GB). When training with 8 GPU, the maximum batchsize for each GPU is 3 (Res101), totaling 24.

  • It is faster. Based on the above modifications, the training is much faster. We report the training speed on NVIDIA TITAN Xp in the tables below.

What we are doing and going to do

  • Support both python2 and python3 (great thanks to cclauss).
  • Add deformable pooling layer (mainly supported by Xander).
  • Support pytorch-0.4.0 (this branch).
  • Support tensorboardX.
  • Support pytorch-1.0 (go to pytorch-1.0 branch).

Other Implementations

Tutorial

Benchmarking

We benchmark our code thoroughly on three datasets: pascal voc, coco and visual genome, using two different network architectures: vgg16 and resnet101. Below are the results:

1). PASCAL VOC 2007 (Train/Test: 07trainval/07test, scale=600, ROI Align)

model   #GPUs batch size lr       lr_decay max_epoch     time/epoch mem/GPU mAP
VGG-16     1 1 1e-3 5   6   0.76 hr 3265MB   70.1
VGG-16     1 4 4e-3 9   0.50 hr 9083MB   69.6
VGG-16     8 16 1e-2  8   10 0.19 hr 5291MB 69.4
VGG-16     8 24 1e-2 10 11 0.16 hr 11303MB 69.2
Res-101 1 1 1e-3 5 7 0.88 hr 3200 MB 75.2
Res-101   1 4 4e-3 8   10 0.60 hr 9700 MB 74.9
Res-101   8 16 1e-2 8   10 0.23 hr 8400 MB 75.2 
Res-101   8 24 1e-2 10 12 0.17 hr 10327MB 75.1  

2). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=800, max_size=1200, ROI Align)

model #GPUs batch size lr lr_decay max_epoch time/epoch mem/GPU mAP
VGG-16     8 16   1e-2 4 6 4.9 hr 7192 MB 29.2
Res-101   8 16   1e-2 4   6 6.0 hr 10956 MB 36.2
Res-101   8 16   1e-2 4   10 6.0 hr 10956 MB 37.0

NOTE. Since the above models use scale=800, you need add "--ls" at the end of test command.

3). COCO (Train/Test: coco_train+coco_val-minival/minival, scale=600, max_size=1000, ROI Align)

model #GPUs batch size lr lr_decay max_epoch time/epoch mem/GPU mAP
Res-101   8 24   1e-2 4   6 5.4 hr   10659 MB 33.9
Res-101   8 24   1e-2 4   10 5.4 hr   10659 MB 34.5

4). Visual Genome (Train/Test: vg_train/vg_test, scale=600, max_size=1000, ROI Align, category=2500)

model #GPUs batch size lr lr_decay max_epoch time/epoch mem/GPU mAP
VGG-16   1 P100 4   1e-3 5   20 3.7 hr   12707 MB 4.4

Thanks to Remi for providing the pretrained detection model on visual genome!

  • Click the links in the above tables to download our pre-trained faster r-cnn models.
  • If not mentioned, the GPU we used is NVIDIA Titan X Pascal (12GB).

Preparation

First of all, clone the code

git clone https://github.com/jwyang/faster-rcnn.pytorch.git

Then, create a folder:

cd faster-rcnn.pytorch && mkdir data

prerequisites

  • Python 2.7 or 3.6
  • Pytorch 0.4.0 (now it does not support 0.4.1 or higher)
  • CUDA 8.0 or higher

Data Preparation

  • PASCAL_VOC 07+12: Please follow the instructions in py-faster-rcnn to prepare VOC datasets. Actually, you can refer to any others. After downloading the data, create softlinks in the folder data/.

  • COCO: Please also follow the instructions in py-faster-rcnn to prepare the data.

  • Visual Genome: Please follow the instructions in bottom-up-attention to prepare Visual Genome dataset. You need to download the images and object annotation files first, and then perform proprecessing to obtain the vocabulary and cleansed annotations based on the scripts provided in this repository.

Pretrained Model

We used two pretrained models in our experiments, VGG and ResNet101. You can download these two models from:

Download them and put them into the data/pretrained_model/.

NOTE. We compare the pretrained models from Pytorch and Caffe, and surprisingly find Caffe pretrained models have slightly better performance than Pytorch pretrained. We would suggest to use Caffe pretrained models from the above link to reproduce our results.

If you want to use pytorch pre-trained models, please remember to transpose images from BGR to RGB, and also use the same data transformer (minus mean and normalize) as used in pretrained model.

Compilation

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch in make.sh file, to compile the cuda code:

GPU model Architecture
TitanX (Maxwell/Pascal) sm_52
GTX 960M sm_50
GTX 1080 (Ti) sm_61
Grid K520 (AWS g2.2xlarge) sm_30
Tesla K80 (AWS p2.xlarge) sm_37

More details about setting the architecture can be found here or here

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the cuda dependencies using following simple commands:

cd lib
sh make.sh

It will compile all the modules you need, including NMS, ROI_Pooing, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7, please compile by yourself if you are using a different python version.

As pointed out in this issue, if you encounter some error during the compilation, you might miss to export the CUDA paths to your environment.

Train

Before training, set the right directory to save and load the trained models. Change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to adapt to your environment.

To train a faster R-CNN model with vgg16 on pascal_voc, simply run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net vgg16 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda

where 'bs' is the batch size with default 1. Alternatively, to train with resnet101 on pascal_voc, simple run:

 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                    --dataset pascal_voc --net res101 \
                    --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                    --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                    --cuda

Above, BATCH_SIZE and WORKER_NUMBER can be set adaptively according to your GPU memory size. On Titan Xp with 12G memory, it can be up to 4.

If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:

python trainval_net.py --dataset pascal_voc --net vgg16 \
                       --bs 24 --nw 8 \
                       --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                       --cuda --mGPUs

Change dataset to "coco" or 'vg' if you want to train on COCO or Visual Genome.

Test

If you want to evaluate the detection performance of a pre-trained vgg16 model on pascal_voc test set, simply run

python test_net.py --dataset pascal_voc --net vgg16 \
                   --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
                   --cuda

Specify the specific model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.

Demo

If you want to run detection on your own images with a pre-trained model, download the pretrained model listed in above tables or train your own models at first, then add images to folder $ROOT/images, and then run

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directoy

Then you will find the detection results in folder $ROOT/images.

Note the default demo.py merely support pascal_voc categories. You need to change the line to adapt your own model.

Below are some detection results:

Webcam Demo

You can use a webcam in a real-time demo by running

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directoy \
               --webcam $WEBCAM_ID

The demo is stopped by clicking the image window and then pressing the 'q' key.

Authorship

This project is equally contributed by Jianwei Yang and Jiasen Lu, and many others (thanks to them!).

Citation

@article{jjfaster2rcnn,
    Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
    Title = {A Faster Pytorch Implementation of Faster R-CNN},
    Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
    Year = {2017}
}

@inproceedings{renNIPS15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}
}
Owner
Jianwei Yang
Senior Researcher @ Microsoft
Jianwei Yang
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Phil Wang 5k Jan 04, 2023
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral) This is the official implementat

Yifan Zhang 259 Dec 25, 2022
《DeepViT: Towards Deeper Vision Transformer》(2021)

DeepViT This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". The repo is based on the timm library (https://githu

109 Dec 02, 2022
Public repository of the 3DV 2021 paper "Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds"

Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds Björn Michele1), Alexandre Boulch1), Gilles Puy1), Maxime Bucher1) and Rena

valeo.ai 15 Dec 22, 2022
Anime Face Detector using mmdet and mmpose

Anime Face Detector This is an anime face detector using mmdetection and mmpose. (To avoid copyright issues, I use generated images by the TADNE model

198 Jan 07, 2023
Good Classification Measures and How to Find Them

Good Classification Measures and How to Find Them This repository contains supplementary materials for the paper "Good Classification Measures and How

Yandex Research 7 Nov 13, 2022
Liecasadi - liecasadi implements Lie groups operation written in CasADi

liecasadi liecasadi implements Lie groups operation written in CasADi, mainly di

Artificial and Mechanical Intelligence 14 Nov 05, 2022
The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store development.

The goal of the exercises below is to evaluate the candidate knowledge and problem solving expertise regarding the main development focuses for the iFood ML Platform team: MLOps and Feature Store dev

George Rocha 0 Feb 03, 2022
Code for MSc Quantitative Finance Dissertation

MSc Dissertation Code ReadMe Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks Curtis Nybo MSc Quantitative F

2 Dec 01, 2022
Official PyTorch implementation of PS-KD

Self-Knowledge Distillation with Progressive Refinement of Targets (PS-KD) Accepted at ICCV 2021, oral presentation Official PyTorch implementation of

61 Dec 28, 2022
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

Blakey Wu 40 Dec 14, 2022
Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

Invariant Point Attention - Pytorch Implementation of Invariant Point Attention as a standalone module, which was used in the structure module of Alph

Phil Wang 113 Jan 05, 2023
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

VFedPCA+VFedAKPCA This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-

John 9 Sep 18, 2022
Have you ever wondered how cool it would be to have your own A.I

Have you ever wondered how cool it would be to have your own A.I. assistant Imagine how easier it would be to send emails without typing a single word, doing Wikipedia searches without opening web br

Harsh Gupta 1 Nov 09, 2021
[ICLR'21] Counterfactual Generative Networks

This repository contains the code for the ICLR 2021 paper "Counterfactual Generative Networks" by Axel Sauer and Andreas Geiger. If you want to take the CGN for a spin and generate counterfactual ima

88 Jan 02, 2023
Materials for upcoming beginner-friendly PyTorch course (work in progress).

Learn PyTorch for Deep Learning (work in progress) I'd like to learn PyTorch. So I'm going to use this repo to: Add what I've learned. Teach others in

Daniel Bourke 2.3k Dec 29, 2022
Systematic generalisation with group invariant predictions

Requirements are Python 3, TensorFlow v1.14, Numpy, Scipy, Scikit-Learn, Matplotlib, Pillow, Scikit-Image, h5py, tqdm. Experiments were run on V100 GPUs (16 and 32GB).

Faruk Ahmed 30 Dec 01, 2022
BMVC 2021: This is the github repository for "Few Shot Temporal Action Localization using Query Adaptive Transformers" accepted in British Machine Vision Conference (BMVC) 2021, Virtual

FS-QAT: Few Shot Temporal Action Localization using Query Adaptive Transformer Accepted as Poster in BMVC 2021 This is an official implementation in P

Sauradip Nag 14 Dec 09, 2022
ICLR 2021: Pre-Training for Context Representation in Conversational Semantic Parsing

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing This repository contains code for the ICLR 2021 paper "SCoRE: Pre-Tr

Microsoft 28 Oct 02, 2022