Official PyTorch Code of GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (CVPR 2021)

Last update: Jan 02, 2023

Overview

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection, CVPR 2021

Abhinav Kumar, Garrick Brazil, Xiaoming Liu

[project], [supp], [slides], [1min_talk], demo, arxiv

This code builds on Kinematic-3D, so the setup and organization are very similar. A few components, such as classical NMS, are based on Caffe.

References

Please cite the following paper if you find this repository useful:

@inproceedings{kumar2021groomed,
  title={{GrooMeD-NMS}: Grouped Mathematically Differentiable NMS for Monocular {$3$D} Object Detection},
  author={Kumar, Abhinav and Brazil, Garrick and Liu, Xiaoming},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Setup

Requirements
1. Python 3.6
2. PyTorch 0.4.1
3. Torchvision 0.2.1
4. CUDA 8.0
5. Ubuntu 18.04/Debian 8.9

This code has been tested on an NVIDIA 1080 Ti GPU; other platforms have not been tested. Unless otherwise stated, the scripts and instructions below assume the working directory is the project root.
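Because the pinned versions above are old, it can help to confirm what is actually installed before building anything. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and only the version numbers come from the list above.

```python
import importlib
import sys

# Versions copied from the requirements list above.
REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = {"torch": "0.4.1", "torchvision": "0.2.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(python_version=None, lookup=installed_version):
    """List readable mismatches between this environment and the requirements."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    problems = []
    if tuple(python_version) != REQUIRED_PYTHON:
        problems.append(
            "python: found %d.%d, expected %d.%d"
            % (tuple(python_version) + REQUIRED_PYTHON)
        )
    for name, wanted in sorted(REQUIRED_PACKAGES.items()):
        found = lookup(name)
        # A prefix match tolerates build suffixes such as "0.4.1.post2".
        if found is None or not found.startswith(wanted):
            problems.append("%s: found %s, expected %s" % (name, found, wanted))
    return problems
```

Running `print(find_mismatches())` inside the activated conda environment should print an empty list when Python, PyTorch, and Torchvision match the listed versions. CUDA and the operating system are not checked by this sketch.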

Clone the repo first:
```
git clone https://github.com/abhi1kumar/groomed_nms.git
```

Cuda & Python

Install some basic packages:

```
sudo apt-get install libopenblas-dev libboost-dev libboost-all-dev git
sudo apt install gfortran

# We need to compile with an older version of gcc and g++
sudo apt install gcc-5 g++-5
sudo ln -f /usr/bin/gcc-5 /usr/local/cuda-8.0/bin/gcc
sudo ln -s /usr/bin/g++-5 /usr/local/cuda-8.0/bin/g++
```

Next, install conda and then install the required packages:

```
wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
bash Anaconda3-2020.02-Linux-x86_64.sh
source ~/.bashrc
conda list
conda create --name py36 --file dependencies/conda.txt
conda activate py36
```

KITTI Data

Download the following parts of the full KITTI 3D Object Detection dataset:

left color images of object data set (12 GB)
camera calibration matrices of object data set (16 MB)
training labels of object data set (5 MB)

Then place a soft-link (or the actual data) in data/kitti:

```
ln -s /path/to/kitti data/kitti
```

The directory structure should look like this:

```
./groomed_nms
|--- cuda_env
|--- data
|      |---kitti
|            |---training
|            |        |---calib
|            |        |---image_2
|            |        |---label_2
|            |
|            |---testing
|                     |---calib
|                     |---image_2
|
|--- dependencies
|--- lib
|--- models
|--- scripts
```
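A quick way to confirm that the soft-link resolves to the layout above is to check each expected sub-directory. This is a minimal sketch rather than a script shipped with the repository; `missing_kitti_dirs` is a hypothetical helper, and the folder names are taken from the tree above.

```python
import os

# Sub-directories shown in the tree above, relative to data/kitti.
EXPECTED_SUBDIRS = [
    os.path.join("training", "calib"),
    os.path.join("training", "image_2"),
    os.path.join("training", "label_2"),
    os.path.join("testing", "calib"),
    os.path.join("testing", "image_2"),
]


def missing_kitti_dirs(kitti_root=os.path.join("data", "kitti")):
    """Return the expected sub-directories that do not exist under kitti_root."""
    # os.path.isdir follows symbolic links, so a soft-linked data/kitti works too.
    return [
        sub for sub in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(kitti_root, sub))
    ]
```

Called from the project root, `missing_kitti_dirs()` returns an empty list when every folder is in place, and otherwise names the folders that still need to be downloaded or linked.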

Then, use the following scripts to extract the data splits, which use soft-links to the above directory for efficient storage:

```
python data/kitti_split1/setup_split.py
python data/kitti_split2/setup_split.py
```
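To illustrate why soft-links keep the splits cheap on disk, here is a minimal sketch of building a split from a list of image ids. It is not the code in `setup_split.py`: the function name `link_split`, the `.png` extension default, and the re-indexing of file names from `000000` upward are assumptions made for this example.

```python
import os


def link_split(image_ids, src_dir, dst_dir, ext=".png"):
    """Create one symbolic link in dst_dir per id, pointing at the file in src_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    linked = []
    for new_index, image_id in enumerate(image_ids):
        src = os.path.abspath(os.path.join(src_dir, image_id + ext))
        # Assumed naming scheme: number the files of the split from 000000 upward.
        dst = os.path.join(dst_dir, "%06d%s" % (new_index, ext))
        if not os.path.lexists(dst):
            # Only a link is written; the image data itself is never copied.
            os.symlink(src, dst)
        linked.append(dst)
    return linked
```

Each split then costs only one small link per file, regardless of how large the underlying images are.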

Next, build the KITTI devkit eval:

```
sh data/kitti_split1/devkit/cpp/build.sh
```

Classical NMS

Lastly, build the classical NMS modules:
```
cd lib/nms
make
cd ../..
```
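For readers unfamiliar with what the compiled module computes, classical NMS can be sketched in a few lines of plain Python. This is an illustrative greedy version on 2D boxes, not the implementation in `lib/nms`; the function names, the `(x1, y1, x2, y2)` box format with continuous coordinates, and the default threshold of 0.5 are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classical_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat."""
    # Indices of the boxes, sorted from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box whose overlap with the kept box is too high.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(classical_nms(boxes, scores))  # prints [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed. The hard sort-and-discard steps are what make this procedure non-differentiable, which is the gap the paper's title refers to.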

Training

Training is carried out in two stages: a warmup stage and a full stage. Review the configurations in scripts/config for details.

```
chmod +x scripts_training.sh
./scripts_training.sh
```

If training stops unexpectedly, you can resume from a saved snapshot using the restore flag. For example, to resume training from iteration 10k, use the following command:

```
source dependencies/cuda_8.0_env
CUDA_VISIBLE_DEVICES=0 python -u scripts/train_rpn_3d.py --config=groumd_nms --restore=10000
```

Testing

We provide logs, models, and predictions for the main experiments on the KITTI Val 1, Val 2, and Test data splits, available to download here.

Make an output folder in the project directory:

```
mkdir output
```

Place different models in the output folder as follows:

```
./groomed_nms
|--- output
|      |---groumd_nms
|      |
|      |---groumd_nms_split2
|      |
|      |---groumd_nms_full_train_2
|
| ...
```

To test, run the evaluation script:

```
chmod +x scripts_evaluation.sh
./scripts_evaluation.sh
```

Contact

For questions, feel free to post here or send an email to this address: [email protected]

Comments

Is there any difference between groom-nms and penalize highest-confidence proposal using gt directly?

Hi, thanks for your great work. However, I am confused about the motivation for this algorithm. If we want training and testing to be consistent, we could simply penalize the highest-confidence proposal in the training pipeline, which seems to achieve a similar result. So, is there any difference between groom-nms and penalizing the highest-confidence proposal using the ground truth directly?

opened by kaixinbear 3

Problem in test

Hi, this is exciting work. I have a question about testing with the pre-trained model: I can't find "Kinematic3D-Release/val1_kinematic/model_final".

opened by chenH20000109 1

Releases(v0.1)

v0.1(Mar 30, 2021)

First Release of GrooMeD-NMS
