Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection

Last update: Oct 07, 2022

Abstract: Unlike 2D object detection, where all RoI features come from grid pixels, RoI feature extraction in 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between two state-of-the-art models, PV-RCNN and Voxel-RCNN. We find that the performance gap between the two models comes not from point information but from structural information. Voxel features contain more structural information because they are obtained by quantizing the point cloud rather than downsampling it, so they retain essentially complete information about the whole point cloud. In our experiments, the stronger structural information in voxel features yields higher detector performance, even though voxel features lack accurate location information. We therefore argue that structural information is the key to 3D object detection. Based on this conclusion, we propose a Self-Attention RoI Feature Extractor (SARFE) that enhances the structural information of features extracted from 3D proposals. SARFE is a plug-and-play module that can easily be integrated into existing 3D detectors. We evaluate SARFE on both the KITTI dataset and the Waymo Open Dataset. SARFE improves the performance of state-of-the-art 3D detectors by a large margin on the Cyclist class of the KITTI dataset while maintaining real-time capability.
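Since the SARFE code has not been released, its exact architecture is not described here. As a rough illustration of the general idea only, the following sketch applies single-head scaled dot-product self-attention across the grid-point features of each 3D proposal, so that each grid point's feature is refined using context from all other points in the same RoI. Everything in it is an assumption: the function name roi_self_attention, the single head, the residual connection, the 6x6x6 RoI grid (216 points) and the random projection weights are illustrative choices, not the authors' design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def roi_self_attention(grid_feats, w_q, w_k, w_v):
    """Hypothetical single-head self-attention over RoI grid-point features.

    NOT the SARFE implementation; an illustrative sketch only.
    grid_feats: (R, N, C) features pooled at N grid points of each of R proposals.
    w_q, w_k:   (C, D) query/key projections (random here, learned in practice).
    w_v:        (C, C) value projection, kept square so the residual add works.
    Returns the refined (R, N, C) features and the (R, N, N) attention weights.
    """
    q = grid_feats @ w_q  # (R, N, D)
    k = grid_feats @ w_k  # (R, N, D)
    v = grid_feats @ w_v  # (R, N, C)
    # Scaled dot-product scores between every pair of grid points in the same RoI.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (R, N, N)
    attn = softmax(scores, axis=-1)  # each row sums to 1
    # Residual connection: original feature plus attention-weighted context.
    refined = grid_feats + attn @ v  # (R, N, C)
    return refined, attn


# Usage with made-up sizes: 2 proposals, a 6x6x6 grid (216 points), 32 channels.
rng = np.random.default_rng(0)
R, N, C, D = 2, 6 * 6 * 6, 32, 16
feats = rng.standard_normal((R, N, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

refined, attn = roi_self_attention(feats, w_q, w_k, w_v)
print(refined.shape, attn.shape)  # (2, 216, 32) (2, 216, 216)
```

Attention is computed independently per proposal, so features from one RoI never leak into another. In a real detector the projections would be learned and the refined grid features passed on to the box-refinement head; here the weights are random, so only the shapes and the row-normalized attention weights are meaningful.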

The source code will be published after the paper has been accepted to a conference.

Full paper

AP on KITTI Dataset

Submission link

AP on Waymo Open Dataset

Submission link

License

This code is released under the Apache 2.0 license.

Acknowledgement

Our code is mainly based on OpenPCDet; thanks for their contributions!

Owner

DK. Zhang
