Voxel Transformer for 3D object detection

Last update: Dec 25, 2022

Related tags

Deep Learning VOTR

Overview

Voxel Transformer

This is a reproduced repo of Voxel Transformer for 3D object detection.

The code is mainly based on OpenPCDet.

Introduction

We provide code and training configurations of VoTr-SSD/TSD on the KITTI and Waymo Open dataset. Checkpoints will not be released.

Important Notes: VoTr generally requires quite a long time (more than 60 epochs on Waymo) to converge, and a large GPU memory (32Gb) is needed for reproduction. Please strictly follow the instructions and train with sufficient number of epochs. If you don't have a 32G GPU, you can decrease the attention SIZE parameters in yaml files, but this may possibly harm the performance.

Requirements

The codes are tested in the following environment:

Ubuntu 18.04
Python 3.6
PyTorch 1.5
CUDA 10.1
OpenPCDet v0.3.0
spconv v1.2.1

Installation

a. Clone this repository.

git clone https://github.com/PointsCoder/VOTR.git

b. Install the dependent libraries as follows:

Install the dependent python libraries:

pip install -r requirements.txt

Install the SparseConv library, we use the implementation from [spconv].
- If you use PyTorch 1.1, then make sure you install the spconv v1.0 with (commit 8da6f96) instead of the latest one.
- If you use PyTorch 1.3+, then you need to install the spconv v1.2. As mentioned by the author of spconv, you need to use their docker if you use PyTorch 1.4+.

c. Compile CUDA operators by running the following command:

python setup.py develop

Training

All the models are trained with Tesla V100 GPUs (32G). The KITTI config of votr_ssd is for training with a single GPU. Other configs are for training with 8 GPUs. If you use different number of GPUs for training, it's necessary to change the respective training epochs to attain a decent performance.

The performance of VoTr is quite unstable on KITTI. If you cannnot reproduce the results, remember to run it multiple times.

models

# votr_ssd.yaml: single-stage votr backbone replacing the spconv backbone
# votr_tsd.yaml: two-stage votr with pv-head

training votr_ssd on kitti

CUDA_VISIBLE_DEVICES=0 python train.py --cfg_file cfgs/kitti_models/votr_ssd.yaml

training other models

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_train.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml

testing

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_test.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml --eval_all

Citation

If you find this project useful in your research, please consider cite:

@article{mao2021voxel,
  title={Voxel Transformer for 3D Object Detection},
  author={Mao, Jiageng and Xue, Yujing and Niu, Minzhe and others},
  journal={ICCV},
  year={2021}
}

Voxel Transformer for 3D object detection

Related tags

Overview

Voxel Transformer

Introduction

Requirements

Installation

Training

Citation

Owner

Eth brownie struct encoding example

A mini-course offered to Undergrad chemistry students

LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping

Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

torchsummaryDynamic: support real FLOPs calculation of dynamic network or user-custom PyTorch ops

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

COCO Style Dataset Generator GUI

Encode and decode text application

Code for "Layered Neural Rendering for Retiming People in Video."

Split Variational AutoEncoder

Determined: Deep Learning Training Platform

Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

PyTorch code for training MM-DistillNet for multimodal knowledge distillation

tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

K-FACE Analysis Project on Pytorch

Freecodecamp Scientific Computing with Python Certification; Solution for Challenge 2: Time Calculator

SANet: A Slice-Aware Network for Pulmonary Nodule Detection

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.