Official implementation of Monocular Quasi-Dense 3D Object Tracking

Last update: Dec 20, 2022

Overview

Monocular Quasi-Dense 3D Object Tracking

Monocular Quasi-Dense 3D Object Tracking (QD-3DT) is an online framework that detects and tracks objects in 3D using quasi-dense object proposals from 2D images.

Monocular Quasi-Dense 3D Object Tracking,
Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun,
arXiv technical report (arXiv 2103.07351). Project website: QD-3DT.

@article{Hu2021QD3DT,
    author = {Hu, Hou-Ning and Yang, Yung-Hsu and Fischer, Tobias and Yu, Fisher and Darrell, Trevor and Sun, Min},
    title = {Monocular Quasi-Dense 3D Object Tracking},
    journal = {ArXiv:2103.07351},
    year = {2021}
}

Abstract

A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer’s actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints with appearance cues only. After initial 2D association, we further utilize 3D bounding boxes depth-ordering heuristics for robust instance association and motion-based 3D trajectory prediction for re-identification of occluded vehicles. In the end, an LSTM-based object velocity learning module aggregates the long-term trajectory information for more accurate motion extrapolation. Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios. On the Waymo Open benchmark, we establish the first camera-only baseline in the 3D tracking and 3D detection challenges. Our quasi-dense 3D tracking pipeline achieves impressive improvements on the nuScenes 3D tracking benchmark with near five times tracking accuracy of the best vision-only submission among all published methods.
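To make the appearance-based association step in the abstract more concrete, here is a minimal sketch of matching new detections to existing tracks by embedding similarity. It is not the QD-3DT implementation: it substitutes a simple greedy cosine-similarity matcher for the paper's learned quasi-dense matching, and the names `cosine_similarity`, `greedy_associate`, and the `min_similarity` threshold are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily match detections to tracks by appearance similarity.

    track_embeddings: dict mapping track id -> embedding (list of floats).
    detection_embeddings: list of embeddings for the current frame.
    Returns a dict mapping detection index -> matched track id.
    Detections left unmatched would start new tracks in a full tracker.
    """
    candidates = []
    for track_id, track_vec in track_embeddings.items():
        for det_idx, det_vec in enumerate(detection_embeddings):
            score = cosine_similarity(track_vec, det_vec)
            if score >= min_similarity:  # hypothetical threshold
                candidates.append((score, det_idx, track_id))

    # Take the most similar pairs first; each track and detection is used once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches = {}
    used_tracks = set()
    for _, det_idx, track_id in candidates:
        if det_idx in matches or track_id in used_tracks:
            continue
        matches[det_idx] = track_id
        used_tracks.add(track_id)
    return matches


if __name__ == "__main__":
    tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(greedy_associate(tracks, detections))
```

In this toy example the third detection resembles neither track, so it stays unmatched. The framework described above additionally uses 3D depth ordering and motion prediction, which this sketch leaves out.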

Main results

3D tracking on nuScenes test set

We achieved the best vision-only submission

AMOTA	AMOTP
21.7	1.55

3D tracking on Waymo Open test set

We established the first camera-only baseline on Waymo Open

MOTA/L2	MOTP/L2
0.0001	0.0658

2D vehicle tracking on KITTI test set

MOTA	MOTP
86.44	85.82
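For readers unfamiliar with the MOTA columns in these tables, the standard CLEAR MOT definition is MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames. The sketch below only illustrates that formula; the function name `mota` and the counts in the example are made up, and the reported numbers come from each benchmark's official evaluation toolkit, not from this code. The nuScenes AMOTA metric is a different quantity that averages a MOTA variant over recall thresholds.

```python
def mota(num_false_negatives, num_false_positives, num_id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy under the CLEAR MOT definition.

    All counts are totals over the whole sequence. The result is at most 1.0
    and can be negative when errors outnumber ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_ground_truth


if __name__ == "__main__":
    # Illustrative counts only: 16 errors against 100 ground-truth objects.
    print(mota(10, 5, 1, 100))
```

Benchmarks differ in scale: KITTI reports MOTA as a percentage, while the Waymo Open table above uses a fraction.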

Installation

Please refer to INSTALL.md for installation and to DATA.md for dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of QD-3DT.

Model Zoo

Please refer to MODEL_ZOO.md for reproducing the results on the various benchmarks.

Contact

This repo is currently maintained by Hou-Ning Hu (@eborboihuc), Yung-Hsu Yang (@RoyYang0714), and Tobias Fischer (@tobiasfshr).

License

This work is licensed under the BSD 3-Clause License. See LICENSE for details. Third-party datasets and tools are subject to their respective licenses.

Acknowledgements

We thank Jiangmiao Pang for his help in providing the qdtrack codebase in mmdetection. This repo uses py-motmetrics for MOT evaluation, waymo-open-dataset for the Waymo Open 3D detection and 3D tracking tasks, and nuscenes-devkit for nuScenes evaluation and preprocessing.


Owner

Visual Intelligence and Systems Group
