Autonomous Perception: 3D Object Detection with Complex-YOLO

Overview

Gif of 50 frames of darknet

LiDAR object detection with Complex-YOLO takes four steps:

  1. Computing LiDAR point-clouds from range images.
  2. Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library (PCL).
  3. Using both Complex-YOLO (Darknet) and ResNet to predict 3D detections on the transformed LiDAR images.
  4. Evaluating the detections based on Precision and Recall.

Complex-Yolo Pipeline

Complex-YOLO is both highly accurate and highly performant:

Complex-Yolo Performance

Computing LiDAR Point-Clouds from Waymo Range Images

Waymo uses multiple sensors, including LiDAR, cameras, and radar, for autonomous perception. Even microphones are used to help detect ambulance and police sirens.

Visualizing LiDAR Range and Intensity Channels

LiDAR visualization 1

The roof-mounted "Top" LiDAR rotates 360 degrees, with a vertical field of view of ~20 degrees (-17.6 degrees to +2.4 degrees) and a 75m range limit in the dataset.

LiDAR data is stored as a range image in the Waymo Open Dataset. Using OpenCV and NumPy, we extracted the "range" and "intensity" channels from the image and converted the float data to 8-bit unsigned integers. Below is a visualization of two video frames. In each visualization, the top half is the range channel and the bottom half is the intensity channel:
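The 8-bit conversion can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, the range image is assumed to be a NumPy array of shape (H, W, C) with range in channel 0 and intensity in channel 1, negative values are assumed to mark missing returns, and clipping intensity at the 99th percentile is an assumed way to keep a few very bright returns from washing out the image.

```python
import numpy as np

def channel_to_uint8(channel, lo=None, hi=None):
    """Scale a float channel linearly to 0..255, clipping to [lo, hi]."""
    channel = np.asarray(channel, dtype=np.float64)
    # Assumption: negative values mark pixels without a LiDAR return.
    channel = np.where(channel < 0, 0.0, channel)
    lo = float(channel.min()) if lo is None else lo
    hi = float(channel.max()) if hi is None else hi
    if hi <= lo:
        # Constant channel: nothing to scale.
        return np.zeros(channel.shape, dtype=np.uint8)
    scaled = (np.clip(channel, lo, hi) - lo) / (hi - lo)
    return (scaled * 255).astype(np.uint8)

def stack_range_intensity(range_image):
    """Return one image with range on top and intensity below."""
    range_8bit = channel_to_uint8(range_image[..., 0])
    intensity = range_image[..., 1]
    # Assumption: clip at the 99th percentile to suppress outliers.
    hi = float(np.percentile(intensity, 99))
    intensity_8bit = channel_to_uint8(intensity, lo=0.0, hi=hi)
    return np.vstack((range_8bit, intensity_8bit))
```

The stacked array can then be displayed with any image viewer, for example OpenCV's `cv2.imshow`.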

LiDAR visualization 2

Visualizing the LiDAR Point-cloud

Waymo's top LiDAR sensor emits 64 laser beams. One limitation of a 360-degree LiDAR is that the space between beams (i.e., the resolution) widens with distance from the origin. The car chassis also creates blind spots, which is why perimeter LiDAR sensors need to be included on the sides of the vehicle.
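To get a feel for how quickly the beam spacing grows, here is a rough back-of-the-envelope sketch. It assumes the 64 beams are spread evenly over the ~20 degree vertical field of view, which is a simplification (real sensors often space their beams unevenly), and the function name is hypothetical.

```python
import math

def vertical_gap_m(distance_m, fov_deg=20.0, n_beams=64):
    """Approximate vertical gap between adjacent beams at a given distance.

    Assumes evenly spaced beams, so the angular step is fov / (n_beams - 1).
    """
    step_rad = math.radians(fov_deg / (n_beams - 1))
    # Chord length between two neighbouring beams at this distance.
    return 2.0 * distance_m * math.tan(step_rad / 2.0)
```

Under these assumptions the gap grows linearly with distance: roughly 0.22 m at 40 m, and four times smaller at 10 m.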

We used the Open3D library to build an interactive 3D visualization of the LiDAR point-cloud. Windshields, tires, and mirrors are commonly visible within 40m. Beyond 40m, cars look like slightly rounded rectangles on which you might be able to make out the windshield. Distant vehicles and extremely close vehicles typically appear at lower resolution, as do vehicles that are partially obstructed by other vehicles.
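Before a point-cloud can be visualized, each range-image pixel has to be turned into an (x, y, z) point. The sketch below shows the underlying spherical-to-Cartesian conversion. It is an illustration under stated assumptions, not the Waymo conversion: beam inclinations are assumed to be evenly spaced between +2.4 and -17.6 degrees (the dataset ships calibrated inclinations instead), row 0 is assumed to be the highest beam, the sensor-to-vehicle extrinsic transform is ignored, and the function name is hypothetical.

```python
import numpy as np

def range_image_to_points(ranges, pitch_min_deg=-17.6, pitch_max_deg=2.4):
    """Convert an (H, W) array of ranges in metres to an (N, 3) point array."""
    ranges = np.asarray(ranges, dtype=np.float64)
    height, width = ranges.shape
    # Assumption: evenly spaced inclinations, highest beam in row 0.
    pitch = np.radians(np.linspace(pitch_max_deg, pitch_min_deg, height))
    # Assumption: columns sweep a full turn, starting at +pi.
    yaw = np.pi - (np.arange(width) + 0.5) * (2.0 * np.pi / width)
    pitch_grid, yaw_grid = np.meshgrid(pitch, yaw, indexing="ij")
    x = ranges * np.cos(pitch_grid) * np.cos(yaw_grid)
    y = ranges * np.cos(pitch_grid) * np.sin(yaw_grid)
    z = ranges * np.sin(pitch_grid)
    points = np.stack((x, y, z), axis=-1)
    # Keep only pixels with a valid (positive) range.
    return points[ranges > 0]
```

The resulting array can be wrapped in an Open3D point cloud via `open3d.utility.Vector3dVector` for interactive viewing.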

10 Vehicles Showing Different Types of LiDAR Interaction:

  1. Truck with trailer - most of the truck is visible at high resolution, but part of the trailer is in the 360 LiDAR's blind spot.
  2. Car partially in the blind spot, so the back half isn't picked up well. This car blocks the largest area behind it from being detected by the LiDAR.
  3. Car shape is highly visible; you can even see the side mirrors and the LiDAR passing through the windshield.
  4. Car driving in the other lane. The resolution of the car is lower because the further the 64 beams travel, the further apart the points of the cloud will be. It is also obstructed from some lasers by Car 2.
  5. This parked car is unobstructed, but far enough away that it's difficult to make out the mirrors or the tires.
  6. Comparing this car to Car 3, most of the definition is either still there or slightly worse, because it is further away.
  7. Car 7 is both far away and obstructed, so you can barely tell it's a car. It's basically a box with what is probably a windshield.
  8. Car 8 is similar to Car 6 on the right side, but obstructed by Car 6 on the left side.
  9. Car 9 is at the limit of the LiDAR range in the dataset. It's hard to tell it's a car.
  10. Car 10 is at the limit of the LiDAR's perception, and is also obstructed by Car 8.

Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library

Convert sensor coordinates to Bird's-Eye View map coordinates

The bird's-eye view (BEV) of a LiDAR point-cloud is based on a transformation of the x and y coordinates of the points into the cells of a 2D grid.
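One way to sketch this transformation is to crop the point-cloud to a region of interest and divide x and y by the cell size. The detection area (0 to 50m ahead, 25m to each side) and the 608-pixel map size below are assumed example values, and the function name is hypothetical.

```python
import numpy as np

def to_bev_indices(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                   bev_size=608):
    """Map (N, 3+) points to integer BEV cells; returns (kept, rows, cols)."""
    points = np.asarray(points, dtype=np.float64)
    in_x = (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
    in_y = (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    kept = points[in_x & in_y]
    # Metres covered by one BEV cell along each axis.
    cell_x = (x_range[1] - x_range[0]) / bev_size
    cell_y = (y_range[1] - y_range[0]) / bev_size
    rows = np.floor((kept[:, 0] - x_range[0]) / cell_x).astype(int)
    cols = np.floor((kept[:, 1] - y_range[0]) / cell_y).astype(int)
    # Guard against floating-point round-up at the far edge.
    rows = np.minimum(rows, bev_size - 1)
    cols = np.minimum(cols, bev_size - 1)
    return kept, rows, cols
```

Points outside the region of interest are dropped, so `kept`, `rows`, and `cols` always have the same length.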

BEV map properties:

  • Height:

    H_{i,j} = max(P_{i,j} \cdot [0,0,1]^T)

  • Intensity:

    I_{i,j} = max(I(P_{i,j}))

  • Density:

    D_{i,j} = min(1.0,\ \frac{log(N_{i,j}+1)}{64})

P_{i,j} is the set of points that falls into each cell, with i,j as the respective cell coordinates. N_{i,j} refers to the number of points in a cell.
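The three formulas above can be sketched with NumPy's unbuffered `ufunc.at` operations, which accumulate correctly when several points fall into the same cell. This is an illustration, not the project's implementation: the function name and the `z_min` offset are hypothetical, heights are assumed to be at or above `z_min` so that empty cells can stay at zero, and the logarithm is taken as the natural log.

```python
import numpy as np

def build_bev_maps(rows, cols, z, intensity, bev_size, z_min=-1.0):
    """Return (height, intensity, density) maps of shape (bev_size, bev_size)."""
    rows = np.asarray(rows, dtype=int)
    cols = np.asarray(cols, dtype=int)
    height_map = np.zeros((bev_size, bev_size))
    intensity_map = np.zeros((bev_size, bev_size))
    counts = np.zeros((bev_size, bev_size))
    # H_{i,j}: highest point in the cell, measured from the assumed z_min.
    np.maximum.at(height_map, (rows, cols), np.asarray(z, dtype=float) - z_min)
    # I_{i,j}: maximum intensity among the points in the cell.
    np.maximum.at(intensity_map, (rows, cols), np.asarray(intensity, dtype=float))
    # N_{i,j}: number of points in the cell.
    np.add.at(counts, (rows, cols), 1.0)
    # D_{i,j} = min(1, log(N_{i,j} + 1) / 64), as written above.
    density_map = np.minimum(1.0, np.log(counts + 1.0) / 64.0)
    return height_map, intensity_map, density_map
```

Stacking the three maps along a new axis, for example with `np.stack`, gives a three-channel image.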

Compute intensity layer of the BEV map

We created a BEV map of the "intensity" channel from the point-cloud data. For each set of points sharing the same (x,y)-coordinates, we identified the top-most (max height) point and assigned its intensity value to the corresponding BEV map pixel. The data was normalized and outliers were removed until the features of interest were clearly visible.

Compute height layer of the BEV map

This is a visualization of the "height" channel of the BEV map. We sorted and pruned the point-cloud data, then normalized the height in each BEV map pixel by the difference between the maximum and minimum height.

Model-based Object Detection in BEV Image

We used YOLOv3 (Darknet) and ResNet deep-learning models to do 3D object detection, based on "Complex-YOLO: Real-time 3D Object Detection on Point Clouds" and "Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds".

Extract 3D bounding boxes from model response

The models take a three-channel BEV map as input and predict the class and coordinates of objects (vehicles). We then transformed these BEV coordinates back to the vehicle coordinate space to draw the bounding boxes in both images.
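The back-transformation is essentially the inverse of the BEV discretization: pixel indices are scaled by the cell size and shifted by the origin of the detection area. A minimal sketch follows, assuming a hypothetical detection format of (row, col, width in pixels, length in pixels) and the same assumed example detection area as a 608-pixel map covering 0 to 50m ahead and 25m to each side. Box height and yaw would pass through unchanged and are omitted here.

```python
def bev_box_to_vehicle(row, col, width_px, length_px,
                       x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                       bev_size=608):
    """Convert a BEV-pixel box to metric vehicle coordinates.

    Returns (x, y, width_m, length_m).
    """
    # Metres covered by one BEV cell along each axis.
    cell_x = (x_range[1] - x_range[0]) / bev_size
    cell_y = (y_range[1] - y_range[0]) / bev_size
    x = x_range[0] + row * cell_x
    y = y_range[0] + col * cell_y
    # Assumption: width is measured along y and length along x.
    return x, y, width_px * cell_y, length_px * cell_x
```

For example, with `bev_size=100` each cell covers 0.5m, so a box centered at pixel (50, 50) lands at x = 25m, y = 0m.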

Transforming back to vehicle space

Below is a gif of the detections in action: Results from 50 frames of ResNet detection

Performance Evaluation for Object Detection

Compute intersection-over-union between labels and detections

Using the labels in the Waymo Open Dataset, we compute the geometric overlap between the bounding boxes of labels and detected objects, and express this overlap as a fraction of the combined area of the two boxes. The standard method in the literature for arriving at this value is called intersection over union (IoU).

After detections are made, we need a set of metrics to measure our progress. Common classification metrics for object detection include:

TP, FN, FP

  • TP: True Positive - Correctly predicts that a vehicle or other object is present
  • TN: True Negative - Correctly predicts that a vehicle or object is not present
  • FP: False Positive - Detects an object class incorrectly
  • FN: False Negative - Fails to detect an object class when there should be a detection

One popular method of making these determinations is to measure the geometric overlap of a predicted and a labeled bounding box relative to the total area the two boxes take up in an image, known as Intersection over Union (IoU).

IoU formula

IoU for Complex-Yolo
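A minimal sketch of IoU and of how it turns into TP, FP, and FN counts is shown below. It is a simplification under stated assumptions: boxes are treated as axis-aligned (x_min, y_min, x_max, y_max) tuples, whereas rotated BEV boxes would need a polygon intersection; each label is matched greedily to its best unmatched detection; and the 0.5 threshold and the function names are hypothetical.

```python
def iou_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def count_tp_fp_fn(labels, detections, threshold=0.5):
    """Greedily match each label to its best unmatched detection."""
    matched = set()
    tp = 0
    for label in labels:
        best_iou, best_det = 0.0, None
        for index, detection in enumerate(detections):
            iou = iou_axis_aligned(label, detection)
            if index not in matched and iou >= threshold and iou > best_iou:
                best_iou, best_det = iou, index
        if best_det is not None:
            matched.add(best_det)
            tp += 1
    fn = len(labels) - tp       # labels with no matching detection
    fp = len(detections) - tp   # detections with no matching label
    return tp, fp, fn
```

True negatives are not counted here, because there is no natural set of "absent" boxes to enumerate in object detection.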

Classification Metrics Based on Precision and Recall

After all the LiDAR and Camera data has been transformed, and the detections have been predicted, we calculate the following metrics for the bounding box predictions:

Formulas

  • Precision:

    \frac{TP}{TP + FP}

  • Recall:

    \frac{TP}{TP + FN}

  • Accuracy:

    \frac{TP + TN}{TP + TN + FP + FN}

  • Mean Average Precision:

    \frac{1}{n} \sum_{Recall_{i}}Precision(Recall_{i})
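The precision, recall, and average-precision formulas above can be sketched in a few lines. The function names are hypothetical, and the average precision shown is the 11-point interpolated variant, where the precision at each recall level is taken as the best precision achieved at that recall or higher. This is one common reading of the formula and an assumption about how it is evaluated here.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(curve, n_levels=11):
    """Interpolated average precision over evenly spaced recall levels.

    `curve` is a list of (recall, precision) pairs, for example one pair
    per confidence threshold.
    """
    total = 0.0
    for i in range(n_levels):
        level = i / (n_levels - 1)
        # Best precision among operating points reaching this recall level.
        candidates = [p for r, p in curve if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / n_levels
```

Mean average precision is then the mean of `average_precision` over all object classes.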

Precision and Recall Results Visualizations

Results from 50 frames: Results from 50 frames

Precision: 0.954, Recall: 0.921

Complex Yolo Paper

Owner
Thomas Dunlap
Machine Learning Engineer and Data Scientist with a focus on deep learning, computer vision, and robotics.