[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Overview

Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan

Paper

Accepted to CVPR 2021

Abstract

HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited by expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view given only a front-view monocular image. In particular, we propose a cross-view transformation module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding. Considering the relationship between vehicles and roads, we also design a context-aware discriminator to further refine the results. Experiments on public benchmarks show that our method achieves state-of-the-art performance in the tasks of road layout estimation and vehicle occupancy estimation. Especially for the latter task, our model outperforms all competitors by a large margin. Furthermore, our model runs at 35 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.

Contributions

  • We propose a novel framework that reconstructs a local map formed by top-view road scene layout and vehicle occupancy from a single monocular front-view image. In particular, we propose a cross-view transformation module that leverages the cycle consistency between views and their correlation to strengthen the view transformation.
  • We also propose a context-aware discriminator that considers the spatial relationship between vehicles and roads when estimating vehicle occupancy.
  • On public benchmarks, our model achieves state-of-the-art performance for the tasks of road layout and vehicle occupancy estimation.
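
To make the cycle-consistency idea concrete, here is a minimal sketch in plain Python. It is not the implementation in this repository: the toy linear maps `front_to_top` and `top_to_front`, the helper names, and the mean-absolute-error form of the loss are all assumptions chosen for illustration. The point is only that a feature projected from the front view to the top view and back again should match the original, and that the mismatch can serve as a training signal.

```python
# Illustrative only: cycle consistency between two views using toy linear maps.
# The matrices below are hypothetical stand-ins for learned view projections.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cycle_consistency_loss(front_feature, front_to_top, top_to_front):
    """Mean absolute error between a front-view feature and its round trip
    front view -> top view -> front view."""
    top_feature = matvec(front_to_top, front_feature)
    reconstructed = matvec(top_to_front, top_feature)
    diffs = [abs(a - b) for a, b in zip(front_feature, reconstructed)]
    return sum(diffs) / len(diffs)


front = [1.0, 2.0]
front_to_top = [[2.0, 0.0], [0.0, 4.0]]    # hypothetical forward projection
exact_inverse = [[0.5, 0.0], [0.0, 0.25]]  # maps the top view back exactly
poor_inverse = [[1.0, 0.0], [0.0, 1.0]]    # ignores the forward scaling

loss_exact = cycle_consistency_loss(front, front_to_top, exact_inverse)
loss_poor = cycle_consistency_loss(front, front_to_top, poor_inverse)
print(loss_exact, loss_poor)
```

A backward projection that undoes the forward one gives zero loss, while a mismatched one is penalized, which is the constraint the cross-view transformation module is described as exploiting.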

Approach overview

[Figure: overview of the approach]

Repository Structure

cross-view/
├── crossView            # Scripts for dataloaders and network/model architecture
├── datasets             # Datasets
│   ├── argoverse        # Argoverse dataset
│   └── kitti            # KITTI dataset
├── log                  # Logs of the network/model
├── losses               # Scripts for the losses of the network/model
├── models               # Saved models of the network/model
├── output               # Output of the network/model
└── splits
    ├── 3Dobject         # Training and testing splits for the KITTI 3D Object Detection dataset
    ├── argo             # Training and testing splits for the Argoverse Tracking v1.0 dataset
    ├── odometry         # Training and testing splits for the KITTI Odometry dataset
    └── raw              # Training and testing splits for the KITTI RAW dataset (based on Schulter et al.)

Installation

We recommend setting up a virtual environment with Python 3.7 and PyTorch 1.0 and installing all the dependencies listed in the requirements file.

git clone https://github.com/JonDoe-297/cross-view.git

cd cross-view
pip install -r requirements.txt

Datasets

In the paper, we present results for the KITTI 3D Object, KITTI Odometry, KITTI RAW, and Argoverse 3D Tracking v1.0 datasets. For comparison with Schulter et al., we use the same training and test split sequences from the KITTI RAW dataset. For more details about the training/testing splits, see the splits directory. The ground truth can be downloaded from Monolayout.

# Download KITTI RAW
./data/download_datasets.sh raw

# Download KITTI 3D Object
./data/download_datasets.sh object

# Download KITTI Odometry
./data/download_datasets.sh odometry

# Download Argoverse Tracking v1.0
./data/download_datasets.sh argoverse

The above scripts will download, unzip and store the respective datasets in the datasets directory.

datasets/
├── argoverse                          # Argoverse dataset
│   └── argoverse-tracking
│       └── train1
│           └── 1d676737-4110-3f7e-bec0-0c90f74c248f
│               ├── car_bev_gt         # Vehicle GT
│               ├── road_gt            # Road GT
│               └── stereo_front_left  # RGB image
└── kitti                              # KITTI dataset
    ├── object                         # KITTI 3D Object dataset
    │   └── training
    │       ├── image_2                # RGB image
    │       └── vehicle_256            # Vehicle GT
    ├── odometry                       # KITTI Odometry dataset
    │   └── 00
    │       ├── image_2                # RGB image
    │       └── road_dense128          # Road GT
    └── raw                            # KITTI RAW dataset
        └── 2011_09_26
            └── 2011_09_26_drive_0001_sync
                ├── image_2            # RGB image
                └── road_dense128      # Road GT
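
Before training, it can help to confirm that the download scripts produced the layout shown above. The following is a small sketch of such a check, not a script shipped with this repository: the function name `find_missing_dirs` and the list `EXPECTED_KITTI_DIRS` are hypothetical, and the paths are simply copied from the KITTI part of the tree (sequence `00` is only the example shown there).

```python
from pathlib import Path

# Sub-directories copied from the layout above (KITTI part only, for brevity).
EXPECTED_KITTI_DIRS = [
    "kitti/object/training/image_2",
    "kitti/object/training/vehicle_256",
    "kitti/odometry/00/image_2",
    "kitti/odometry/00/road_dense128",
]


def find_missing_dirs(datasets_root, expected=EXPECTED_KITTI_DIRS):
    """Return the expected sub-directories that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = find_missing_dirs("./datasets")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected KITTI directories are present.")
```

The same function can be reused for Argoverse or KITTI RAW by passing a different `expected` list.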

Training

  1. Prepare the corresponding dataset
  2. Run training
# Cross view Road (KITTI Odometry)
python3 train.py --type static --split odometry --data_path ./datasets/odometry/ --model_name <Model Name with specifications>

# Cross view Vehicle (KITTI 3D Object)
python3 train.py --type dynamic --split 3Dobject --data_path ./datasets/kitti/object/training --model_name <Model Name with specifications>

# Cross view Road (KITTI RAW)
python3 train.py --type static --split raw --data_path ./datasets/kitti/raw/ --model_name <Model Name with specifications>

# Cross view Vehicle (Argoverse Tracking v1.0)
python3 train.py --type dynamic --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>

# Cross view Road (Argoverse Tracking v1.0)
python3 train.py --type static --split argo --data_path ./datasets/argoverse/ --model_name <Model Name with specifications>
  3. The trained models are saved in "models" (default: ./models)
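
The five training commands above differ only in their `--type`, `--split`, and `--data_path` values, so they can be generated from a small table. The sketch below is one possible way to do that and is not part of the repository: the dictionary keys (such as `road_kitti_raw`) and the helper names are hypothetical labels, while the flag values are copied from the commands above.

```python
import shlex

# (type, split, data_path) triples copied from the training commands above.
# The dictionary keys are hypothetical labels, not names used by train.py.
TRAINING_CONFIGS = {
    "road_kitti_odometry": ("static", "odometry", "./datasets/odometry/"),
    "vehicle_kitti_object": ("dynamic", "3Dobject", "./datasets/kitti/object/training"),
    "road_kitti_raw": ("static", "raw", "./datasets/kitti/raw/"),
    "vehicle_argoverse": ("dynamic", "argo", "./datasets/argoverse/"),
    "road_argoverse": ("static", "argo", "./datasets/argoverse/"),
}


def build_train_command(config_name, model_name):
    """Return the train.py invocation for one configuration as an argument list."""
    layout_type, split, data_path = TRAINING_CONFIGS[config_name]
    return [
        "python3", "train.py",
        "--type", layout_type,
        "--split", split,
        "--data_path", data_path,
        "--model_name", model_name,
    ]


def format_command(args):
    """Render an argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in args)


# Print every command; here each model is simply named after its configuration.
for name in TRAINING_CONFIGS:
    print(format_command(build_train_command(name, name)))
```

The argument lists can also be passed directly to `subprocess.run` to launch the runs one after another.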

Testing

  1. Download pre-trained models
  2. Run testing
python3 test.py --type <static/dynamic> --model_path <path to the model directory> --image_path <path to the image directory> --out_dir <path to the output directory>
  3. The results are in "output" (default: ./output)

Evaluation

  1. Prepare the corresponding dataset
  2. Download pre-trained models
  3. Run evaluation
# Evaluate on KITTI Odometry 
python3 eval.py --type static --split odometry --model_path <path to the model directory> --data_path ./datasets/odometry --height 512 --width 512 --occ_map_size 128

# Evaluate on KITTI 3D Object
python3 eval.py --type dynamic --split 3Dobject --model_path <path to the model directory> --data_path ./datasets/kitti/object/training

# Evaluate on KITTI RAW
python3 eval.py --type static --split raw --model_path <path to the model directory> --data_path ./datasets/kitti/raw/

# Evaluate on Argoverse Tracking v1.0 (Road)
python3 eval.py --type static --split argo --model_path <path to the model directory> --data_path ./datasets/argoverse/

# Evaluate on Argoverse Tracking v1.0 (Vehicle)
python3 eval.py --type dynamic --split argo --model_path <path to the model directory> --data_path ./datasets/argoverse/
  4. The results are in "output" (default: ./output)

Pretrained Models

The following table provides links to the pre-trained models for each dataset mentioned in our paper. The table also shows the corresponding evaluation results for these models.

Dataset             Segmentation Objects  mIOU(%)  mAP(%)  Pretrained Model
KITTI 3D Object     Vehicle               38.85    51.04   link
KITTI Odometry      Road                  77.47    86.39   link
KITTI Raw           Road                  68.26    79.65   link
Argoverse Tracking  Vehicle               47.87    62.69   link
Argoverse Tracking  Road                  76.56    87.30   link
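
For readers unfamiliar with the mIOU column, the sketch below shows how intersection over union can be computed for a single pair of binary occupancy masks. It is a generic illustration written for this README, not the code in eval.py: the function name `binary_iou`, the nested-list mask format, and the convention for two empty masks are assumptions, and how the repository averages the score over classes and images is not shown here.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; this sketch treats that case as IoU = 1.
    return intersection / union if union else 1.0


# Toy 2x3 top-view occupancy masks (1 = occupied cell).
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]

iou = binary_iou(pred, target)
print(f"IoU: {iou:.2f}")
```

In this toy case two cells are occupied in both masks and four cells are occupied in at least one, so the IoU is 2/4.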

Results

[Figure: results]

Contact

If you run into any problems, please describe them in the issues or contact:

License

This project is released under the MIT License (refer to the LICENSE file for details). This project partially depends on the sources of Monolayout.
