Code for the upcoming CVPR 2021 paper

Overview

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael FirmanCVPR 2021

[Link to paper]

We introduce ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.

  • Self-supervised: We train from monocular video only. No depths or poses are needed at training or test time.
  • Good depths from single frames; even better depths from short sequences.
  • Efficient: Only one forward pass at test time. No test-time optimization needed.
  • State-of-the-art self-supervised monocular-trained depth estimation on KITTI and CityScapes.

Overview

Cost volumes are commonly used for estimating depths from multiple input views:

Cost volume used for aggreagting sequences of frames

However, cost volumes do not easily work with self-supervised training.

Baseline: Depth from cost volume input without our contributions

In our paper, we:

  • Introduce an adaptive cost volume to deal with unknown scene scales
  • Fix problems with moving objects
  • Introduce augmentations to deal with static cameras and start-of-sequence frames

These contributions enable cost volumes to work with self-supervised training:

ManyDepth: Depth from cost volume input with our contributions

With our contributions, short test-time sequences give better predictions than methods which predict depth from just a single frame.

ManyDepth vs Monodepth2 depths and error maps

✏️ 📄 Citation

If you find our work useful or interesting, please cite our paper:

@inproceedings{watson2021temporal,
    author = {Jamie Watson and
              Oisin Mac Aodha and
              Victor Prisacariu and
              Gabriel Brostow and
              Michael Firman},
    title = {{The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth}},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

📈 Results

Our ManyDepth method outperforms all previous methods in all subsections across most metrics, whether or not the baselines use multiple frames at test time. See our paper for full details.

KITTI results table

👀 Reproducing Paper Results

To recreate the results from our paper, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_KITTI_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name>

Depending on the size of your GPU, you may need to set --batch_size to be lower than 12. Additionally you can train a high resolution model by adding --height 320 --width 1024.

For instructions on downloading the KITTI dataset, see Monodepth2

To train a CityScapes model, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_preprocessed_cityscapes_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name> \
    --dataset cityscapes_preprocessed \
    --split cityscapes_preprocessed \
    --freeze_teacher_epoch 5 \
    --height 192 --width 512

This assumes you have already preprocessed the CityScapes dataset using SfMLearner's prepare_train_data.py script. We used the following command:

python prepare_train_data.py \
    --img_height 512 \
    --img_width 1024 \
    --dataset_dir <path_to_downloaded_cityscapes_data> \
    --dataset_name cityscapes \
    --dump_root <your_preprocessed_cityscapes_path> \
    --seq_length 3 \
    --num_threads 8

Note that while we use the --img_height 512 flag, the prepare_train_data.py script will save images which are 1024x384 as it also crops off the bottom portion of the image. You could probably save disk space without a loss of accuracy by preprocessing with --img_height 256 --img_width 512 (to create 512x192 images), but this isn't what we did for our experiments.

💾 Pretrained weights and evaluation

You can download weights for some pretrained models here:

To evaluate a model on KITTI, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_KITTI_path> \
    --load_weights_folder <your_model_path>
    --eval_mono

Make sure you have first run export_gt_depth.py to extract ground truth files.

And to evaluate a model on Cityscapes, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_cityscapes_path> \
    --load_weights_folder <your_model_path>
    --eval_mono \
    --eval_split cityscapes

During evaluation, we crop and evaluate on the middle 50% of the images.

We provide ground truth depth files HERE, which were converted from pixel disparities using intrinsics and the known baseline. Download this and unzip into splits/cityscapes.

🖼 Running on your own images

We provide some sample code in test_simple.py which demonstrates multi-frame inference. This predicts depth for a sequence of two images cropped from a dashcam video. Prediction also requires an estimate of the intrinsics matrix, in json format. For the provided test images, we have estimated the intrinsics to be equivalent to those of the KITTI dataset. Note that the intrinsics provided in the json file are expected to be in normalised coordinates.

Download and unzip model weights from one of the links above, and then run the following command:

python -m manydepth.test_simple \
    --target_image_path assets/test_sequence_target.jpg \
    --source_image_path assets/test_sequence_source.jpg \
    --intrinsics_json_path assets/test_sequence_intrinsics.json \
    --model_path path/to/weights

A predicted depth map rendering will be saved to assets/test_sequence_target_disp.jpeg.

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Owner
Niantic Labs
Building technologies and ideas that move us
Niantic Labs
An open-source, low-cost, image-based weed detection device for fallow scenarios.

Welcome to the OpenWeedLocator (OWL) project, an opensource hardware and software green-on-brown weed detector that uses entirely off-the-shelf compon

Guy Coleman 145 Jan 05, 2023
1st place solution in CCF BDCI 2021 ULSEG challenge

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
The PyTorch implementation for paper "Neural Texture Extraction and Distribution for Controllable Person Image Synthesis" (CVPR2022 Oral)

ArXiv | Get Start Neural-Texture-Extraction-Distribution The PyTorch implementation for our paper "Neural Texture Extraction and Distribution for Cont

Ren Yurui 111 Dec 10, 2022
Official implementation of the paper Do pedestrians pay attention? Eye contact detection for autonomous driving

Do pedestrians pay attention? Eye contact detection for autonomous driving Official implementation of the paper Do pedestrians pay attention? Eye cont

VITA lab at EPFL 26 Nov 02, 2022
This is a code repository for paper OODformer: Out-Of-Distribution Detection Transformer

OODformer: Out-Of-Distribution Detection Transformer This repo is the official the implementation of the OODformer: Out-Of-Distribution Detection Tran

34 Dec 02, 2022
Use deep learning, genetic programming and other methods to predict stock and market movements

StockPredictions Use classic tricks, neural networks, deep learning, genetic programming and other methods to predict stock and market movements. Both

Linda MacPhee-Cobb 386 Jan 03, 2023
This project aims to segment 4 common retinal lesions from Fundus Images.

This project aims to segment 4 common retinal lesions from Fundus Images.

Husam Nujaim 1 Oct 10, 2021
This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking

SimpleTrack This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking. We are still working on writing t

TuSimple 189 Dec 26, 2022
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee

NVIDIA Research Projects 130 Jan 06, 2023
A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022)

DFC2022 Baseline A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022) This repository uses TorchGeo, PyTorch Lightning, and Segmenta

isaac 24 Nov 28, 2022
Implementation of ICCV 2021 oral paper -- A Novel Self-Supervised Learning for Gaussian Mixture Model

SS-GMM Implementation of ICCV 2021 oral paper -- Self-Supervised Image Prior Learning with GMM from a Single Noisy Image with supplementary material R

HUST-The Tan Lab 4 Dec 05, 2022
The Deep Learning with Julia book, using Flux.jl.

Deep Learning with Julia DL with Julia is a book about how to do various deep learning tasks using the Julia programming language and specifically the

Logan Kilpatrick 67 Dec 25, 2022
Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo Block diagram of FCL-taco2, where the decode

Disong Wang 39 Sep 28, 2022
Standalone pre-training recipe with JAX+Flax

Sabertooth Sabertooth is standalone pre-training recipe based on JAX+Flax, with data pipelines implemented in Rust. It runs on CPU, GPU, and/or TPU, b

Nikita Kitaev 26 Nov 28, 2022
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Paper Introduction Multi-task indoor scene understanding is widely considered a

62 Dec 05, 2022
Denoising Normalizing Flow

Denoising Normalizing Flow Christian Horvat and Jean-Pascal Pfister 2021 We combine Normalizing Flows (NFs) and Denoising Auto Encoder (DAE) by introd

CHrvt 17 Oct 15, 2022
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network Paddle-PANet 目录 结果对比 论文介绍 快速安装 结果对比 CTW1500 Method Backbone Fine

7 Aug 08, 2022
[ICCV 2021] Our work presents a novel neural rendering approach that can efficiently reconstruct geometric and neural radiance fields for view synthesis.

MVSNeRF Project page | Paper This repository contains a pytorch lightning implementation for the ICCV 2021 paper: MVSNeRF: Fast Generalizable Radiance

Anpei Chen 529 Dec 30, 2022
3D Pose Estimation for Vehicles

3D Pose Estimation for Vehicles Introduction This work generates 4 key-points and 2 key-edges from vertices and edges of vehicles as ground truth. The

Jingyi Wang 1 Nov 01, 2021