Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Last update: Jun 27, 2022

Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma.

We address the problem of estimating depth from multimodal audio-visual data. Inspired by the ability of animals such as bats and dolphins to infer the distance of objects through echolocation, we propose an end-to-end deep-learning-based pipeline that uses RGB images, binaural echoes, and estimated material properties of the objects in a scene to estimate depth.
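To make the fusion idea concrete, here is a minimal, framework-free sketch. It assumes that the final prediction is a per-pixel convex combination of an echo-based depth map and an image-based depth map, weighted by an attention map (weight α for the echo estimate, 1 - α for the image estimate). The function name `fuse_depth` and this exact combination rule are illustrative assumptions: the repository's attention module (saved as `attention_*`) is a PyTorch network and may combine the modalities differently, and the material-property branch is omitted here.

```python
from typing import List

Grid = List[List[float]]


def fuse_depth(depth_echo: Grid, depth_rgb: Grid, attention: Grid) -> Grid:
    """Hypothetical per-pixel fusion: a * echo_depth + (1 - a) * rgb_depth.

    attention[i][j] in [0, 1] is the (assumed) weight given to the echo-based
    estimate at pixel (i, j); the image-based estimate receives the remainder.
    """
    if not (len(depth_echo) == len(depth_rgb) == len(attention)):
        raise ValueError("depth maps and attention map must have the same height")
    fused = []
    for row_e, row_r, row_a in zip(depth_echo, depth_rgb, attention):
        if not (len(row_e) == len(row_r) == len(row_a)):
            raise ValueError("depth maps and attention map must have the same width")
        fused_row = []
        for d_e, d_r, a in zip(row_e, row_r, row_a):
            if not 0.0 <= a <= 1.0:
                raise ValueError("attention weights must lie in [0, 1]")
            fused_row.append(a * d_e + (1.0 - a) * d_r)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    echo = [[2.0, 4.0], [1.0, 3.0]]   # toy depth map predicted from echoes
    rgb = [[3.0, 2.0], [1.0, 5.0]]    # toy depth map predicted from the image
    alpha = [[1.0, 0.5], [0.0, 0.25]]  # toy attention map
    print(fuse_depth(echo, rgb, alpha))  # [[2.0, 3.0], [1.0, 4.5]]
```

Because the weights sum to one at every pixel, the fused depth always lies between the two input estimates, so the attention map decides, pixel by pixel, which modality to trust more.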

[Project] [Paper]

Requirements

The code is tested with:

- Python 3.6
- PyTorch 1.6.0
- Numpy 1.19.5

Dataset

Replica-VisualEchoes can be obtained from here. We use the 128x128 image resolution in our experiments.

MatterportEchoes is an extension of the existing Matterport3D dataset. To obtain the raw frames, please forward us the acceptance of your access request from the authors of the Matterport3D dataset. We will release the procedure for obtaining the frames and echoes using habitat-sim and SoundSpaces in the near future.

Pre-trained Model

We provide pre-trained models for both datasets here. For each dataset, the four parts of the model are saved separately as rgbdepth_*, audiodepth_*, material_*, and attention_*, where * is the dataset name, i.e. replica or mp3d.
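Before training or evaluating, it can help to confirm that all four parts are present for the chosen dataset. The sketch below is a hypothetical helper, not part of the repository. Since the file extension of the checkpoints is not stated, it assumes only that each file name starts with `<part>_<dataset>` and matches any suffix.

```python
import glob
import os
from typing import Dict, List

# Model part prefixes and dataset names as listed in this README.
MODEL_PARTS = ("rgbdepth", "audiodepth", "material", "attention")
DATASETS = ("replica", "mp3d")


def find_checkpoints(checkpoints_dir: str, dataset: str) -> Dict[str, List[str]]:
    """Return, for each model part, the files matching <part>_<dataset>* (any extension)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    found = {}
    for part in MODEL_PARTS:
        pattern = os.path.join(checkpoints_dir, f"{part}_{dataset}*")
        found[part] = sorted(glob.glob(pattern))
    return found


def missing_parts(checkpoints_dir: str, dataset: str) -> List[str]:
    """List the model parts for which no checkpoint file was found."""
    found = find_checkpoints(checkpoints_dir, dataset)
    return [part for part in MODEL_PARTS if not found[part]]
```

For example, `missing_parts("path_to_the_pretrained_model", "replica")` returns an empty list when all four replica parts are in place, and otherwise names the parts still to download.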

Training

To train the model, first download the pre-trained material net from the link above.

python train.py \
--validation_on \
--dataset mp3d \
--img_path path_to_img_folder \
--metadatapath path_to_metadata \
--audio_path path_to_audio_folder \
--checkpoints_dir path_to_save_checkpoints \
--init_material_weight path_to_pre-trained_material_net
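If you launch runs from a script, the command above can be assembled programmatically. The sketch below uses only the flags shown in the example command; the helper names `build_train_command` and `run_training` are hypothetical, and the paths are the same placeholders as above.

```python
import shlex
import subprocess
from typing import List


def build_train_command(dataset: str, img_path: str, metadata_path: str,
                        audio_path: str, checkpoints_dir: str,
                        material_weights: str,
                        validation: bool = True) -> List[str]:
    """Assemble a train.py invocation from the flags shown in this README."""
    cmd = ["python", "train.py"]
    if validation:
        # Mirrors the bare --validation_on switch in the example command.
        cmd.append("--validation_on")
    cmd += [
        "--dataset", dataset,  # "replica" or "mp3d"
        "--img_path", img_path,
        "--metadatapath", metadata_path,
        "--audio_path", audio_path,
        "--checkpoints_dir", checkpoints_dir,
        "--init_material_weight", material_weights,
    ]
    return cmd


def run_training(cmd: List[str]) -> None:
    """Launch training; raises CalledProcessError if train.py exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_train_command(
        dataset="mp3d",
        img_path="path_to_img_folder",
        metadata_path="path_to_metadata",
        audio_path="path_to_audio_folder",
        checkpoints_dir="path_to_save_checkpoints",
        material_weights="path_to_pre-trained_material_net",
    )
    # Print a shell-safe version of the command (shlex.join needs Python 3.8+).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when paths contain spaces.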

Evaluation

To evaluate the method with the pre-trained models, download the models and the data for the corresponding dataset.

Evaluation for Replica dataset

python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset replica

Evaluation for Matterport3D dataset

python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset mp3d
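For reference, the sketch below computes error metrics that are standard in the depth-estimation literature: RMSE, mean absolute relative error, mean log10 error, and the threshold accuracies δ < 1.25^k for k = 1, 2, 3. It is an assumption that test.py reports exactly these metrics; details such as how invalid pixels are masked (here, pixels with ground truth ≤ 0 are skipped) or whether depths are clipped may differ in the actual evaluation code.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Common depth-estimation metrics over pixels with positive ground truth.

    Assumption: invalid pixels are marked by gt <= 0 and are skipped.
    Predictions must be positive for the log and ratio terms to be defined.
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    if any(p <= 0 for p, _ in pairs):
        raise ValueError("predicted depths must be positive")
    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25**k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {f"delta_{k}": sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}
    return {"rmse": rmse, "rel": rel, "log10": log10, **deltas}


if __name__ == "__main__":
    # The last pixel has ground truth 0 and is therefore ignored.
    print(depth_metrics([1.0, 2.0, 4.0, 5.0], [1.0, 2.0, 2.0, 0.0]))
```

Lower is better for RMSE, REL, and log10; higher is better for the δ accuracies.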

License and Citation

This software is released under the MIT License.

@inproceedings{parida2021beyond,
  title={Beyond Image to Depth: Improving Depth Prediction using Echoes},
  author={Parida, Kranti and Srivastava, Siddharth and Sharma, Gaurav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Acknowledgement

Some portions of the code are adapted from Ruohan Gao. Thanks, Ruohan!
