MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Project Page | Paper


If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

  • Initial release. Note that our released code achieves improved results over those reported in the initial arXiv pre-print. In addition, we include evaluation on the DTU dataset. We will update our paper soon.

⚙️ Installation

The code is tested with CUDA 10.1. Please use the following commands to install dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt
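As a quick sanity check of the environment (assuming the standard PyTorch dependency installed via requirements.txt), you can verify that PyTorch sees CUDA:

import torch

print(torch.__version__)          # version installed by requirements.txt
print(torch.version.cuda)         # CUDA toolkit PyTorch was built against (expect 10.1)
print(torch.cuda.is_available())  # should print True on a working GPU setup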

The folder structure should look like the following once you have downloaded all data and pretrained models. Download links are inside each dataset section at the end of this README.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy

🎬 Demo

After downloading the pretrained models for ScanNet, run the following command to make a prediction on sample data.

python demo.py --cfg configs/scannet/release.conf

The results are saved as demo.png.
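If you would rather inspect the raw depth prediction than the rendered demo.png, a minimal visualization sketch looks like the following. Note that demo_depth.npy is a hypothetical dump of the network output, not a file the repo produces out of the box; adapt the name to however you save the prediction from demo.py.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file: an H x W array of predicted depths in meters,
# dumped from demo.py yourself; not produced by the repo out of the box.
depth = np.load("demo_depth.npy")

plt.imshow(depth, cmap="plasma")
plt.colorbar(label="depth (m)")
plt.savefig("demo_depth_vis.png", dpi=150)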

Training & Testing

We use 4 Nvidia V100 GPUs for training. You may need to modify CUDA_VISIBLE_DEVICES and the batch size to accommodate your GPU resources.
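For example, to restrict a run to two GPUs from inside a Python entry point (the variable must be set before CUDA is initialized; the equivalent shell route is simply prefixing the training command with the variable):

import os

# Must be set before torch initializes CUDA; limits this process to GPUs 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
print(torch.cuda.device_count())  # prints 2 with the setting above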

ScanNet

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First download and extract the ScanNet training data and split. Then run the following command to train our model.

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input pose, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.
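As a rough illustration of what a noisy input pose means, the sketch below jitters a 4x4 camera pose with small random rotation and translation noise. This is our own illustrative scheme, not the repo's exact perturbation (the released jitter lives in ScanNet_3_frame_jitter_pose.npy):

import numpy as np

def small_rotation(rotvec):
    # Rodrigues' formula: rotation matrix from an axis-angle vector (radians).
    theta = np.linalg.norm(rotvec)
    if theta < 1e-8:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def perturb_pose(T, rot_std_deg=1.0, trans_std_m=0.01):
    # Jitter a 4x4 pose: left-multiply a small random rotation, add translation noise.
    noisy = T.copy()
    noisy[:3, :3] = small_rotation(np.deg2rad(rot_std_deg) * np.random.randn(3)) @ T[:3, :3]
    noisy[:3, 3] += trans_std_m * np.random.randn(3)
    return noisy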

Testing

First download and extract the data, split, and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should get something like this:

abs_rel sq_rel log10 rmse rmse_log a1 a2 a3 abs_diff abs_diff_median thre1 thre3 thre5
0.059 0.016 0.026 0.157 0.084 0.964 0.995 0.999 0.108 0.079 0.856 0.974 0.996
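These are the standard depth-evaluation metrics. For reference, here is a sketch of how abs_rel, sq_rel, log10, rmse, rmse_log, and the a1/a2/a3 inlier ratios are conventionally computed (our summary of the standard definitions, not the repo's evaluation code verbatim):

import numpy as np

def depth_metrics(pred, gt):
    # Evaluate only where ground-truth depth is valid.
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "sq_rel": np.mean((pred - gt) ** 2 / gt),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "a1": np.mean(ratio < 1.25),
        "a2": np.mean(ratio < 1.25 ** 2),
        "a3": np.mean(ratio < 1.25 ** 3),
    }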

SUN3D/RGBD/Scenes11

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First download and extract the DeMoN training data and split. Then run the following command to train our model.

bash scripts/demon/train.sh

Testing

First download and extract the data, split, and pretrained models.

Then run:

bash scripts/demon/test.sh

You should get something like this:

dataset rgbd: 160

abs_rel sq_rel log10 rmse rmse_log a1 a2 a3 abs_diff abs_diff_median thre1 thre3 thre5
0.082 0.165 0.047 0.440 0.147 0.921 0.939 0.948 0.325 0.284 0.753 0.894 0.933

dataset scenes11: 256

abs_rel sq_rel log10 rmse rmse_log a1 a2 a3 abs_diff abs_diff_median thre1 thre3 thre5
0.046 0.080 0.018 0.439 0.107 0.976 0.989 0.993 0.155 0.058 0.822 0.945 0.979

dataset sun3d: 160

abs_rel sq_rel log10 rmse rmse_log a1 a2 a3 abs_diff abs_diff_median thre1 thre3 thre5
0.099 0.055 0.044 0.304 0.137 0.893 0.970 0.993 0.224 0.171 0.649 0.890 0.969

-> Done!

depth

abs_rel sq_rel log10 rmse rmse_log a1 a2 a3 abs_diff abs_diff_median thre1 thre3 thre5
0.071 0.096 0.033 0.402 0.127 0.938 0.970 0.981 0.222 0.152 0.755 0.915 0.963

DTU

Download

data 🔗 eval data 🔗 pretrained models 🔗

Training

First download and extract the DTU training data. Then run the following command to train our model.

bash scripts/dtu/train.sh

Testing

First download and extract the DTU eval data and pretrained models.

The following command performs three steps together: 1. generate depth predictions on the DTU test set; 2. fuse the depth predictions into a final point cloud; 3. evaluate the predicted point cloud. Note that we re-implement the original Matlab evaluation of the DTU dataset in Python.

bash scripts/dtu/test.sh

You should get something like this:

Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
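Here Acc is conventionally the mean distance (in mm) from the predicted point cloud to the ground truth, Comp the mean distance from ground truth to prediction, and F-score combines precision and recall at a fixed distance threshold. A rough nearest-neighbor sketch of these quantities (assuming scipy is available; the repo's Python re-implementation of the Matlab evaluation will differ in details such as observation masks and downsampling):

import numpy as np
from scipy.spatial import cKDTree

def dtu_scores(pred_pts, gt_pts, thresh_mm=0.5):
    # pred_pts: (N, 3) predicted points; gt_pts: (M, 3) ground-truth points; both in mm.
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # accuracy distances
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # completeness distances
    acc, comp = d_pred_to_gt.mean(), d_gt_to_pred.mean()
    precision = (d_pred_to_gt < thresh_mm).mean()
    recall = (d_gt_to_pred < thresh_mm).mean()
    fscore = 2 * precision * recall / (precision + recall)
    return acc, comp, fscore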

Acknowledgement

The fusion code for the DTU dataset builds heavily on PatchMatchNet.
