Single/multi view image(s) to voxel reconstruction using a recurrent neural network

Related tags

Deep Learning3D-R2N2
Overview

3D-R2N2: 3D Recurrent Reconstruction Neural Network

This repository contains the source codes for the paper Choy et al., 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction, ECCV 2016. Given one or multiple views of an object, the network generates voxelized ( a voxel is the 3D equivalent of a pixel) reconstruction of the object in 3D.

Citing this work

If you find this work useful in your research, please consider citing:

@inproceedings{choy20163d,
  title={3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction},
  author={Choy, Christopher B and Xu, Danfei and Gwak, JunYoung and Chen, Kevin and Savarese, Silvio},
  booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
  year={2016}
}

News

  • [2020-01-25] Using a dense ocupancy grid for 3D reconstruction requires a large amount of memory and computation. We present a new auto-diff library for sparse tensors that can reconstruct objects in high resolution. Please refer to the 3D sparsity pattern reconstruction page for 3D reconstruction using a sparse tensor.

Project Page

The project page is available at http://cvgl.stanford.edu/3d-r2n2/.

Overview

Overview Left: images found on Ebay, Amazon, Right: overview of 3D-R2N2

Traditionally, single view reconstruction and multi-view reconstruction are disjoint problems that have been dealt using different approaches. In this work, we first propose a unified framework for both single and multi-view reconstruction using a 3D Recurrent Reconstruction Neural Network (3D-R2N2).

3D-Convolutional LSTM 3D-Convolutional GRU Inputs (red cells + feature) for each cell (purple)
3D-LSTM 3D-GRU 3D-LSTM

We can feed in images in random order since the network is trained to be invariant to the order. The critical component that enables the network to be invariant to the order is the 3D-Convolutional LSTM which we first proposed in this work. The 3D-Convolutional LSTM selectively updates parts that are visible and keeps the parts that are self-occluded.

Networks We used two different types of networks for the experiments: a shallow network (top) and a deep residual network (bottom).

Results

Please visit the result visualization page to view 3D reconstruction results interactively.

Datasets

We used ShapeNet models to generate rendered images and voxelized models which are available below (you can follow the installation instruction below to extract it to the default directory).

Installation

The package requires python3. You can follow the direction below to install virtual environment within the repository or install anaconda for python 3.

  • Download the repository
git clone https://github.com/chrischoy/3D-R2N2.git
cd 3D-R2N2
conda create -n py3-theano python=3.6
source activate py3-theano
conda install pygpu
pip install -r requirements.txt
  • copy the theanorc file to the $HOME directory
cp .theanorc ~/.theanorc

Running demo.py

  • Install meshlab (skip if you have another mesh viewer). If you skip this step, demo code will not visualize the final prediction.
sudo apt-get install meshlab
  • Run the demo code and save the final 3D reconstruction to a mesh file named prediction.obj
python demo.py prediction.obj

The demo code takes 3 images of the same chair and generates the following reconstruction.

Image 1 Image 2 Image 3 Reconstruction
  • Deactivate your environment when you are done
deactivate

Training the network

  • Activate the virtual environment before you run the experiments.
source py3/bin/activate
  • Download datasets and place them in a folder named ShapeNet
mkdir ShapeNet/
wget http://cvgl.stanford.edu/data2/ShapeNetRendering.tgz
wget http://cvgl.stanford.edu/data2/ShapeNetVox32.tgz
tar -xzf ShapeNetRendering.tgz -C ShapeNet/
tar -xzf ShapeNetVox32.tgz -C ShapeNet/
  • Train and test the network using the training shell script
./experiments/script/res_gru_net.sh

Note: The initial compilation might take awhile if you run the theano for the first time due to various compilations. The problem will not persist for the subsequent runs.

Using cuDNN

To use cuDNN library, you have to download cuDNN from the nvidia website. Then, extract the files to any directory and append the directory to the environment variables like the following. Please replace the /path/to/cuDNN/ to the directory that you extracted cuDNN.

export LD_LIBRARY_PATH=/path/to/cuDNN/lib64:$LD_LIBRARY_PATH
export CPATH=/path/to/cuDNN/include:$CPATH
export LIBRARY_PATH=/path/to/cuDNN/lib64:$LD_LIBRARY_PATH

For more details, please refer to http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html

Follow-up Paper

Gwak et al., Weakly supervised 3D Reconstruction with Adversarial Constraint, project website

Supervised 3D reconstruction has witnessed a significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a raytrace pooling layer that enables perspective projection and backpropagation. Additionally, since the 3D reconstruction from masks is an ill posed problem, we propose to constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations. We demonstrate that learning a log-barrier solution to this constrained optimization problem resembles the GAN objective, enabling the use of existing tools for training GANs. We evaluate and analyze the manifold constrained reconstruction on various datasets for single and multi-view reconstruction of both synthetic and real images.

License

MIT License

Owner
Chris Choy
Research Scientist @NVIDIA. Previously Ph.D. from Stanford Vision and Learning Lab @StanfordVL (SVL), Stanford AI Lab, SAIL.
Chris Choy
Human Dynamics from Monocular Video with Dynamic Camera Movements

Human Dynamics from Monocular Video with Dynamic Camera Movements Ri Yu, Hwangpil Park and Jehee Lee Seoul National University ACM Transactions on Gra

215 Jan 01, 2023
The official PyTorch implementation for the paper "sMGC: A Complex-Valued Graph Convolutional Network via Magnetic Laplacian for Directed Graphs".

Magnetic Graph Convolutional Networks About The official PyTorch implementation for the paper sMGC: A Complex-Valued Graph Convolutional Network via M

3 Feb 25, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrai

Hugging Face 77.4k Jan 05, 2023
Context-Sensitive Misspelling Correction of Clinical Text via Conditional Independence, CHIL 2022

cim-misspelling Pytorch implementation of Context-Sensitive Spelling Correction of Clinical Text via Conditional Independence, CHIL 2022. This model (

Juyong Kim 11 Dec 19, 2022
🛰️ Awesome Satellite Imagery Datasets

Awesome Satellite Imagery Datasets List of aerial and satellite imagery datasets with annotations for computer vision and deep learning. Newest datase

Christoph Rieke 3k Jan 03, 2023
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes" 🌟 🌟 . 🎓 Re

Shuai Shen 87 Dec 28, 2022
Improving Object Detection by Label Assignment Distillation

Improving Object Detection by Label Assignment Distillation This is the official implementation of the WACV 2022 paper Improving Object Detection by L

Cybercore Co. Ltd 51 Dec 08, 2022
Minimal implementation and experiments of "No-Transaction Band Network: A Neural Network Architecture for Efficient Deep Hedging".

No-Transaction Band Network: A Neural Network Architecture for Efficient Deep Hedging Minimal implementation and experiments of "No-Transaction Band N

19 Jan 03, 2023
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning [CVPR'21, Oral] By Zhicheng Huang*, Zhaoyang Zeng*, Yupan H

Multimedia Research 196 Dec 13, 2022
This is the pytorch re-implementation of the IterNorm

IterNorm-pytorch Pytorch reimplementation of the IterNorm methods, which is described in the following paper: Iterative Normalization: Beyond Standard

Lei Huang 32 Dec 27, 2022
Our solution for SSN Invente 2021's Hackathon

Our solution for SSN Invente 2021's Hackathon. To help maitain godowns in a pristine and safe condition using raspberry pi.

1 Jan 12, 2022
A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

SRI-AIC 1 Nov 18, 2021
Nvidia Semantic Segmentation monorepo

Paper | YouTube | Cityscapes Score Pytorch implementation of our paper Hierarchical Multi-Scale Attention for Semantic Segmentation. Please refer to t

NVIDIA Corporation 1.6k Jan 04, 2023
Implementation of GGB color space

GGB Color Space This package is implementation of GGB color space from Development of a Robust Algorithm for Detection of Nuclei and Classification of

Resha Dwika Hefni Al-Fahsi 2 Oct 06, 2021
A package related to building quasi-fibration symmetries

qf A package related to building quasi-fibration symmetries. If you'd like to learn more about how it works, see the brief explanation and References

Paolo Boldi 1 Dec 01, 2021
Implementation of H-UCRL Algorithm

Implementation of H-UCRL Algorithm This repository is an implementation of the H-UCRL algorithm introduced in Curi, S., Berkenkamp, F., & Krause, A. (

Sebastian Curi 25 May 20, 2022
A unet implementation for Image semantic segmentation

Unet-pytorch a unet implementation for Image semantic segmentation 参考网上的Unet做分割的代码,做了一个针对kaggle地盐识别的,请去以下地址获取数据集: https://www.kaggle.com/c/tgs-salt-id

Rabbit 3 Jun 29, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022
TensorFlow implementation of Elastic Weight Consolidation

Elastic weight consolidation Introduction A TensorFlow implementation of elastic weight consolidation as presented in Overcoming catastrophic forgetti

James Stokes 67 Oct 11, 2022
This repository includes different versions of the prescribed-time controller as Simulink blocks and MATLAB script codes for engineering applications.

Prescribed-time Control Prescribed-time control (PTC) blocks in Simulink environment, MATLAB R2020b. For more theoretical details, refer to the papers

Amir Shakouri 1 Mar 11, 2022