CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

Related tags

Deep Learningcorenet
Overview

CoReNet

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objects live in a single consistent 3D coordinate frame relative to the camera, and they do not intersect in 3D. You can find more information in the following paper: CoReNet: Coherent 3D scene reconstruction from a single RGB image.

This repository contains source code, dataset pointers, and instructions for reproducing the results in the paper. If you find our code, data, or the paper useful, please consider citing

@InProceedings{popov20eccv,
  title="CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image",
  author="Popov, Stefan and Bauszat, Pablo and Ferrari, Vittorio", 
  booktitle="Computer Vision -- ECCV 2020",
  year="2020",
  doi="10.1007/978-3-030-58536-5_22"
}

Table of Contents

Installation

The code in this repository has been verified to work on Ubuntu 18.04 with the following dependencies:

# General APT packages
sudo apt install \
  python3-pip python3-virtualenv python python3.8-dev g++-8 \
  ninja-build git libboost-container-dev unzip

# NVIDIA related packages
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /"
sudo apt install \
    nvidia-driver-455 nvidia-utils-455 `#driver, CUDA+GL libraries, utils` \
    cuda-runtime-10-1 cuda-toolkit-10-2 libcudnn7 `# Cuda and CUDNN`

To install CoReNet, you need to clone the code from GitHub and create a python virtual environment.

# Clone CoReNet
mkdir -p ~/prj/corenet
cd ~/prj/corenet
git clone https://github.com/google-research/corenet.git .

# Setup a python virtual environment
python3.8 -m virtualenv --python=/usr/bin/python3.8 venv_38
. venv_38/bin/activate
pip install -r requirements.txt

All instructions below assume that CoReNet lives in ~/prj/corenet, that this is the current working directory, and that the virtual environment is activated. You can also run CoReNet using the supplied docker file: ~/prj/corenet/Dockerfile.

Datasets

The CoReNet paper introduced several datasets with synthetic scenes. To reproduce the experiments in the paper you need to download them, using:

cd ~/prj/corenet
mkdir -p ~/prj/corenet/data/raw
for n in single pairs triplets; do  
  for s in train val test; do
    wget "https://storage.googleapis.com/gresearch/corenet/${n}.${s}.tar" \
      -O "data/raw/${n}.${s}.tar" 
    tar -xvf "data/raw/${n}.${s}.tar" -C data/ 
  done 
done

For each scene, these datasets provide the objects placement, a good view point, and two images rendered from it with a varying degree of realism. To download the actual object geometry, you need to download ShapeNetCore.v2.zip from ShapeNet's original site, unpack it, and convert the 3D meshes to CoReNet's binary format:

echo "Please download ShapeNetCore.v2.zip from ShapeNet's original site and "
echo "place it in ~/prj/corenet/data/raw/ before running the commands below"

cd ~/prj/corenet
unzip data/raw/ShapeNetCore.v2.zip -d data/raw/
PYTHONPATH=src python -m preprocess_shapenet \
  --shapenet_root=data/raw/ShapeNetCore.v2 \
  --output_root=data/shapenet_meshes

Models from the paper

To help reproduce the results from the CoReNet paper, we offer 5 pre-trained models from it (h5, h7, m7, m9, and y1; details below and in the paper). You can download and unpack these using:

cd ~/prj/corenet
wget https://storage.googleapis.com/gresearch/corenet/paper_tf_models.tgz \
  -O data/raw/paper_tf_models.tgz
tar xzvf data/raw/paper_tf_models.tgz -C data/

You can evaluate the downloaded models against their respective test sets using:

MODEL=h7  # Set to one of: h5, h7, m7, m9, y1

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
tf_model_eval --config_path=configs/paper_tf_models/${MODEL}.json5

To run on multiple GPUs in parallel, set --nproc_per_node to the number of desired GPUs. You can use CUDA_VISIBLE_DEVICES to control which GPUs exactly to use. CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control the just-in-time compiler for the voxelization operation.

Upon completion, quantitative results will be stored in ~/prj/corenet/output/paper_tf_models/${MODEL}/voxel_metrics.csv. Qualitative results will be available in ~/prj/corenet/output/paper_tf_models/${MODEL}/ in the form of PNG files.

This table summarizes the model attributes and their performance. More details can be found in the paper.

model dataset realism native resolution mean IoU
h5 single low 128 x 128 x 128 57.9%
h7 single high 128 x 128 x 128 59.1%
y1 single low 32 x 32 x 32 53.3%
m7 pairs high 128 x 128 x 128 43.1%
m9 triplets high 128 x 128 x 128 43.9%

Note that all models are evaluated on a grid resolution of 128 x 128 x 128, independent of their native resolution (see section 3.5 in the paper). The performance computed with this code matches the one reported in the paper for h5, h7, m7, and m9. For y1, the performance here is slightly higher (+0.2% IoU), as we no longer have the exact checkpoint used in the paper.

You can also run these models on individual images interactively, using the corenet_demo.ipynb notebook. For this, you need to also pip install jupyter-notebook in your virtual environment.

Training and evaluating a new model

We offer PyTorch code for training and evaluating models. To train a model, you need to (once) import the starting ResNet50 checkpoint:

cd ~/prj/corenet
PYTHONPATH=src python -m import_resnet50_checkpoint

Then run:

MODEL=h7  # Set to one of: h5, h7, m7, m9 

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
train --config_path=configs/models/h7.json5

Again, use --nproc_per_node and CUDA_VISIBLE_DEVICES to control parallel execution on multiple GPUs, CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control just-in-time compilation.

You can also evaluate individual checkpoints, for example:

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 eval \
  --cpt_path=output/models/h7/cpt/persistent/state_000000000.cpt \
  --output_path=output/eval_cpt_example \
  --eval_names_regex="short.*" \
  -jq '(.. | .config? | select(.num_qualitative_results != null) | .num_qualitative_results) |= 4' \

The -jq option limits the number of qualitative results to 4 (see also Further details section)

We currently offer checkpoints trained with this code for models h5, h7, m7, and m9, in this .tgz. These checkpoints achieve slightly better performance than the paper (see table below). This is likely due to a different distributed training strategy (synchronous here vs. asynchronous in the paper) and a different ML framework (PyTorch vs. TensorFlow in the paper).

h5 h7 m7 m9
mean IoU 60.2% 61.6% 45.0% 46.9%

Further details

Configuration files

The evaluation and training scripts are configured using JSON5 files that map to the TfModelEvalPipeline and TrainPipeline dataclasses in src/corenet/configuration.py. You can find description of the different configuration options in code comments, starting from these two classes.

You can also modify the configuration on the fly, through jq queries, as well as defines that change entries in the string_templates section. For example, the following options change the number of workers, and the prefetch factor of the data loaders, as well as the location of the data and the output directories:

... \
-jq "'(.. | .data_loader? | select(. != null) | .num_data_workers) |= 12'" \
    "'(.. | .data_loader? | select(. != null) | .prefetch_factor) |= 4'" \
-D 'data_dir=gs://some_gcs_bucket/data' \
   'output_dir=gs://some_gcs_bucket/output/models'

Dataset statistics

The table below summarizes the number of scenes in each dataset

single pairs triplets
train 883084 319981 80000
val 127286 45600 11400
test 246498 91194 22798

Licenses

The code and the checkpoints are released under the Apache 2.0 License. The datasets, the documentation, and the configuration files are licensed under the Creative Commons Attribution 4.0 International License.

Owner
Google Research
Google Research
DETReg: Unsupervised Pretraining with Region Priors for Object Detection

DETReg: Unsupervised Pretraining with Region Priors for Object Detection Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik

Amir Bar 283 Dec 27, 2022
A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022
A parametric soroban written with CADQuery.

A parametric soroban written in CADQuery The purpose of this project is to demonstrate how "code CAD" can be intuitive to learn. See soroban.py for a

Lee 4 Aug 13, 2022
Code for "LASR: Learning Articulated Shape Reconstruction from a Monocular Video". CVPR 2021.

LASR Installation Build with conda conda env create -f lasr.yml conda activate lasr # install softras cd third_party/softras; python setup.py install;

Google 157 Dec 26, 2022
An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

MetaICL: Learning to Learn In Context This includes an original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Lu

Meta Research 141 Jan 07, 2023
NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions

NeoDTI NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions (Bioinformatics).

62 Nov 26, 2022
Pytorch implementation of set transformer

set_transformer Official PyTorch implementation of the paper Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks .

Juho Lee 410 Jan 06, 2023
Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Wonjong Jang 8 Nov 01, 2022
Torchreid: Deep learning person re-identification in PyTorch.

Torchreid Torchreid is a library for deep-learning person re-identification, written in PyTorch. It features: multi-GPU training support both image- a

Kaiyang 3.7k Jan 05, 2023
*ObjDetApp* deploys a pytorch model for object detection

*ObjDetApp* deploys a pytorch model for object detection

Will Chao 1 Dec 26, 2021
Pytorch implementation for A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose Paper | Website | Data A-NeRF: Articulated Neural Radiance F

Shih-Yang Su 172 Dec 22, 2022
A comprehensive and up-to-date developer education platform for Urbit.

curriculum A comprehensive and up-to-date developer education platform for Urbit. This project organizes developer capabilities into a hierarchy of co

Sigilante 36 Oct 04, 2022
Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations The code of: Weakly Supervised Learning of Instance Segmentation with I

Jiwoon Ahn 472 Dec 29, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
DenseNet Implementation in Keras with ImageNet Pretrained Models

DenseNet-Keras with ImageNet Pretrained Models This is an Keras implementation of DenseNet with ImageNet pretrained weights. The weights are converted

Felix Yu 568 Oct 31, 2022
A python library for self-supervised learning on images.

Lightly is a computer vision framework for self-supervised learning. We, at Lightly, are passionate engineers who want to make deep learning more effi

Lightly 2k Jan 08, 2023
Library to enable Bayesian active learning in your research or labeling work.

Bayesian Active Learning (BaaL) BaaL is an active learning library developed at ElementAI. This repository contains techniques and reusable components

ElementAI 687 Dec 25, 2022
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
Official repository for: Continuous Control With Ensemble DeepDeterministic Policy Gradients

Continuous Control With Ensemble Deep Deterministic Policy Gradients This repository is the official implementation of Continuous Control With Ensembl

4 Dec 06, 2021
TorchFlare is a simple, beginner-friendly, and easy-to-use PyTorch Framework train your models effortlessly.

TorchFlare TorchFlare is a simple, beginner-friendly and an easy-to-use PyTorch Framework train your models without much effort. It provides an almost

Atharva Phatak 85 Dec 26, 2022