DirectVoxGO

DirectVoxGO (Direct Voxel Grid Optimization, see our paper) reconstructs a scene representation from a set of calibrated images capturing the scene.

  • NeRF-comparable quality for synthesizing novel views from our scene representation.
  • Super-fast convergence: Our 15 mins/scene vs. NeRF's 10~20+ hrs/scene.
  • No cross-scene pre-training required: We optimize each scene from scratch.
  • Better rendering speed: Our <1 sec vs. NeRF's 29 secs to synthesize an 800x800 image.

The run-times (mm:ss) of our optimization progress shown below are measured on a machine with a single RTX 2080 Ti GPU.

(Video: github_teaser.mp4, showing our optimization progress.)

Update

  • 2021.11.23: Support CO3D dataset.
  • 2021.11.23: Initial release. Issue page is disabled for now. Feel free to contact [email protected] if you have any questions.

Installation

git clone git@github.com:sunset1995/DirectVoxGO.git
cd DirectVoxGO
pip install -r requirements.txt

PyTorch installation is machine-dependent; please install the correct version for your machine. The tested version is PyTorch 1.8.1 with Python 3.7.4.
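
A quick post-install sanity check (a minimal sketch; whether CUDA is available depends on your machine):

import torch
print(torch.__version__)           # tested with 1.8.1
print(torch.cuda.is_available())   # should print True for GPU training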

Dependencies
  • PyTorch, numpy: main computation.
  • scipy, lpips: SSIM and LPIPS evaluation.
  • tqdm: progress bar.
  • mmcv: config system.
  • opencv-python: image processing.
  • imageio, imageio-ffmpeg: images and videos I/O.

Download: datasets, trained models, and rendered test views

Directory structure for the datasets (only used files are listed):
data
├── nerf_synthetic     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [chair|drums|ficus|hotdog|lego|materials|mic|ship]
│       ├── [train|val|test]
│       │   └── r_*.png
│       └── transforms_[train|val|test].json
│
├── Synthetic_NSVF     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/Synthetic_NSVF.zip
│   └── [Bike|Lifestyle|Palace|Robot|Spaceship|Steamtrain|Toad|Wineholder]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0_train|1_val|2_test]_*.png
│       └── pose
│           └── [0_train|1_val|2_test]_*.txt
│
├── BlendedMVS         # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/BlendedMVS.zip
│   └── [Character|Fountain|Jade|Statues]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── TanksAndTemple     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/TanksAndTemple.zip
│   └── [Barn|Caterpillar|Family|Ignatius|Truck]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── deepvoxels     # Link: https://drive.google.com/drive/folders/1ScsRlnzy9Bd_n-xw83SP-0t548v63mPH
│   └── [train|validation|test]
│       └── [armchair|cube|greek|vase]
│           ├── intrinsics.txt
│           ├── rgb/*.png
│           └── pose/*.txt
│
└── co3d               # Link: https://github.com/facebookresearch/co3d
    └── [donut|teddybear|umbrella|...]
        ├── frame_annotations.jgz
        ├── set_lists.json
        └── [129_14950_29917|189_20376_35616|...]
            ├── images
            │   └── frame*.jpg
            └── masks
                └── frame*.png
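
After downloading, a minimal sketch to verify that a scene sits where the tree above expects it (the lego path below is illustrative):

import os

scene = './data/nerf_synthetic/lego'  # illustrative scene from the tree above
for name in ['transforms_train.json', 'transforms_val.json', 'transforms_test.json']:
    path = os.path.join(scene, name)
    print(path, 'found' if os.path.exists(path) else 'missing')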

Synthetic-NeRF, Synthetic-NSVF, BlendedMVS, Tanks&Temples, DeepVoxels datasets

We use the datasets organized by NeRF, NSVF, and DeepVoxels; the download links are given in the directory structure above.

Download all our trained models and rendered test views at this link to our logs.

CO3D dataset

We also support the recent Common Objects In 3D dataset. Note that our method performs only per-scene reconstruction, with no cross-scene generalization.

GO

Train

To train the lego scene and evaluate test-set PSNR at the end of training, run:

$ python run.py --config configs/nerf/lego.py --render_test

Use --i_print and --i_weights to change the log interval.

Evaluation

To evaluate the test-set PSNR, SSIM, and LPIPS of the trained lego scene without re-training, run:

$ python run.py --config configs/nerf/lego.py --render_only --render_test \
                                              --eval_ssim --eval_lpips_vgg

Use --eval_lpips_alex to evaluate LPIPS with a pre-trained AlexNet instead of VGG.

Reproduction

All config files to reproduce our results:

$ ls configs/*
configs/blendedmvs:
Character.py  Fountain.py  Jade.py  Statues.py

configs/nerf:
chair.py  drums.py  ficus.py  hotdog.py  lego.py  materials.py  mic.py  ship.py

configs/nsvf:
Bike.py  Lifestyle.py  Palace.py  Robot.py  Spaceship.py  Steamtrain.py  Toad.py  Wineholder.py

configs/tankstemple:
Barn.py  Caterpillar.py  Family.py  Ignatius.py  Truck.py

configs/deepvoxels:
armchair.py  cube.py  greek.py  vase.py
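
To reproduce all scenes in one go, a minimal sketch that loops over the configs listed above (assumes it is run from the repository root; scenes train sequentially, at roughly 15 minutes each on an RTX 2080 Ti):

import glob
import subprocess

# configs/default.py sits one level up, so this glob only picks up scene configs
for cfg in sorted(glob.glob('configs/*/*.py')):
    subprocess.run(['python', 'run.py', '--config', cfg, '--render_test'], check=True)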

Your own config files

Check the comments in configs/default.py for the configurable settings. The default values reproduce our main setup reported in the paper. We use mmcv's config system. To create a new config, inherit configs/default.py first and then update the fields you want. Below is an example from configs/blendedmvs/Character.py:

_base_ = '../default.py'

expname = 'dvgo_Character'
basedir = './logs/blended_mvs'

data = dict(
    datadir='./data/BlendedMVS/Character/',
    dataset_type='blendedmvs',
    inverse_y=True,
    white_bkgd=True,
)
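
Because the config system is mmcv's, you can inspect how a derived config merges with configs/default.py before training (a minimal sketch using mmcv's standard Config API):

from mmcv import Config

cfg = Config.fromfile('configs/blendedmvs/Character.py')
print(cfg.expname)  # 'dvgo_Character'
# any field not overridden in Character.py falls back to its value in configs/default.py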

Development and tuning guide

Extension to a new dataset

We recommend adjusting the data-related config fields to fit your camera coordinate system before implementing a new dataloader. We provide two visualization tools for debugging.

  1. Inspect the camera and the allocated BBox.
    • Export via --export_bbox_and_cams_only {filename}.npz:
      python run.py --config configs/nerf/mic.py --export_bbox_and_cams_only cam_mic.npz
    • Visualize the result:
      python tools/vis_train.py cam_mic.npz
  2. Inspect the learned geometry after coarse optimization.
    • Export via --export_coarse_only {filename}.npz (assumes coarse_last.tar is available in the training log):
      python run.py --config configs/nerf/mic.py --export_coarse_only coarse_mic.npz
    • Visualize the result:
      python tools/vis_volume.py coarse_mic.npz 0.001 --cam cam_mic.npz
(Figures: inspecting the cameras & BBox; inspecting the learned coarse volume.)
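
Both exports are plain .npz archives, so you can peek at their contents before visualizing (a minimal sketch; the exact keys stored depend on the export code):

import numpy as np

data = np.load('cam_mic.npz', allow_pickle=True)
print(list(data.keys()))  # the stored arrays; exact keys depend on the export code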

Speed and quality tradeoff

We report some ablation experiments in our paper's supplementary material. Setting N_iters, N_rand, num_voxels, rgbnet_depth, or rgbnet_width to larger values, or setting stepsize to a smaller value, typically leads to better quality but needs more computation. Only stepsize is tunable at test time; all the other fields must remain the same as in training.
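
As an illustration, a derived config nudging the tradeoff toward quality could look like the sketch below (saved as, e.g., configs/nerf/lego_hq.py; the field names are from the list above, but their exact grouping into dicts is an assumption, so check the comments in configs/default.py first):

_base_ = './lego.py'

expname = 'dvgo_lego_hq'

# Assumption: these knobs live in the fine-stage dicts; verify in configs/default.py.
fine_train = dict(
    N_iters=40000,  # more optimization steps
    N_rand=8192,    # rays per batch
)
fine_model_and_render = dict(
    num_voxels=256**3,  # denser voxel grid
    stepsize=0.25,      # smaller ray-marching step, more samples per ray
)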

Acknowledgement

The code base originated from the awesome nerf-pytorch implementation, but it has since become very different.
