Official implementation of NeurIPS'2021 paper TransformerFusion

Overview

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Project Page | Paper | Video


TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
Aljaz Bozic, Pablo Palafox, Justus Thies, Angela Dai, Matthias Niessner
NeurIPS 2021

demo


TODOs

  • Evaluation code and metrics (with ground truth data)
  • Model code (with pretrained checkpoint)
  • Test-time reconstruction code
  • Training (and evaluation) data preparation scripts

How to install the framework

  • Clone the repository with submodules:
git clone --recurse-submodules https://github.com/AljazBozic/TransformerFusion.git
  • Create Conda environment:
conda env create -f environment.yml
  • Compile local C++/CUDA dependencies:
conda activate tf
cd csrc
python setup.py install

Evaluate the reconstructions

We evaluate method performance on the test scenes of ScanNet dataset.

We compare scene reconstructions to the ground truth meshes, obtained with fusion of RGB-D data. Since the ground truth meshes are not complete, we additionally compute occlusion masks of RGB-D scans, to not penalize the reconstructions that are more complete than the ground truth meshes.

You can download both ground truth meshes and occlusion masks here. To evaluate the reconstructions, you need to place them into data/reconstructions, and extract the ground truth data to data/groundtruth. The reconstructions are expected to be named as ScanNet test scenes, e.g. scene0733_00.ply. The following script computes evaluation metrics over all provided scene meshes:

conda activate tf
python src/evaluation/eval.py

Citation

If you find our work useful in your research, please consider citing:

@article{
bozic2021transformerfusion,
title={TransformerFusion: Monocular RGB Scene Reconstruction using Transformers},
author={Bozic, Aljaz and Palafox, Pablo and Thies, Justus and Dai, Angela and Niessner, Matthias},
journal={Proc. Neural Information Processing Systems (NeurIPS)},
year={2021}}       

Related work

Some other related work on monocular RGB reconstruction of indoor scenes:

License

The code from this repository is released under the MIT license.

Owner
Aljaz Bozic
PhD Student at Visual Computing Group
Aljaz Bozic
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training This project hosts the code for implementing the DenseCL algorithm for se

Xinlong Wang 491 Jan 03, 2023
dataset for ECCV 2020 "Motion Capture from Internet Videos"

Motion Capture from Internet Videos Motion Capture from Internet Videos Junting Dong*, Qing Shuai*, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

ZJU3DV 98 Dec 07, 2022
Scenic: A Jax Library for Computer Vision and Beyond

Scenic Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop c

Google Research 1.6k Dec 27, 2022
https://arxiv.org/abs/2102.11005

LogME LogME: Practical Assessment of Pre-trained Models for Transfer Learning How to use Just feed the features f and labels y to the function, and yo

THUML: Machine Learning Group @ THSS 149 Dec 19, 2022
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)

Python Streaming Anomaly Detection (PySAD) PySAD is an open-source python framework for anomaly detection on streaming multivariate data. Documentatio

Selim Firat Yilmaz 181 Dec 18, 2022
PolyTrack: Tracking with Bounding Polygons

PolyTrack: Tracking with Bounding Polygons Abstract In this paper, we present a novel method called PolyTrack for fast multi-object tracking and segme

Gaspar Faure 13 Sep 15, 2022
Detecting drunk people through thermal images using Deep Learning (CNN)

Drunk Detection CNN Detecting drunk people through thermal images using Deep Learning (CNN) Dataset We used thermal images provided by Electronics Lab

Giacomo Ferretti 3 Oct 27, 2022
Use Python, OpenCV, and MediaPipe to control a keyboard with facial gestures

CheekyKeys A Face-Computer Interface CheekyKeys lets you control your keyboard using your face. View a fuller demo and more background on the project

69 Nov 09, 2022
Deep Learning Models for Causal Inference

Extensive tutorials for learning how to build deep learning models for causal inference using selection on observables in Tensorflow 2.

Bernard J Koch 151 Dec 31, 2022
Single Image Deraining Using Bilateral Recurrent Network (TIP 2020)

Single Image Deraining Using Bilateral Recurrent Network Introduction Single image deraining has received considerable progress based on deep convolut

23 Aug 10, 2022
deep learning for image processing including classification and object-detection etc.

深度学习在图像处理中的应用教程 前言 本教程是对本人研究生期间的研究内容进行整理总结,总结的同时也希望能够帮助更多的小伙伴。后期如果有学习到新的知识也会与大家一起分享。 本教程会以视频的方式进行分享,教学流程如下: 1)介绍网络的结构与创新点 2)使用Pytorch进行网络的搭建与训练 3)使用Te

WuZhe 13.6k Jan 04, 2023
Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Twins: Revisiting the Design of Spatial Attention in Vision Transformers Very recently, a variety of vision transformer architectures for dense predic

482 Dec 18, 2022
Select, weight and analyze complex sample data

Sample Analytics In large-scale surveys, often complex random mechanisms are used to select samples. Estimates derived from such samples must reflect

samplics 37 Dec 15, 2022
A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks. The purpose of this project is to promote the

Zhanghan Ke 255 Dec 11, 2022
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
A Player for Kanye West's Stem Player. Sort of an emulator.

Stem Player Player Stem Player Player Usage Download the latest release here Optional: install ffmpeg, instructions here NOTE: DOES NOT ENABLE DOWNLOA

119 Dec 28, 2022
Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

DV Lab 39 Nov 23, 2022
Hyperparameter Optimization for TensorFlow, Keras and PyTorch

Hyperparameter Optimization for Keras Talos • Key Features • Examples • Install • Support • Docs • Issues • License • Download Talos radically changes

Autonomio 1.6k Dec 15, 2022
Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language This repository contains the code, model, and deployment config

16 Oct 23, 2022
Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

41 Jan 06, 2023