Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Related tags

Deep LearningVMNet
Overview

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Framework Fig

Created by Zeyu HU

Introduction

This work is based on our paper VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation, which appears at the IEEE International Conference on Computer Vision (ICCV) 2021.

In recent years, sparse voxel-based methods have become the state-of-the-arts for 3D semantic segmentation of indoor scenes, thanks to the powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters).

Citation

If you find our work useful in your research, please consider citing:

@misc{hu2021vmnet,
      title={VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation}, 
      author={Zeyu Hu and Xuyang Bai and Jiaxiang Shang and Runze Zhang and Jiayu Dong and Xin Wang and Guangyuan Sun and Hongbo Fu and Chiew-Lan Tai},
      year={2021},
      eprint={2107.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Installation

  • Our code is based on Pytorch. Please make sure CUDA and cuDNN are installed. One configuration has been tested:

    • Python 3.7
    • Pytorch 1.4.0
    • torchvision 0.5.0
    • CUDA 10.0
    • cudatoolkit 10.0.130
    • cuDNN 7.6.5
  • VMNet depends on the torch-geometric and torchsparse libraries. Please follow their installation instructions. One configuration has been tested, higher versions should work as well:

    • torch-geometric 1.6.3
    • torchsparse 1.1.0
  • We adapted VCGlib to generate pooling trace maps for vertex clustering and quadric error metrics.

    git clone https://github.com/cnr-isti-vclab/vcglib
    
    # QUADRIC ERROR METRICS
    cd vcglib/apps/tridecimator/
    qmake
    make
    
    # VERTEX CLUSTERING
    cd ../sample/trimesh_clustering
    qmake
    make
    

    Please add vcglib/apps/tridecimator and vcglib/apps/sample/trimesh_clustering to your environment path variable.

  • Other dependencies. One configuration has been tested:

    • open3d 0.9.0
    • plyfile 0.7.3
    • scikit-learn 0.24.0
    • scipy 1.6.0

Data Preparation

  • Please refer to https://github.com/ScanNet/ScanNet and https://github.com/niessner/Matterport to get access to the ScanNet and Matterport dataset. Our method relies on the .ply as well as the .labels.ply files. We take ScanNet dataset as example for the following instructions.

  • Create directories to store processed data.

    • 'path/to/processed_data/train/'
    • 'path/to/processed_data/val/'
    • 'path/to/processed_data/test/'
  • Prepare train data.

    python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_train.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/train/
    
  • Prepare val data.

    python prepare_data.py --considered_rooms_path dataset/data_split/scannetv2_val.txt --in_path path/to/ScanNet/scans --out_path path/to/processed_data/val/
    
  • Prepare test data.

    python prepare_data.py --test_split --considered_rooms_path dataset/data_split/scannetv2_test.txt --in_path path/to/ScanNet/scans_test --out_path path/to/processed_data/test/
    

Train

  • On train/val/test setting.

    CUDA_VISIBLE_DEVICES=0 python run.py --train --exp_name name_you_want --data_path path/to/processed_data
    
  • On train+val/test setting (for ScanNet benchmark).

    CUDA_VISIBLE_DEVICES=0 python run.py --train_benchmark --exp_name name_you_want --data_path path/to/processed_data
    

Inference

  • Validation. Pretrained model (73.3% mIoU on ScanNet Val). Please download and put into directory check_points/val_split.

    CUDA_VISIBLE_DEVICES=0 python run.py --val --exp_name val_split --data_path path/to/processed_data
    
  • Test. Pretrained model (74.6% mIoU on ScanNet Test). Please download and put into directory check_points/test_split. TxT files for benchmark submission will be saved in directory test_results/.

    CUDA_VISIBLE_DEVICES=0 python run.py --test --exp_name test_split --data_path path/to/processed_data
    

Acknowledgements

Our code is built upon torch-geometric, torchsparse and dcm-net.

License

Our code is released under MIT License (see LICENSE file for details).

Owner
HU Zeyu
HU Zeyu
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers

EntityQuestions This repository contains the EntityQuestions dataset as well as code to evaluate retrieval results from the the paper Simple Entity-ce

Princeton Natural Language Processing 119 Sep 28, 2022
Predicting future trajectories of people in cameras of novel scenarios and views.

Pedestrian Trajectory Prediction Predicting future trajectories of pedestrians in cameras of novel scenarios and views. This repository contains the c

8 Sep 03, 2022
Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

Language Emergence in Multi Agent Dialog Code for the Paper Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog Satwik Kottur, José M.

Karan Desai 105 Nov 25, 2022
Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs

Implementation for the paper: Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao, Sumeet Ka

Nurendra Choudhary 8 Nov 15, 2022
dataset for ECCV 2020 "Motion Capture from Internet Videos"

Motion Capture from Internet Videos Motion Capture from Internet Videos Junting Dong*, Qing Shuai*, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

ZJU3DV 98 Dec 07, 2022
[AAAI 2021] EMLight: Lighting Estimation via Spherical Distribution Approximation and [ICCV 2021] Sparse Needlets for Lighting Estimation with Spherical Transport Loss

EMLight: Lighting Estimation via Spherical Distribution Approximation (AAAI 2021) Update 12/2021: We release our Virtual Object Relighting (VOR) Datas

Fangneng Zhan 144 Jan 06, 2023
Official implementation of CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21 For more information, check out the paper on [arXiv]. Training with different

Sunghwan Hong 120 Jan 04, 2023
Sum-Product Probabilistic Language

Sum-Product Probabilistic Language SPPL is a probabilistic programming language that delivers exact solutions to a broad range of probabilistic infere

MIT Probabilistic Computing Project 57 Nov 17, 2022
USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

116 Jan 04, 2023
Human4D Dataset tools for processing and visualization

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media HUMAN4D constitutes a large and multimodal 4D dataset that contains a variet

tofis 15 Nov 09, 2022
Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks This is the code for the paper: MentorNet: Learning Data-Driven Curriculum fo

Google 302 Dec 23, 2022
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
This repository contains the map content ontology used in narrative cartography

Narrative-cartography-ontology This repository contains the map content ontology used in narrative cartography, which is associated with a submission

Weiming Huang 0 Oct 31, 2021
Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Framework overview This library allows to quickly implement different architectures based on Reservoir Computing (the family of approaches popularized

Filippo Bianchi 249 Dec 21, 2022
3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Normal Face Photos

3D-CariGAN: An End-to-End Solution to 3D Caricature Generation from Normal Face Photos This repository contains the source code and dataset for the pa

54 Oct 09, 2022
Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

TianYuan 27 Nov 07, 2022
MLPs for Vision and Langauge Modeling (Coming Soon)

MLP Architectures for Vision-and-Language Modeling: An Empirical Study MLP Architectures for Vision-and-Language Modeling: An Empirical Study (Code wi

Yixin Nie 27 May 09, 2022
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 21k Jan 06, 2023
Robust and Accurate Object Detection via Self-Knowledge Distillation

Robust and Accurate Object Detection via Self-Knowledge Distillation paper:https://arxiv.org/abs/2111.07239 Environments Python 3.7 Cuda 10.1 Prepare

Weipeng Xu 6 Jul 01, 2022
Deep Convolutional Generative Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Alec Radford, Luke Metz, Soumith Chintala All images in t

Alec Radford 3.4k Dec 29, 2022