SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

Related tags

Deep LearningSAT
Overview

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

by Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2021, Oral

Introduction

We propose 2D Semantics Assisted Training (SAT) that assists 3D visual grounding with 2D semantics. SAT helps 3D tasks with 2D semantics in training but does not require 2D inputs during inference. For more details, please refer to our paper.

Citation

@inproceedings{yang2021sat,
  title={SAT: 2D Semantics Assisted Training for 3D Visual Grounding},
  author={Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo},
  booktitle={ICCV},
  year={2021}
}

Prerequisites

  • Python 3.6.9 (e.g., conda create -n sat_env python=3.6.9 cudatoolkit=10.0)
  • Pytorch 1.2.0 (e.g., conda install pytorch==1.2.0 cudatoolkit=10.0 -c pytorch)
  • Install other common packages (numpy, pytorch_transformers, etc.)
  • Please refer to setup.py (From ReferIt3D).

Installation

  • Clone the repository

    git clone https://github.com/zyang-ur/SAT.git
    cd SAT
    pip install -e .
    
  • To use a PointNet++ visual-encoder you need to compile its CUDA layers for PointNet++: Note: To do this compilation also need: gcc5.4 or later.

    cd external_tools/pointnet2
    python setup.py install
    

Data

ScanNet

First you should download the train/val scans of ScanNet if you do not have them locally. Please refer to the instructions from referit3d for more details. The output is the scanfile keep_all_points_00_view_with_global_scan_alignment.pkl/keep_all_points_with_global_scan_alignment.pkl.

Ref3D Linguistic Data

You can dowload the Nr3D and Sr3D/Sr3D+ from Referit3D, and send the file path to referit3D-file.

SAT Processed 2D Features

You can download the processed 2D object image features from here. The cached feature should be placed under the referit3d/data folder, or match the cache path in the dataloader. The feature extraction code will be cleaned and released in the future. Meanwhile, feel free to contact me if you need that before the official release.

Training

Please reference the following example command on Nr3D. Feel free to change the parameters. Please reference arguments for valid options.

cd referit3d/scripts
scanfile=keep_all_points_00_view_with_global_scan_alignment.pkl ## keep_all_points_with_global_scan_alignment if include Sr3D
python train_referit3d.py --patience 100 --max-train-epochs 100 --init-lr 1e-4 --batch-size 16 --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile_csv --log-dir log/$exp_id --n-workers 2 --gpu 0 --unit-sphere-norm True --feat2d clsvecROI --context_2d unaligned --mmt_mask train2d --warmup

Evaluation

Please find the pretrained models here (clsvecROI on Nr3D). A known issue here.

cd referit3d/scripts
python train_referit3d.py --transformer --model mmt_referIt3DNet -scannet-file $scanfile -referit3D-file $nr3dfile --log-dir log/$exp_id --n-workers 2 --gpu $gpu --unit-sphere-norm True --feat2d clsvecROI --mode evaluate --pretrain-path $pretrain_path/best_model.pth

Credits

The project is built based on the following repository:

Part of the code or models are from ScanRef, MMF, TAP, and ViLBERT.

Owner
Zhengyuan Yang
Zhengyuan Yang
Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION This is the official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSU

电线杆 14 Dec 15, 2022
Training data extraction on GPT-2

Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll

Florian Tramer 62 Dec 07, 2022
3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021)

3DDUNET This is the code for 3D2Unet: 3D Deformable Unet for Low-Light Video Enhancement (PRCV2021) Conference Paper Link Dataset We use SMOID dataset

1 Jan 07, 2022
A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal

A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop which is flexible enough to handle the majority of use cases,

Chris Hughes 110 Dec 23, 2022
Mixed Transformer UNet for Medical Image Segmentation

MT-UNet Update 2022/01/05 By another round of training based on previous weights, our model also achieved a better performance on ACDC (91.61% DSC). W

dotman 92 Dec 25, 2022
LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice,

LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and eval

Ahmet Erdem 691 Dec 23, 2022
public repo for ESTER dataset and modeling (EMNLP'21)

Project / Paper Introduction This is the project repo for our EMNLP'21 paper: https://arxiv.org/abs/2104.08350 Here, we provide brief descriptions of

PlusLab 19 Oct 27, 2022
Linear Variational State Space Filters

Linear Variational State Space Filters To set up the environment, use the provided scripts in the docker/ folder to build and run the codebase inside

0 Dec 13, 2021
Learning to Estimate Hidden Motions with Global Motion Aggregation

Learning to Estimate Hidden Motions with Global Motion Aggregation (GMA) This repository contains the source code for our paper: Learning to Estimate

Shihao Jiang (Zac) 221 Dec 18, 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

DV Lab 115 Dec 23, 2022
NNR conformation conditional and global probabilities estimation and analysis in peptides or proteins fragments

NNR and global probabilities estimation and analysis in peptides or protein fragments This module calculates global and NNR conformation dependent pro

0 Jul 15, 2021
Python package for covariance matrices manipulation and Biosignal classification with application in Brain Computer interface

pyRiemann pyRiemann is a python package for covariance matrices manipulation and classification through Riemannian geometry. The primary target is cla

447 Jan 05, 2023
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation

Paper Khoi Nguyen, Sinisa Todorovic "A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation", accepted to ICCV 2021 Our code is mai

Khoi Nguyen 5 Aug 14, 2022
(Arxiv 2021) NeRF--: Neural Radiance Fields Without Known Camera Parameters

NeRF--: Neural Radiance Fields Without Known Camera Parameters Project Page | Arxiv | Colab Notebook | Data Zirui Wang¹, Shangzhe Wu², Weidi Xie², Min

Active Vision Laboratory 411 Dec 26, 2022
An interactive DNN Model deployed on web that predicts the chance of heart failure for a patient with an accuracy of 98%

Heart Failure Predictor About A Web UI deployed Dense Neural Network Model Made using Tensorflow that predicts whether the patient is healthy or has c

Adit Ahmedabadi 0 Jan 09, 2022
Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators

Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators. It's also a suite of learning algorithms to train agents to operate in these enviro

Google 1.5k Jan 02, 2023
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

Jonathan Shobrook 305 Dec 21, 2022
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

536 Dec 20, 2022
I will implement Fastai in each projects present in this repository.

DEEP LEARNING FOR CODERS WITH FASTAI AND PYTORCH The repository contains a list of the projects which I have worked on while reading the book Deep Lea

Thinam Tamang 43 Dec 20, 2022