Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Related tags

Overview

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Introduction

Multi-task indoor scene understanding is widely considered as an intriguing formulation, as the affinity of different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. However, successfully resolving it requires a model to capture long-range dependency, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% annotation. Visualizations further confirm that this success is credited to common attention maps across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download pre-trained model HERE.

Training and evaluating

To train a Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path]  -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Related tags

Overview

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Introduction

Citation

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

Training and evaluating

Owner

Code artifacts for the submission "Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driving Systems"

Geometric Vector Perceptrons --- a rotation-equivariant GNN for learning from biomolecular structure

Interactive Image Segmentation via Backpropagating Refinement Scheme

Multi-Output Gaussian Process Toolkit

Addition of pseudotorsion caclulation eta, theta, eta', and theta' to barnaba package

EPSANet：An Efficient Pyramid Split Attention Block on Convolutional Neural Network

Bare bones use-case for deploying a containerized web app (built in streamlit) on AWS.

Using fully convolutional networks for semantic segmentation with caffe for the cityscapes dataset

Orchestrating Distributed Materials Acceleration Platform Tutorial

Image segmentation with private İstanbul Dataset

《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

Physical Anomalous Trajectory or Motion (PHANTOM) Dataset

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

Code repository for "Reducing Underflow in Mixed Precision Training by Gradient Scaling" presented at IJCAI '20

🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

Implementation of Online Label Smoothing in PyTorch

This is the source code for generating the ASL-Skeleton3D and ASL-Phono datasets. Check out the README.md for more details.

Transfer SemanticKITTI labeles into other dataset/sensor formats.

A big endian Gentoo port developed on a Pine64.org RockPro64