Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.

This repo contains the PyTorch implementation of our paper:

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool.

Contents

  1. Introduction
  2. Installation
  3. Training
  4. Evaluation
  5. Model Zoo
  6. Citation

Introduction

Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. We make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case (e.g. PASCAL VOC). To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. Additionally, we argue for the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. In particular, we adopt a mid-level visual prior to group pixels together and contrast the obtained object mask proposals. For this reason we name the method MaskContrast.

Installation

The Python code runs with recent PyTorch versions, e.g. 1.4. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch
conda install -c conda-forge opencv           # For image transformations
conda install matplotlib scipy scikit-learn   # For evaluation
conda install pyyaml easydict                 # For using config files
conda install termcolor                       # For colored print statements

We refer to the requirements.txt file for an overview of the packages in the environment we used to produce our results. The code was run on 2 Tesla V100 GPUs.

Training MaskContrast

Setup

The PASCAL VOC dataset will be downloaded automatically when running the code for the first time. The dataset includes the precomputed supervised and unsupervised saliency masks, following the implementation from the paper.

The following files (in the pretrain/ and segmentation/ directories) need to be adapted in order to run the code on your own machine:

  • Change the file path for the datasets in data/util/mypath.py. The PASCAL VOC dataset will be saved to this path.
  • Specify the output directory in configs/env.yml. All results will be stored under this directory.

Pre-train model

The training procedure consists of two steps. First, pixels are grouped together based upon a mid-level visual prior (saliency is used). Then, a pre-training strategy is proposed to contrast the pixel-embeddings of the obtained object masks. The code for the pre-training can be found in the pretrain/ directory and the configuration files are located in the pretrain/configs/ directory. You can choose to run the model with the masks from the supervised or unsupervised saliency model. For example, run the following command to perform the pre-training step on PASCAL VOC with the supervised saliency model:

cd pretrain
python main.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml
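To make the second step more concrete, here is a minimal, dependency-free sketch of an InfoNCE-style loss for a single pixel embedding. It is an illustration only, not the code in pretrain/: the function names, the temperature value, and the choice of mean mask embeddings as positives and negatives are all assumptions made for this example.

```python
import math


def mean_embedding(pixels):
    """Average a list of equal-length embedding vectors (one per pixel in a mask)."""
    dim = len(pixels[0])
    return [sum(p[d] for p in pixels) / len(pixels) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_contrastive_loss(pixel, positive_mean, negative_means, temperature=0.5):
    """InfoNCE-style loss: pull `pixel` toward `positive_mean`, away from the negatives.

    Hypothetical helper for illustration; temperature=0.5 is an arbitrary choice.
    """
    q = l2_normalize(pixel)
    logits = [dot(q, l2_normalize(positive_mean)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negative_means]
    # Cross-entropy with the positive at index 0, using a stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# Toy example: two object masks with 2-D pixel embeddings.
mask_a = [[1.0, 0.1], [0.9, 0.0]]
mask_b = [[0.0, 1.0], [0.1, 0.9]]
mean_a, mean_b = mean_embedding(mask_a), mean_embedding(mask_b)

loss_correct = pixel_contrastive_loss(mask_a[0], mean_a, [mean_b])
loss_wrong = pixel_contrastive_loss(mask_a[0], mean_b, [mean_a])
print(loss_correct < loss_wrong)
```

In this toy setup, a pixel is penalized less when it is paired with the mean embedding of its own mask than with that of another mask, which is the behaviour a contrastive objective over object masks is meant to encourage.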

Evaluation

Linear Classifier (LC)

We freeze the weights of the pre-trained model and train a 1 x 1 convolutional layer to predict the class assignments from the generated feature representations. Since the discriminative power of a linear classifier is low, the pixel embeddings need to be informative of the semantic class to solve the task in this way. To train the classifier run the following command:

cd segmentation
python linear_finetune.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml
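A 1 x 1 convolution applies the same linear map to every pixel embedding independently, which is why it acts as a per-pixel linear classifier. The sketch below shows only that forward pass in plain Python, assuming the frozen backbone has already produced an H x W x C feature map; the function names and the toy weights are hypothetical, and the training loop is omitted.

```python
def conv1x1(features, weights, bias):
    """Apply a 1 x 1 convolution to an H x W x C feature map.

    weights has shape K x C and bias has length K; the result is H x W x K
    class scores. Every pixel is transformed by the same linear map.
    """
    return [
        [
            [sum(w * x for w, x in zip(class_w, pixel)) + b
             for class_w, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in features
    ]


def predict_classes(scores):
    """Pick the highest-scoring class index for every pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


# Toy example: a 1 x 2 image with 2-D embeddings and 2 classes.
features = [[[1.0, 0.0], [0.0, 1.0]]]
weights = [[2.0, -1.0], [-1.0, 2.0]]
bias = [0.0, 0.0]

scores = conv1x1(features, weights, bias)
predictions = predict_classes(scores)
print(predictions)
```

Because the classifier has no spatial context and no non-linearity, any class separation in the output has to come from the embeddings themselves.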

Make sure that the pretraining variable in linear_finetune_VOCSegmentation_supervised_saliency.yml points to the location of your pre-trained model. You should get the following results:

mIoU is 63.95
IoU class background is 90.95
IoU class aeroplane is 83.78
IoU class bicycle is 30.66
IoU class bird is 78.79
IoU class boat is 64.57
IoU class bottle is 67.31
IoU class bus is 84.24
IoU class car is 76.77
IoU class cat is 79.10
IoU class chair is 21.24
IoU class cow is 66.45
IoU class diningtable is 46.63
IoU class dog is 73.25
IoU class horse is 62.61
IoU class motorbike is 69.66
IoU class person is 72.30
IoU class pottedplant is 40.15
IoU class sheep is 74.70
IoU class sofa is 30.43
IoU class train is 74.67
IoU class tvmonitor is 54.66

Unsurprisingly, the model has not learned a good representation for every class since some classes are hard to distinguish, e.g. chair or sofa.
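For reference, per-class IoU values like those listed above, and their mean (mIoU), can be computed from flattened prediction and ground-truth label lists as sketched below. This is a generic illustration rather than the repository's evaluation code; the function names are hypothetical, and skipping label 255 assumes the usual PASCAL VOC convention for void pixels.

```python
import math


def per_class_iou(pred, target, num_classes, ignore_index=255):
    """Return the IoU (in percent) for each class from flat label lists.

    Pixels whose ground-truth label equals `ignore_index` are skipped
    (assumed here to be the PASCAL VOC void label).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch adds the pixel to the union of both classes.
            union[p] += 1
            union[t] += 1
    return [100.0 * i / u if u else float("nan") for i, u in zip(inter, union)]


def mean_iou(ious):
    """Average the per-class IoUs, ignoring classes that never occur."""
    valid = [v for v in ious if not math.isnan(v)]
    return sum(valid) / len(valid)


# Toy example with 3 classes; the last pixel is void and is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]

ious = per_class_iou(pred, target, num_classes=3)
miou = mean_iou(ious)
print(ious, miou)
```

Since mIoU weights every class equally, a few weak classes pull the mean down noticeably even when most classes are segmented well.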

We visualize a few examples after CRF post-processing below.

Clustering (K-means)

The feature representations are clustered with K-means. If the pixel embeddings are disentangled according to the defined class labels, we can match the predicted clusters with the ground-truth classes using the Hungarian matching algorithm.

cd segmentation
python kmeans.py --config_env configs/env.yml --config_exp configs/kmeans/kmeans_VOCSegmentation_supervised_saliency_model.yml
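The matching step can be illustrated on a toy problem. The sketch below finds the one-to-one cluster-to-class assignment that maximizes pixel overlap by exhaustive search over permutations, which yields the same optimum the Hungarian algorithm would find but is only feasible for a handful of clusters. It is a stand-in for illustration, not the repository's code; at the scale of 21 clusters a proper Hungarian solver (for example `scipy.optimize.linear_sum_assignment`) would be needed. All function names are hypothetical.

```python
from itertools import permutations


def overlap_matrix(clusters, labels, k):
    """counts[c][g] = number of pixels in cluster c whose ground-truth class is g."""
    counts = [[0] * k for _ in range(k)]
    for c, g in zip(clusters, labels):
        counts[c][g] += 1
    return counts


def best_matching(counts):
    """Return the cluster -> class mapping with maximal total overlap.

    Exhaustive search over all k! permutations: fine for a toy example,
    far too slow for 21 clusters.
    """
    k = len(counts)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(counts[c][perm[c]] for c in range(k)),
    )
    return dict(enumerate(best))


# Toy example: 3 clusters whose indices are a shuffled version of the classes.
clusters = [2, 2, 0, 0, 1, 1, 1]
labels = [0, 0, 1, 1, 2, 2, 0]

counts = overlap_matrix(clusters, labels, k=3)
mapping = best_matching(counts)
remapped = [mapping[c] for c in clusters]
print(mapping, remapped)
```

After remapping, the cluster indices live in the same label space as the ground truth, so the usual IoU metrics can be applied to the K-means output.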

Remarks: To save memory, K-means is fitted entirely on the validation set. The reported results were averaged over 5 different runs. You should get the following results (21 clusters):

IoU class background is 88.17
IoU class aeroplane is 77.41
IoU class bicycle is 26.18
IoU class bird is 68.27
IoU class boat is 47.89
IoU class bottle is 56.99
IoU class bus is 80.63
IoU class car is 66.80
IoU class cat is 46.13
IoU class chair is 0.73
IoU class cow is 0.10
IoU class diningtable is 0.57
IoU class dog is 35.93
IoU class horse is 48.68
IoU class motorbike is 60.60
IoU class person is 32.24
IoU class pottedplant is 23.88
IoU class sheep is 36.76
IoU class sofa is 26.85
IoU class train is 69.90
IoU class tvmonitor is 27.56

Model Zoo

Download the pretrained and linear finetuned models here.

Dataset    | Pixel Grouping Prior  | mIoU (LC)    | mIoU (K-means) | Download link
PASCAL VOC | Supervised Saliency   | -            | 44.2           | Pretrained Model 🔗
PASCAL VOC | Supervised Saliency   | 63.9 (65.5*) | 44.2           | Linear Finetuned 🔗
PASCAL VOC | Unsupervised Saliency | -            | 35.0           | Pretrained Model 🔗
PASCAL VOC | Unsupervised Saliency | 58.4 (59.5*) | 35.0           | Linear Finetuned 🔗

* Denotes CRF post-processing.

To evaluate and visualize the predictions of the finetuned model, run the following command:

cd segmentation
python eval.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml --state-dict $PATH_TO_MODEL

You can optionally append the --crf-postprocess flag.

Citation

This code is based on the SCAN and MoCo repositories. If you find this repository useful for your research, please consider citing the following paper(s):

@article{vangansbeke2020unsupervised,
  title={Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  journal={arxiv preprint arxiv:2102.06191},
  year={2021}
}
@inproceedings{vangansbeke2020scan,
  title={Scan: Learning to classify images without labels},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Proesmans, Marc and Van Gool, Luc},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2020}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

For any enquiries, please contact the main authors.

For an overview on self-supervised learning, have a look at the overview repository.

License

This software is released under a Creative Commons license that allows for personal and research use only. For a commercial license, please contact the authors. You can view a license summary here.

Acknowledgements

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).
