Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Overview

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

This repo contains the Pytorch implementation of our paper:

Revisiting Contrastive Methods for UnsupervisedLearning of Visual Representations

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis and Luc Van Gool.

Contents

  1. Introduction
  2. Key Results
  3. Installation
  4. Training
  5. Evaluation
  6. Model Zoo
  7. Citation

Introduction

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. We first study how biases in the dataset affect existing methods. Our results show that an approach like MoCo works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains. We show that learning additional invariances - through the use of multi-scale cropping, stronger augmentations and nearest neighbors - improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy. The representations can be used for semantic segment retrieval and video instance segmentation without finetuning. Moreover, the results are on par with specialized models. We hope this work will serve as a useful study for other researchers.

Key Results

  • Scene-centric Data: We do not observe any indications that contrastive pretraining suffers from using scene-centric image data. This is in contrast to prior belief. Moreover, if the downstream data is non-object-centric, pretraining on scene-centric datasets even outperforms ImageNet pretraining.
  • Dense Representations: The multi-scale cropping strategy allows the model to learn spatially structured representations. This questions a recent trend that proposed additional losses at a denser level in the image. The representations can be used for semantic segment retrieval and video instance segmentation without any finetuning.
  • Additional Invariances: We impose additional invariances by exploring different data augmentations and nearest neighbors to boost the performance.
  • Transfer Performance: We observed that if a model obtains improvements for the downstream classification tasks, the same improvements are not guarenteed for other tasks (e.g. semantic segmentation) and vice versa.

Installation

The Python code runs with recent Pytorch versions, e.g. 1.6. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.2 -c pytorch
conda install -c conda-forge opencv           # For evaluation
conda install matplotlib scipy scikit-learn   # For evaluation

We refer to the environment.yml file for an overview of the packages we used to reproduce our results. The code was run on 2 Tesla V100 GPUs.

Training

Now, we will pretrain on the COCO dataset. You can download the dataset from the official website. Several scripts in the scripts/ directory are provided. It contains the vanilla MoCo setup and our additional modifications for both 200 epochs and 800 epochs of training. First, modify --output_dir and the dataset location in each script before executing them. Then, run the following command to start the training for 200 epochs:

sh scripts/ours_coco_200ep.sh # Train our model for 200 epochs.

The training currently supports:

  • MoCo
  • + Multi-scale constrained cropping
  • + AutoAugment
  • + kNN-loss

A detailed version of the pseudocode can be found in Appendix B.

Evaluation

We perform the evaluation for the following downstream tasks: linear classification (VOC), semantic segmentation (VOC and Cityscapes), semantic segment retrieval and video instance segmentation (DAVIS). More details and results can be found in the main paper and the appendix.

Linear Classifier

The representations can be evaluated under the linear evaluation protocol on PASCAL VOC. Please visit the ./evaluation/voc_svm directory for more information.

Semantic Segmentation

We provide code to evaluate the representations for the semantic segmentation task on the PASCAL VOC and Cityscapes datasets. Please visit the ./evaluation/segmentation directory for more information.

Segment Retrieval

In order to obtain the results from the paper, run the publicly available code with our weights as the initialization of the model. You only need to adapt the amount of clusters, e.g. 5.

Video Instance Segmentation

In order to obtain the results from the paper, run the publicly available code from Jabri et al. with our weights as the initialization of the model.

Model Zoo

Several pretrained models can be downloaded here. For a fair comparison, which takes the training duration into account, we refer to Figure 5 in the paper. More results can be found in Table 4 and Table 9.

Method Epochs VOC SVM VOC mIoU Cityscapes mIoU DAVIS J&F Download link
MoCo 200 76.1 66.2 70.3 - Model 🔗
Ours 200 85.1 71.9 72.2 - Model 🔗
MoCo 800 81.0 71.1 71.3 63.2 Model 🔗
Ours 800 85.9 73.5 72.3 66.2 Model 🔗

Citation

This code is based on the MoCo repository. If you find this repository useful for your research, please consider citing the following paper(s):

@article{vangansbeke2021revisiting,
  title={Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  journal={arxiv preprint arxiv:2106.05967},
  year={2021}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

For any enquiries, please contact the main authors.

Extra

  • For an overview on self-supervised learning (SSL), have a look at the overview repository.
  • Interested in self-supervised semantic segmentation? Check out our recent work: MaskContrast.
  • Interested in self-supervised classification? Check out SCAN.
  • Other great SSL repositories: MoCo, SupContrast, SeLa, SwAV and many more here.

License

This software is released under a creative commons license which allows for personal and research use only. You can view a license summary here. Part of the code was based on MoCo. Check it out for more details.

Acknoledgements

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).

Owner
Wouter Van Gansbeke
PhD researcher at KU Leuven. Especially interested in computer vision, machine learning and deep learning. Working on self-supervised and multi-task learning.
Wouter Van Gansbeke
Tensorflow implementation and notebooks for Implicit Maximum Likelihood Estimation

tf-imle Tensorflow 2 and PyTorch implementation and Jupyter notebooks for Implicit Maximum Likelihood Estimation (I-MLE) proposed in the NeurIPS 2021

NEC Laboratories Europe 69 Dec 13, 2022
A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Documentation | External Resources | Research Paper Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble. The

Benedek Rozemberczki 188 Dec 29, 2022
Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"

SCAPT-ABSA Code for EMNLP2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training" Overvie

Zhengyan Li 66 Dec 04, 2022
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations This repo contains official code for the NeurIPS 2021 paper Imi

Jiayao Zhang 2 Oct 18, 2021
Localized representation learning from Vision and Text (LoVT)

Localized Vision-Text Pre-Training Contrastive learning has proven effective for pre- training image models on unlabeled data and achieved great resul

Philip Müller 10 Dec 07, 2022
Code for CVPR 2021 paper TransNAS-Bench-101: Improving Transferrability and Generalizability of Cross-Task Neural Architecture Search.

TransNAS-Bench-101 This repository contains the publishable code for CVPR 2021 paper TransNAS-Bench-101: Improving Transferrability and Generalizabili

Yawen Duan 17 Nov 20, 2022
Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Jonas Köhler 893 Dec 28, 2022
LBK 35 Dec 26, 2022
OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022
Programming with Neural Surrogates of Programs

Programming with Neural Surrogates of Programs

0 Dec 12, 2021
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Facebook Research 1.6k Dec 25, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

43 Nov 21, 2022
Video Frame Interpolation with Transformer (CVPR2022)

VFIformer Official PyTorch implementation of our CVPR2022 paper Video Frame Interpolation with Transformer Dependencies python = 3.8 pytorch = 1.8.0

DV Lab 63 Dec 16, 2022
Repository containing the PhD Thesis "Formal Verification of Deep Reinforcement Learning Agents"

Getting Started This repository contains the code used for the following publications: Probabilistic Guarantees for Safe Deep Reinforcement Learning (

Edoardo Bacci 5 Aug 31, 2022
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022
Torch implementation of various types of GAN (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN, LSGAN)

gans-collection.torch Torch implementation of various types of GANs (e.g. DCGAN, ALI, Context-encoder, DiscoGAN, CycleGAN, EBGAN). Note that EBGAN and

Minchul Shin 53 Jan 22, 2022
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022
Deep Learning for 3D Point Clouds: A Survey (IEEE TPAMI, 2020)

🔥Deep Learning for 3D Point Clouds (IEEE TPAMI, 2020)

Qingyong 1.4k Jan 08, 2023
Code release for "Conditional Adversarial Domain Adaptation" (NIPS 2018)

CDAN Code release for "Conditional Adversarial Domain Adaptation" (NIPS 2018) New version: https://github.com/thuml/Transfer-Learning-Library Dataset

THUML @ Tsinghua University 363 Dec 20, 2022
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021