Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [2021]

Overview

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

This repo contains the Pytorch implementation of our paper:

Revisiting Contrastive Methods for UnsupervisedLearning of Visual Representations

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis and Luc Van Gool.

Contents

  1. Introduction
  2. Key Results
  3. Installation
  4. Training
  5. Evaluation
  6. Model Zoo
  7. Citation

Introduction

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. We first study how biases in the dataset affect existing methods. Our results show that an approach like MoCo works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains. We show that learning additional invariances - through the use of multi-scale cropping, stronger augmentations and nearest neighbors - improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy. The representations can be used for semantic segment retrieval and video instance segmentation without finetuning. Moreover, the results are on par with specialized models. We hope this work will serve as a useful study for other researchers.

Key Results

  • Scene-centric Data: We do not observe any indications that contrastive pretraining suffers from using scene-centric image data. This is in contrast to prior belief. Moreover, if the downstream data is non-object-centric, pretraining on scene-centric datasets even outperforms ImageNet pretraining.
  • Dense Representations: The multi-scale cropping strategy allows the model to learn spatially structured representations. This questions a recent trend that proposed additional losses at a denser level in the image. The representations can be used for semantic segment retrieval and video instance segmentation without any finetuning.
  • Additional Invariances: We impose additional invariances by exploring different data augmentations and nearest neighbors to boost the performance.
  • Transfer Performance: We observed that if a model obtains improvements for the downstream classification tasks, the same improvements are not guarenteed for other tasks (e.g. semantic segmentation) and vice versa.

Installation

The Python code runs with recent Pytorch versions, e.g. 1.6. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.2 -c pytorch
conda install -c conda-forge opencv           # For evaluation
conda install matplotlib scipy scikit-learn   # For evaluation

We refer to the environment.yml file for an overview of the packages we used to reproduce our results. The code was run on 2 Tesla V100 GPUs.

Training

Now, we will pretrain on the COCO dataset. You can download the dataset from the official website. Several scripts in the scripts/ directory are provided. It contains the vanilla MoCo setup and our additional modifications for both 200 epochs and 800 epochs of training. First, modify --output_dir and the dataset location in each script before executing them. Then, run the following command to start the training for 200 epochs:

sh scripts/ours_coco_200ep.sh # Train our model for 200 epochs.

The training currently supports:

  • MoCo
  • + Multi-scale constrained cropping
  • + AutoAugment
  • + kNN-loss

A detailed version of the pseudocode can be found in Appendix B.

Evaluation

We perform the evaluation for the following downstream tasks: linear classification (VOC), semantic segmentation (VOC and Cityscapes), semantic segment retrieval and video instance segmentation (DAVIS). More details and results can be found in the main paper and the appendix.

Linear Classifier

The representations can be evaluated under the linear evaluation protocol on PASCAL VOC. Please visit the ./evaluation/voc_svm directory for more information.

Semantic Segmentation

We provide code to evaluate the representations for the semantic segmentation task on the PASCAL VOC and Cityscapes datasets. Please visit the ./evaluation/segmentation directory for more information.

Segment Retrieval

In order to obtain the results from the paper, run the publicly available code with our weights as the initialization of the model. You only need to adapt the amount of clusters, e.g. 5.

Video Instance Segmentation

In order to obtain the results from the paper, run the publicly available code from Jabri et al. with our weights as the initialization of the model.

Model Zoo

Several pretrained models can be downloaded here. For a fair comparison, which takes the training duration into account, we refer to Figure 5 in the paper. More results can be found in Table 4 and Table 9.

Method Epochs VOC SVM VOC mIoU Cityscapes mIoU DAVIS J&F Download link
MoCo 200 76.1 66.2 70.3 - Model 🔗
Ours 200 85.1 71.9 72.2 - Model 🔗
MoCo 800 81.0 71.1 71.3 63.2 Model 🔗
Ours 800 85.9 73.5 72.3 66.2 Model 🔗

Citation

This code is based on the MoCo repository. If you find this repository useful for your research, please consider citing the following paper(s):

@article{vangansbeke2021revisiting,
  title={Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  journal={arxiv preprint arxiv:2106.05967},
  year={2021}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

For any enquiries, please contact the main authors.

Extra

  • For an overview on self-supervised learning (SSL), have a look at the overview repository.
  • Interested in self-supervised semantic segmentation? Check out our recent work: MaskContrast.
  • Interested in self-supervised classification? Check out SCAN.
  • Other great SSL repositories: MoCo, SupContrast, SeLa, SwAV and many more here.

License

This software is released under a creative commons license which allows for personal and research use only. You can view a license summary here. Part of the code was based on MoCo. Check it out for more details.

Acknoledgements

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).

Owner
Wouter Van Gansbeke
PhD researcher at KU Leuven. Especially interested in computer vision, machine learning and deep learning. Working on self-supervised and multi-task learning.
Wouter Van Gansbeke
object detection; robust detection; ACM MM21 grand challenge; Security AI Challenger Phase VII

赛题背景 在商品知识产权领域,知识产权体现为在线商品的设计和品牌。不幸的是,在每一天,存在着非法商户通过一些对抗手段干扰商标识别来逃避侵权,这带来了很高的知识产权风险和财务损失。为了促进先进的多媒体人工智能技术的发展,以保护企业来之不易的创作和想法免受恶意使用和剽窃,因此提出了鲁棒性标识检测挑战赛

65 Dec 22, 2022
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

KML: A Machine Learning Framework for Operating Systems & Storage Systems Storage systems and their OS components are designed to accommodate a wide v

File systems and Storage Lab (FSL) 186 Nov 24, 2022
NNR conformation conditional and global probabilities estimation and analysis in peptides or proteins fragments

NNR and global probabilities estimation and analysis in peptides or protein fragments This module calculates global and NNR conformation dependent pro

0 Jul 15, 2021
Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Exact Pareto Optimal solutions for preference based Multi-Objective Optimization

Debabrata Mahapatra 40 Dec 24, 2022
LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image.

This project is based on ultralytics/yolov3. LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image. The related paper is avai

26 Dec 13, 2022
The reference baseline of final exam for XMU machine learning course

Mini-NICO Baseline The baseline is a reference method for the final exam of machine learning course. Requirements Installation we use /python3.7 /torc

JoaquinChou 3 Dec 29, 2021
Implementation of the paper titled "Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees"

Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees Implementation of the paper titled "Using Sampling to

MIDAS, IIIT Delhi 2 Aug 29, 2022
🔥 Cannlytics-powered artificial intelligence 🤖

Cannlytics AI 🔥 Cannlytics-powered artificial intelligence 🤖 🏗️ Installation 🏃‍♀️ Quickstart 🧱 Development 🦾 Automation 💸 Support 🏛️ License ?

Cannlytics 3 Nov 11, 2022
Multiple style transfer via variational autoencoder

ST-VAE Multiple style transfer via variational autoencoder By Zhi-Song Liu, Vicky Kalogeiton and Marie-Paule Cani This repo only provides simple testi

13 Oct 29, 2022
dyld_shared_cache processing / Single-Image loading for BinaryNinja

Dyld Shared Cache Parser Author: cynder (kat) Dyld Shared Cache Support for BinaryNinja Without any of the fuss of requiring manually loading several

cynder 76 Dec 28, 2022
This application explain how we can easily integrate Deepface framework with Python Django application

deepface_suite This application explain how we can easily integrate Deepface framework with Python Django application install redis cache install requ

Mohamed Naji Aboo 3 Apr 18, 2022
ParmeSan: Sanitizer-guided Greybox Fuzzing

ParmeSan: Sanitizer-guided Greybox Fuzzing ParmeSan is a sanitizer-guided greybox fuzzer based on Angora. Published Work USENIX Security 2020: ParmeSa

VUSec 158 Dec 31, 2022
face2comics by Sxela (Alex Spirin) - face2comics datasets

This is a paired face to comics dataset, which can be used to train pix2pix or similar networks.

Alex 164 Nov 13, 2022
Code in conjunction with the publication 'Contrastive Representation Learning for Hand Shape Estimation'

HanCo Dataset & Contrastive Representation Learning for Hand Shape Estimation Code in conjunction with the publication: Contrastive Representation Lea

Computer Vision Group, Albert-Ludwigs-Universität Freiburg 38 Dec 13, 2022
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

chitra What is chitra? chitra (चित्र) is a multi-functional library for full-stack Deep Learning. It simplifies Model Building, API development, and M

Aniket Maurya 210 Dec 21, 2022
A PyTorch implementation of "DGC-Net: Dense Geometric Correspondence Network"

DGC-Net: Dense Geometric Correspondence Network This is a PyTorch implementation of our work "DGC-Net: Dense Geometric Correspondence Network" TL;DR A

191 Dec 16, 2022
A Kitti Road Segmentation model implemented in tensorflow.

KittiSeg KittiSeg performs segmentation of roads by utilizing an FCN based model. The model achieved first place on the Kitti Road Detection Benchmark

Marvin Teichmann 890 Jan 04, 2023
Code and data of the ACL 2021 paper: Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

MetaAdaptRank This repository provides the implementation of meta-learning to reweight synthetic weak supervision data described in the paper Few-Shot

THUNLP 5 Jun 16, 2022
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Vision Longformer This project provides the source code for the vision longformer paper. Multi-Scale Vision Longformer: A New Vision Transformer for H

Microsoft 209 Dec 30, 2022
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022