Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Overview

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Pytorch Implementation for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

If the project is useful to you, please give us a star. ⭐️

image

@article{gao2021disco,
  title={DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning},
  author={Gao, Yuting and Zhuang, Jia-Xin and Li, Ke and Cheng, Hao and Guo, Xiaowei and Huang, Feiyue and Ji, Rongrong and Sun, Xing},
  journal={arXiv preprint arXiv:2104.09124},
  year={2021}
}

Checkpoints

Teacher Models

Architecture Self-supervised Methods Model Checkpoints
ResNet152 MoCo-V2 Model
ResNet101 MoCo-V2 Model
ResNet50 MoCo-V2 Model

For teacher models such as ResNet-50*2 etc, we use their official implementation, which can be downloaded from their github pages.

Student Models by DisCo

Teacher/Students Efficient-B0 ResNet-18 Vit-Tiny XCiT-Tiny
ResNet-50 Model Model - -
ResNet-101 Model Model - -
ResNet-152 Model Model - -
ResNet-50*2 Model Model - -
ViT-Small - - Model -
XCiT-Small - - - Model

Requirements

  • Python3

  • Pytorch 1.6+

  • Detectron2

  • 8 GPUs are preferred

  • ImageNet, Cifar10/100, VOC, COCO

Run

Before running, we firstly move all data into share memory

cp /path/to/ImageNet /dev/shm

Pretrain Model

For pretraining baseline models with default hidden layer dimension in Tab1

# Switch to moco directory
cd moco

# R-50
python3 -u main_moco.py -a resnet50 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet50 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-101
python3 -u main_moco.py -a resnet101 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet101 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-152
python3 -u main_moco.py -a resnet152 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 800 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet152 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0799.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Mob
python3 -u main_moco.py -a mobilenetv3 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 512 /dev/shm 2>&1 |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a mobilenetv3 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B0
python3 -u main_moco.py -a efficientb0 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 2>&1  |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B1
python3 -u main_moco.py -a efficientb1 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0  --hidden 1280  /dev/shm  2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb1 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-18
python3 -u main_moco.py -a resnet18 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-34
python3 -u main_moco.py -a resnet34 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet34 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

DisCo

For training DisCo in Tab1, Comparision with baseline

# Switch to DisCo directory
cd DisCo

# R-50 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# R50w2 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50w2 --teacher /path/to/swav_RN50w2_400ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
#          Evaluation
python3 yt_main_lincls.py -a resnet18 --learning-rate 30.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar  /dev/shm 2>&1 | tee ./logs/std.log

For Tab2, Linear evaluation top-1 accuracy (%) on ImageNet compared with different distillation methods.

# RKD+DisCo, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar --use-mse /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# RKD, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab3 , **Object detection and instance segmentation results **

# Cp data to /dev/shm and set up path for Detectron2
cp -r /path/to/VOCdevkit/* /dev/shm/
cp -r /path/to/coco_2017 /dev/shm/coco
export DETECTRON2_DATASETS=/dev/shm

pip install /youtu-reid/jiaxzhuang/acmm/detectron2-0.4+cu101-cp36-cp36m-linux_x86_64.whl
cd detection

# Convert model for Detectron2
python3 convert-pretrain-to-detectron2.py /path/ckpt/checkpoint_0199.pth.tar ./output.pkl

# Evaluation on VOC
python3 train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml --num-gpus 8 --resume MODEL.RESNETS.DEPTH 34 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log
# Evaluation on CoCo
python3 train_net.py --config-file configs/coco_R_50_C4_2x_moco.yaml --num-gpus 8  --resume MODEL.RESNETS.DEPTH 18 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log

For Fig5 , evaluation on Semi-Supervised Tasks

# Copy 1%, 10% ImageNet from the complete ImageNet, according to split from SimCLR.
cd data
# Need to set up path to Compelete ImageNet and the output path.
python3 -u imagenet_1_fraction.py --ratio 1
python3 -u imagenet_1_fraction.py --ratio 10

# Evaluation on 1% ImageNet with Eff-B0 by DisCo
cp -r /path/to/imagenet_1_fraction/train  /dev/shm
cp -r /path/to/imagenet_1_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

# Evaluation on 10% ImageNet with R-18 by DisCo
cp -r /path/to/imagenet_10_fraction/train  /dev/shm
cp -r /path/to/imagenet_10_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

For Fig6, evaluation on Cifar10/Cifar100

# Copy Cifar10/100 to /dev/shm
cp /path/to/Cifar10/100 /dev/shm

# Evaluation on 1% Cifar10 with Eff-B0 by DisCo
python3 cifar_main_lincls.py -a efficientb0 --dataset cifar10 --lr 3 --epochs 200 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
# Evaluation on  Cifar100 with Resnet18 by DisCo
python3 cifar_main_lincls.py -a resnet18 --dataset cifar100 --lr 3 --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab4, Linear evaluation top-1 accuracy (%) on ImageNet, compared with SEED with consistent dimension in hidden layer.

python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab5, Linear evaluation top-1 accuracy (%) on ImageNet with SwAV as the testbed.

# SwAV, Train with SwAV only
cd swav-master
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --use_fp16 true 2>&1 | tee ./logs/std.log
# Evaluation
python3 -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --arch efficientb0 --data_path /dev/shm --pretrained /path/to/ckpt/checkpoints/ckp-199.pth 2>&1 | tee ./logs/std.log

# DisCo + SwAV
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav_distill.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --pretrained /path/to/swav_800ep_pretrain.pth.tar 2>&1 | tee ./logs/std.log

For Tab6, Linear evaluation top-1 accuracy (%) on ImageNet with variants of teacher pre-training methods.

# SwAV
python3 -u main.py -a resnet34 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch SWAVresnet50 --teacher /path/to/swav_800ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

Visualization

cd DisCo
# Generate Embed
# Move Embed to data path

python -u draw.py

Thanks

Code heavily depends on MoCo-V2, Detectron2.

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

End-to-End Optimization of Scene Layout Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral) Project site, Bibtex For help conta

Andrew Luo 41 Dec 09, 2022
This is the code for "HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields".

HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields This is the code for "HyperNeRF: A Higher-Dimensional

Google 702 Jan 02, 2023
Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs

Spectrum Surveying: The Python code in this repository implements the simulations and plots the figures described in the paper “Spectrum Surveying: Ac

Universitetet i Agder 2 Dec 06, 2022
Research code for the paper "Variational Gibbs inference for statistical estimation from incomplete data".

Variational Gibbs inference (VGI) This repository contains the research code for Simkus, V., Rhodes, B., Gutmann, M. U., 2021. Variational Gibbs infer

Vaidotas Šimkus 1 Apr 08, 2022
Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.

Evidential Deep Learning for Guided Molecular Property Prediction and Discovery Ava Soleimany*, Alexander Amini*, Samuel Goldman*, Daniela Rus, Sangee

Alexander Amini 75 Dec 15, 2022
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech - PyTorch Implementation PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech. Model Size Module Nor

Keon Lee 279 Jan 04, 2023
A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.

AnimeGAN A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing. Randomly Generated Images The images are

Jie Lei 雷杰 1.2k Jan 03, 2023
Pose Detection and Machine Learning for real-time body posture analysis during exercise to provide audiovisual feedback on improvement of form.

Posture: Pose Tracking and Machine Learning for prescribing corrective suggestions to improve posture and form while exercising. This repository conta

Pratham Mehta 10 Nov 11, 2022
Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

Ditto: Building Digital Twins of Articulated Objects from Interaction Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu CVPR 2022, Oral Project | arxiv News 2022

UT Robot Perception and Learning Lab 78 Dec 22, 2022
Framework for evaluating ANNS algorithms on billion scale datasets.

Billion-Scale ANN http://big-ann-benchmarks.com/ Install The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Py

Harsha Vardhan Simhadri 132 Dec 24, 2022
Meta graph convolutional neural network-assisted resilient swarm communications

Resilient UAV Swarm Communications with Graph Convolutional Neural Network This repository contains the source codes of Resilient UAV Swarm Communicat

62 Dec 06, 2022
Using PyTorch Perform intent classification using three different models to see which one is better for this task

Using PyTorch Perform intent classification using three different models to see which one is better for this task

Yoel Graumann 1 Feb 14, 2022
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository will provide the official PyTorch implementa

Liming Jiang 238 Nov 25, 2022
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

TAP: Text-Aware Pre-training TAP: Text-Aware Pre-training for Text-VQA and Text-Caption by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Flo

Microsoft 61 Nov 14, 2022
PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages Abstract NLP applications for code-mixed (CM) or mix-li

Mohsin Ali, Mohammed 1 Nov 12, 2021
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka

Mamy Ratsimbazafy 359 Jan 05, 2023
Paper: Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification

Cross-View Kernel Similarity Metric Learning Using Pairwise Constraints for Person Re-identification T M Feroz Ali, Subhasis Chaudhuri, ICVGIP-20-21

T M Feroz Ali 3 Jun 17, 2022
RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining

RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining Our code is based on Learning Attention-based Embed

宋朝都 4 Aug 07, 2022
A tensorflow=1.13 implementation of Deconvolutional Networks on Graph Data (NeurIPS 2021)

GDN A tensorflow=1.13 implementation of Deconvolutional Networks on Graph Data (NeurIPS 2021) Abstract In this paper, we consider an inverse problem i

4 Sep 13, 2022
Irrigation controller for Home Assistant

Irrigation Unlimited This integration is for irrigation systems large and small. It can offer some complex arrangements without large and messy script

Robert Cook 176 Jan 02, 2023