Implementation of Momentum^2 Teacher

Overview

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning
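
In short, the teacher network is a momentum (EMA) copy of the student, and in addition the teacher's normalization statistics are updated by the same kind of momentum rule instead of being recomputed from each batch. Below is a minimal Python sketch of these two momentum updates; the function name and the momentum coefficient are illustrative, not the repository's actual code:

import torch
import torch.nn as nn

@torch.no_grad()
def momentum2_update(student: nn.Module, teacher: nn.Module, m: float = 0.99):
    # Momentum (EMA) update of the teacher weights from the student.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1.0 - m)
    # Momentum update of the floating-point buffers as well (e.g. the BN
    # running mean/var), so the teacher normalizes with momentum statistics
    # rather than per-batch statistics.
    for b_s, b_t in zip(student.buffers(), teacher.buffers()):
        if b_t.dtype.is_floating_point:
            b_t.mul_(m).add_(b_s, alpha=1.0 - m)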

Requirements

  1. All experiments are done with Python 3.6, torch==1.5.0, and torchvision==0.6.0.
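
A matching environment can be installed with pip, e.g. (assuming a CUDA build appropriate for your machine):

pip3 install torch==1.5.0 torchvision==0.6.0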

Usage

Data Preparation

Prepare the ImageNet data in ${root_of_your_clone}/data/imagenet_train and ${root_of_your_clone}/data/imagenet_val. Since we have an internal platform (storage) to read ImageNet, the local mode has not been tested; you may need to modify momentum_teacher/data/dataset.py to support it.
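
For reference, a minimal local-mode dataset could be built on torchvision's ImageFolder, since the folders above follow the standard ImageNet layout. The class below is a hypothetical sketch, not part of the repository; you would still have to wire it into the Dataset interface used by momentum_teacher/data/dataset.py:

import os
from torchvision.datasets import ImageFolder

class LocalImageNet(ImageFolder):
    # Hypothetical local-mode dataset reading the folders described above.
    def __init__(self, root_of_your_clone, train=True, transform=None):
        split = "imagenet_train" if train else "imagenet_val"
        super().__init__(os.path.join(root_of_your_clone, "data", split),
                         transform=transform)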

Training

Before training, ensure the repository path (namely ${root_of_clone}) is added to your PYTHONPATH, e.g.:

export PYTHONPATH=$PYTHONPATH:${root_of_clone}

To do unsupervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, run:

  1. using -d to specify the GPU ids for training, e.g., -d 0-7
  2. using -b to specify the batch size, e.g., -b 256
  3. using --experiment-name to specify the output folder; the training log and models will be dumped to ./outputs/${experiment-name}
  4. using -f to specify the description file of your experiment.

e.g.,

python3 momentum_teacher/tools/train.py -b 256 -d 0-7 --experiment-name your_exp -f momentum_teacher/exps/arxiv/exp_8_v100/momentum2_teacher_100e_exp.py

Linear Evaluation

With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, run:

  1. using -d to specify the GPU ids for training, e.g., -d 0-7
  2. using -b to specify the batch size, e.g., -b 256
  3. using --experiment-name to specify the folder that holds the pre-trained models.

e.g.,

python3 momentum_teacher/tools/eval.py -b 256 --experiment-name your_exp -f momentum_teacher/exps/arxiv/linear_eval_exp_byol.py

Results

Results of Pretraining on a Single Machine

After pre-training on 8 NVIDIA V100 GPUs with a batch size of 1024, the linear-evaluation results are:

pre-train code | pre-train epochs | pre-train time | accuracy | weights
path | 100 | ~1.8 days | 70.7 | -
path | 200 | ~3.6 days | 72.7 | -
path | 300 | ~5.5 days | 73.8 | -

After pre-training on 8 NVIDIA 2080 GPUs with a batch size of 256, the linear-evaluation results are:

pre-train code | pre-train epochs | pre-train time | accuracy | weights
path | 100 | ~2.5 days | 70.4 | -
path | 200 | ~5 days | 72.3 | -
path | 300 | ~7.5 days | 72.9 | -

Results of Pretraining on Multiple Machines

E.g., to do unsupervised pre-training with a batch size of 4096 on 32 V100 GPUs, run the following.

Assuming that each machine has 8 V100 GPUs and there are 4 machines:

# machine 1:
export MACHINE=0; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 2:
export MACHINE=1; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 3:
export MACHINE=2; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 4:
export MACHINE=3; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
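
The MACHINE and MACHINE_TOTAL variables identify the current node and the number of nodes. How the launcher consumes them is internal to the repository, but a conventional mapping to torch.distributed ranks, shown purely as an illustration, looks like this:

import os
import torch.distributed as dist

GPUS_PER_NODE = 8                               # 8 V100s per machine in this example
node_rank = int(os.environ["MACHINE"])          # 0 .. MACHINE_TOTAL - 1
num_nodes = int(os.environ["MACHINE_TOTAL"])
local_rank = 0                                  # per-process index on this node

dist.init_process_group(
    backend="nccl",
    init_method="tcp://<machine_1_ip>:<port>",  # hypothetical rendezvous address
    world_size=num_nodes * GPUS_PER_NODE,
    rank=node_rank * GPUS_PER_NODE + local_rank,
)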

Results of linear evaluation:

pre-train code | pre-train epochs | pre-train time | accuracy | weights
path | 100 | ~11 hours | 70.3 | -
path | 200 | ~22 hours | 72.5 | -
path | 300 | ~33 hours | 73.7 | -

To do unsupervised pre-training with a batch size of 4096 on 128 2080 GPUs, please follow the guide above. Results of linear evaluation:

pre-train code | pre-train epochs | pre-train time | accuracy | weights
path | 100 | ~5 hours | 69.0 | -
path | 200 | ~10 hours | 71.5 | -
path | 300 | ~15 hours | 72.3 | -

Disclaimer

This is an implementation of Momentum^2 Teacher. It is worth noting that:

  • The original implementation is based on our internal platform.
  • This released version achieves slightly better performance than the tech report's.