PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

Related tags

Deep LearningMAE-priv
Overview

MAE for Self-supervised ViT

Introduction

This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

This repo is mainly based on moco-v3, pytorch-image-models and BEiT

TODO

  • visualization of reconstruction image
  • linear prob
  • more results
  • transfer learning
  • ...

Main Results

The following results are based on ImageNet-1k self-supervised pre-training, followed by ImageNet-1k supervised training for linear evaluation or end-to-end fine-tuning.

Vit-Base

pretrain
epochs
with
pixel-norm
linear
acc
fine-tuning
acc
100 False -- 75.58 [1]
100 True -- 77.19
800 True -- --

On 8 NVIDIA GeForce RTX 3090 GPUs, pretrain for 100 epochs needs about 9 hours, 4096 batch size needs about 24 GB GPU memory.

[1]. fine-tuning for 50 epochs;

Vit-Large

pretrain
epochs
with
pixel-norm
linear
acc
fine-tuning
acc
100 False -- --
100 True -- --

On 8 NVIDIA A40 GPUs, pretrain for 100 epochs needs about 34 hours, 4096 batch size needs about xx GB GPU memory.

Usage: Preparation

The code has been tested with CUDA 11.4, PyTorch 1.8.2.

Notes:

  1. The batch size specified by -b is the total batch size across all GPUs from all nodes.
  2. The learning rate specified by --lr is the base lr (corresponding to 256 batch-size), and is adjusted by the linear lr scaling rule.
  3. In this repo, only multi-gpu, DistributedDataParallel training is supported; single-gpu or DataParallel training is not supported. This code is improved to better suit the multi-node setting, and by default uses automatic mixed-precision for pre-training.
  4. Only pretraining and finetuning have been tested.

Usage: Self-supervised Pre-Training

Below is examples for MAE pre-training.

ViT-Base with 1-node (8-GPU, NVIDIA GeForce RTX 3090) training, batch 4096

python main_mae.py \
  -c cfgs/ViT-B16_ImageNet1K_pretrain.yaml \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

or

sh train_mae.sh

ViT-Large with 1-node (8-GPU, NVIDIA A40) pre-training, batch 2048

python main_mae.py \
  -c cfgs/ViT-L16_ImageNet1K_pretrain.yaml \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Usage: End-to-End Fine-tuning ViT

Below is examples for MAE fine-tuning.

ViT-Base with 1-node (8-GPU, NVIDIA GeForce RTX 3090) training, batch 1024

python main_fintune.py \
  -c cfgs/ViT-B16_ImageNet1K_finetune.yaml \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

ViT-Large with 2-node (16-GPU, 8 NVIDIA GeForce RTX 3090 + 8 NVIDIA A40) training, batch 512

python main_fintune.py \
  -c cfgs/ViT-B16_ImageNet1K_finetune.yaml \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]

On another node, run the same command with --rank 1.

Note:

  1. We use --resume rather than --finetune in the DeiT repo, as its --finetune option trains under eval mode. When loading the pre-trained model, revise model_without_ddp.load_state_dict(checkpoint['model']) with strict=False.

[TODO] Usage: Linear Classification

By default, we use momentum-SGD and a batch size of 1024 for linear classification on frozen features/weights. This can be done with a single 8-GPU node.

python main_lincls.py \
  -a [architecture] --lr [learning rate] \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Citation

If you use the code of this repo, please cite the original papre and this repo:

@Article{he2021mae,
  author  = {Kaiming He* and Xinlei Chen* and Saining Xie and Yanghao Li and Piotr Dolla ́r and Ross Girshick},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  journal = {arXiv preprint arXiv:2111.06377},
  year    = {2021},
}
@misc{yang2021maepriv,
  author       = {Lu Yang* and Pu Cao* and Yang Nie and Qing Song},
  title        = {MAE-priv},
  howpublished = {\url{https://github.com/BUPT-PRIV/MAE-priv}},
  year         = {2021},
}
Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

Theresa Wagner 1 Aug 10, 2022
Rohit Ingole 2 Mar 24, 2022
根据midi文件演奏“风物之诗琴”的脚本 "Windsong Lyre" auto play

Genshin-lyre-auto-play 简体中文 | English 简介 根据midi文件演奏“风物之诗琴”的脚本。由Python驱动,在此承诺, ⚠️ 项目内绝不含任何能够引起安全问题的代码。 前排提示:所有键盘在动但是原神没反应的都是因为没有管理员权限,双击run.bat或者以管理员模式

御坂17032号 386 Jan 01, 2023
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
The reference baseline of final exam for XMU machine learning course

Mini-NICO Baseline The baseline is a reference method for the final exam of machine learning course. Requirements Installation we use /python3.7 /torc

JoaquinChou 3 Dec 29, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

107 Dec 02, 2022
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

chitra What is chitra? chitra (चित्र) is a multi-functional library for full-stack Deep Learning. It simplifies Model Building, API development, and M

Aniket Maurya 210 Dec 21, 2022
Toontown: Galaxy, a new Toontown game based on Disney's Toontown Online

Toontown: Galaxy The official archive repo for Toontown: Galaxy, a new Toontown

1 Feb 15, 2022
Code for the published paper : Learning to recognize rare traffic sign

Improving traffic sign recognition by active search This repo contains code for the paper : "Learning to recognise rare traffic signs" How to use this

samsja 4 Jan 05, 2023
HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electronic Health Records

HiPAL Code for KDD'22 Applied Data Science Track submission -- HiPAL: A Deep Framework for Physician Burnout Prediction Using Activity Logs in Electro

Hanyang Liu 4 Aug 08, 2022
Low Complexity Channel estimation with Neural Network Solutions

Interpolation-ResNet Invited paper for WSA 2021, called 'Low Complexity Channel estimation with Neural Network Solutions'. Low complexity residual con

Dianxin 10 Dec 10, 2022
This script scrapes and stores the availability of timeslots for Car Driving Test at all RTA Serivce NSW centres in the state.

This script scrapes and stores the availability of timeslots for Car Driving Test at all RTA Serivce NSW centres in the state. Dependencies Account wi

Balamurugan Soundararaj 21 Dec 14, 2022
Collection of generative models in Tensorflow

tensorflow-generative-model-collections Tensorflow implementation of various GANs and VAEs. Related Repositories Pytorch version Pytorch version of th

3.8k Dec 30, 2022
keyframes-CNN-RNN(action recognition)

keyframes-CNN-RNN(action recognition) Environment: python=3.7 pytorch=1.2 Datasets: Following the format of UCF101 action recognition. Run steps: Mo

4 Feb 09, 2022
PyTorch implementation of Neural Dual Contouring.

NDC PyTorch implementation of Neural Dual Contouring. Citation We are still writing the paper while adding more improvements and applications. If you

Zhiqin Chen 140 Dec 26, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 190 Dec 27, 2022
PlenOctrees: NeRF-SH Training & Conversion

PlenOctrees Official Repo: NeRF-SH training and conversion This repository contains code to train NeRF-SH and to extract the PlenOctree, constituting

Alex Yu 323 Dec 29, 2022
PyTorch implementation of DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images

DARDet PyTorch implementation of "DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images", [pdf]. Highlights: 1. We develop a new dense

41 Oct 23, 2022
A Unified Framework and Analysis for Structured Knowledge Grounding

UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu

HKU NLP Group 370 Dec 21, 2022