Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Last update: Dec 14, 2021

Related tags

Overview

mae-repo

PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https://github.com/lucidrains/vit-pytorch (for MAE architectures) and https://github.com/pengzhiliang/MAE-pytorch (for training loop).

prepare ImageNet1K datasets

To train MAE, one should prepare ImageNet_ILSVRC2012 and place ILSVRC2012_*.tar in the ${datasets_path}. To shorten the overhead of first run, one can manually untar the tarfile into train and val directories, as follow (refered to https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).

mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..

mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash

modify configuration file

To separate code and config, we try to split configurations to yaml file, located in configs directory, such as imagenet1k-vit-base.yml. One can modify 'model' setting following MAE and ViT to configure model architecture parameters of ViT-base, large and huge.

One can modify 'optim' for optimizer settings. And modify 'training' and 'data' for training settings. Note that, modify 'training:batch_size' to fit the GPU memory of one GPU card. Total batch_size is equal to batch_size multiplied by number of GPU cards.

train

CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 mae_test.py
--datasets_path ${datasets_path}
--config imagenet1k-vit-base.yml
--doc mae-vit-base16-dec8-512

ToDo lists

add pretrain mode
add fine-tunning mode
support mixed precision training
support distributed training
verify the correctness of this re-implementation

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Related tags

Overview

mae-repo

prepare ImageNet1K datasets

modify configuration file

train

ToDo lists

Owner

Peng Qiao

An auto discord account and token generator. Automatically verifies the phone number. Works without proxy. Bypasses captcha.

Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Code repo for "Transformer on a Diet" paper

Implementation of Neural Style Transfer in Pytorch

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

This implements the learning and inference/proposal algorithm described in "Learning to Propose Objects, Krähenbühl and Koltun"

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

paper: Hyperspectral Remote Sensing Image Classification Using Deep Convolutional Capsule Network

FewBit — a library for memory efficient training of large neural networks

This project is used for the paper Differentiable Programming of Isometric Tensor Network

An OpenAI Gym environment for multi-agent car racing based on Gym's original car racing environment.

This is the official code for the paper "Learning with Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision"

git《Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser》(2021) GitHub: [fig5]

A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

Kaggle competition: Springleaf Marketing Response

Point Cloud Denoising input segmentation output raw point-cloud valid/clear fog rain de-noised Abstract Lidar sensors are frequently used in environme

Official implementation of the article "Unsupervised JPEG Domain Adaptation For Practical Digital Forensics"

code for Multi-scale Matching Networks for Semantic Correspondence, ICCV

Source code for our paper "Empathetic Response Generation with State Management"