Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Last update: Jan 04, 2023

Overview

This repository is built upon BEiT; many thanks to its authors!

Currently, only the pretraining process is implemented according to the paper, and we cannot guarantee that the performance reported in the paper can be reproduced!

Difference

The shuffle and unshuffle operations described in the paper don't seem to be directly available in PyTorch, so we use another method to realize this process:

For shuffle, we use BEiT's method of randomly generating a mask-map (14x14), where mask=0 means the token is kept and mask=1 means the token is dropped (it does not participate in the encoder's computation). All visible tokens (mask=0) are then fed into the encoder network.
For unshuffle, we take the position embeddings of all masked tokens according to the mask-map, add the shared mask token to them, concatenate the result with the visible tokens (from the encoder), and feed everything into the decoder network for reconstruction.
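
The two steps above can be sketched in a few lines of PyTorch. This is only a minimal illustration under stated assumptions, not the code in this repository: the function names (`random_mask_map`, `select_visible`, `assemble_decoder_input`), the tensor shapes, and the use of a boolean mask are all hypothetical choices made for the example. It assumes 196 patches (the 14x14 mask-map, flattened), and it leaves the visible tokens exactly as they come out of the encoder, since the description above only mentions position embeddings for the masked tokens.

```python
import torch


def random_mask_map(batch_size, num_patches=196, mask_ratio=0.75):
    """Return a bool mask of shape (batch_size, num_patches).

    True (mask=1) marks a dropped token, False (mask=0) a visible one.
    Every sample gets exactly int(mask_ratio * num_patches) masked tokens.
    """
    num_mask = int(mask_ratio * num_patches)
    # Rank random noise per sample; the num_mask lowest ranks are masked.
    noise = torch.rand(batch_size, num_patches)
    ranks = noise.argsort(dim=1).argsort(dim=1)
    return ranks < num_mask


def select_visible(tokens, mask):
    """'Shuffle' step: keep only the mask=0 tokens.

    tokens: (B, N, C), mask: (B, N) bool -> (B, N_visible, C)
    """
    B, _, C = tokens.shape
    return tokens[~mask].reshape(B, -1, C)


def assemble_decoder_input(encoded_visible, pos_embed, mask_token, mask):
    """'Unshuffle' step: append mask tokens carrying their position embeddings.

    encoded_visible: (B, N_visible, C) tokens from the encoder
    pos_embed:       (1, N, C) position embeddings (assumed shape)
    mask_token:      (1, 1, C) shared mask token (assumed shape)
    mask:            (B, N) bool mask-map
    """
    B, _, C = encoded_visible.shape
    pos = pos_embed.expand(B, -1, -1)
    # Position embeddings of the masked locations, picked via the mask-map.
    pos_masked = pos[mask].reshape(B, -1, C)
    # Visible tokens come first, masked tokens last (not the original order).
    return torch.cat([encoded_visible, mask_token + pos_masked], dim=1)


if __name__ == "__main__":
    B, N, C = 2, 196, 8
    mask = random_mask_map(B, N, mask_ratio=0.75)
    tokens = torch.randn(B, N, C)
    visible = select_visible(tokens, mask)  # stand-in for the encoder output
    full = assemble_decoder_input(
        visible, torch.randn(1, N, C), torch.zeros(1, 1, C), mask
    )
    print(visible.shape, full.shape)
```

Because the masked tokens are appended after the visible ones, a decoder fed with this input would produce its predictions for the masked patches in the trailing positions of the sequence, so no explicit unshuffle back to the original order is needed to compute a loss on masked patches only.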

TODO

implement the finetune process
reuse the model in modeling_pretrain.py
calculate the normalized pixels target
add the cls token in the encoder
...

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# Path to the ImageNet-1k training set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}

Note: the pretraining results are on the way.


Owner

Zhiliang Peng
