SiT: Self-supervised vIsion Transformer

Last update: Dec 28, 2022

This repository contains the official PyTorch code for self-supervised pretraining, finetuning, and evaluation of SiT (Self-supervised vIsion Transformer).

The training strategy is adopted from DeiT.

Usage

Create an environment

conda create -n SiT python=3.8

Activate the environment and install the necessary packages

conda activate SiT

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

pip install -r requirements.txt

Self-supervised pre-training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10

Finetuning

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10' --validate-every 10

Linear Evaluation

Linear projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE' --validate-every 10 --SiT_LinearEvaluation 1

2-layer MLP projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE_hidden' --validate-every 10 --SiT_LinearEvaluation 1 --representation-size 1024

Note: set the --dataset_location parameter to the path of the downloaded dataset.
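As an illustration of the note above, the following sketch appends --dataset_location to the pre-training command and prints the result for inspection before anything is launched. The DATASET_DIR variable and its default path are assumptions made for this example, not names defined by the repository; whether the flag expects the dataset folder itself or its parent directory should be checked against main.py.

```shell
# Hypothetical location; replace with the directory holding the downloaded STL10 data.
DATASET_DIR="${DATASET_DIR:-$HOME/datasets/STL10}"

# Same pre-training command as in the section above, with --dataset_location added.
CMD="python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
  --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 \
  --training-mode SSL --data-set STL10 \
  --dataset_location $DATASET_DIR \
  --output checkpoints/SSL/STL10 --validate-every 10"

# Dry run: print the assembled command instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be launched with eval "$CMD" from the repository root. This simple form assumes the dataset path contains no spaces.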

If you use this code for a paper, please cite:

@article{atito2021sit,

  title={SiT: Self-supervised vIsion Transformer},

  author={Atito, Sara and Awais, Muhammad and Kittler, Josef},

  journal={arXiv preprint arXiv:2104.03602},

  year={2021}

}

License

This repository is released under the GNU General Public License.

Owner

Sara Ahmed
