Swin-Transformer is basically a hierarchical Transformer whose representation is computed with shifted windows.

Last update: Mar 14, 2022

Overview

Swin-Transformer

Swin-Transformer is basically a hierarchical Transformer whose representation is computed with shifted windows. For more details, please refer to "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

This repo is an implementation of MegEngine version Swin-Transformer. This is also a showcase for training on GPU with less memory by leveraging MegEngine DTR technique.

There is also an official PyTorch implementation.

Usage

Install

Clone this repo:

git clone https://github.com/MegEngine/swin-transformer.git
cd swin-transformer

Install megengine==1.6.0

pip3 install megengine==1.6.0 -f https://megengine.org.cn/whl/mge.html

Training

To train a Swin Transformer using random data, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> train_random.py

To train a Swin Transformer using AMP (Auto Mix Precision), run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --mode mp train_random.py

To train a Swin Transformer using DTR in dynamic graph mode, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --dtr [--dtr-thd <eviction-threshold-of-dtr>] train_random.py

To train a Swin Transformer using DTR in static graph mode, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --trace --symbolic --dtr --dtr-thd <eviction-threshold-of-dtr> train_random.py

For example, to train a Swin Transformer with a single GPU using DTR in static graph mode with threshold=8GB and AMP, run:

python3 -n 1 -b 340 -s 10 --trace --symbolic --dtr --dtr-thd 8 --mode mp train_random.py

For more usage, run:

python3 train_random.py -h

Benchmark

Testing Devices
- 2080Ti @ cuda-10.1-cudnn-v7.6.3-TensorRT-5.1.5.0 @ Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- Reserve all CUDA memory by setting MGB_CUDA_RESERVE_MEMORY=1, in order to alleviate memory fragmentation problem

Settings	Maximum Batch Size	Speed(s/step)	Throughput(images/s)
None	68	0.490	139
AMP	100	0.494	202
DTR in static graph mode	300	2.592	116
DTR in static graph mode + AMP	340	1.944	175

Acknowledgement

We are inspired by the Swin-Transformer repository, many thanks to microsoft!

Swin-Transformer is basically a hierarchical Transformer whose representation is computed with shifted windows.

Related tags

Overview

Swin-Transformer

Usage

Install

Training

Benchmark

Acknowledgement

Owner

旷视天元 MegEngine

GLNet for Memory-Efficient Segmentation of Ultra-High Resolution Images

Proof of concept GnuCash Webinterface

Invert and perturb GAN images for test-time ensembling

Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

[UNMAINTAINED] Automated machine learning for analytics & production

Domain Generalization with MixStyle, ICLR'21.

A curated list of resources for Image and Video Deblurring

PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17

Code accompanying "Evolving spiking neuron cellular automata and networks to emulate in vitro neuronal activity," accepted to IEEE SSCI ICES 2021

The Agriculture Domain of ERPNext comes with features to record crops and land

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models

Official Repository for the ICCV 2021 paper "PixelSynth: Generating a 3D-Consistent Experience from a Single Image"

RLDS stands for Reinforcement Learning Datasets

Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition

Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective

Tensorflow AffordanceNet and AffContext implementations

Unofficial implementation of Proxy Anchor Loss for Deep Metric Learning

Outlier Exposure with Confidence Control for Out-of-Distribution Detection

Implementation for "Domain-Specific Bias Filtering for Single Labeled Domain Generalization"