This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Overview

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

04/12/2021 Initial commits

Results and Models

Mask R-CNN

Backbone Pretrain Lr Schd box mAP mask mAP #params FLOPs config log model
Swin-T ImageNet-1K 3x 46.0 41.6 48M 267G config github/baidu github/baidu
Swin-S ImageNet-1K 3x 48.5 43.3 69M 359G config github/baidu github/baidu

Cascade Mask R-CNN

Backbone Pretrain Lr Schd box mAP mask mAP #params FLOPs config log model
Swin-T ImageNet-1K 3x 50.4 43.7 86M 745G config github/baidu github/baidu
Swin-S ImageNet-1K 3x 51.9 45.0 107M 838G config github/baidu github/baidu
Swin-B ImageNet-1K 3x 51.9 45.0 145M 982G config github/baidu github/baidu

RepPoints V2

Backbone Pretrain Lr Schd box mAP mask mAP #params FLOPs
Swin-T ImageNet-1K 3x 50.0 - 45M 283G

Mask RepPoints V2

Backbone Pretrain Lr Schd box mAP mask mAP #params FLOPs
Swin-T ImageNet-1K 3x 50.3 43.6 47M 292G

Notes:

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone and 8 gpus, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you would like to disable apex, modify the type of runner as EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Owner
Swin Transformer
This organization maintains repositories built on Swin Transformers. The pretrained models locate at https://github.com/microsoft/Swin-Transformer
Swin Transformer
Privacy-Preserving Machine Learning (PPML) Tutorial Presented at PyConDE 2022

PPML: Machine Learning on Data you cannot see Repository for the tutorial on Privacy-Preserving Machine Learning (PPML) presented at PyConDE 2022 Abst

Valerio Maggio 10 Aug 16, 2022
An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym

gym-idsgame An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym gym-idsgame is a reinforcement learning environment for simulating at

Kim Hammar 29 Dec 03, 2022
HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference

HNECV This repository provides a reference implementation of HNECV as described in the paper: HNECV: Heterogeneous Network Embedding via Cloud model a

4 Jun 28, 2022
A curated list of awesome projects and resources related fastai

A curated list of awesome projects and resources related fastai

Tanishq Abraham 138 Dec 22, 2022
social humanoid robots with GPGPU and IoT

Social humanoid robots with GPGPU and IoT Social humanoid robots with GPGPU and IoT Paper Authors Mohsen Jafarzadeh, Stephen Brooks, Shimeng Yu, Balak

0 Jan 07, 2022
Show-attend-and-tell - TensorFlow Implementation of "Show, Attend and Tell"

Show, Attend and Tell Update (December 2, 2016) TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attent

Yunjey Choi 902 Nov 29, 2022
This is the code for the paper "Motion-Focused Contrastive Learning of Video Representations" (ICCV'21).

Motion-Focused Contrastive Learning of Video Representations Introduction This is the code for the paper "Motion-Focused Contrastive Learning of Video

11 Sep 23, 2022
Unsupervised Image Generation with Infinite Generative Adversarial Networks

Unsupervised Image Generation with Infinite Generative Adversarial Networks Here is the implementation of MICGANs using DCGAN architecture on MNIST da

16 Dec 24, 2021
Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning

Human-Level Control through Deep Reinforcement Learning Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning. This imp

Devsisters Corp. 2.4k Dec 26, 2022
Residual Dense Net De-Interlace Filter (RDNDIF)

Residual Dense Net De-Interlace Filter (RDNDIF) Work in progress deep de-interlacer filter. It is based on the architecture proposed by Bernasconi et

Louis 7 Feb 15, 2022
Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression. Not an official Google product. Me

Google Research 27 Dec 12, 2022
Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

Tu Anh Dinh 1 Sep 07, 2022
Official Repo for Ground-aware Monocular 3D Object Detection for Autonomous Driving

Visual 3D Detection Package: This repo aims to provide flexible and reproducible visual 3D detection on KITTI dataset. We expect scripts starting from

Yuxuan Liu 305 Dec 19, 2022
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition Paper: MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition accepted fo

64 Dec 18, 2022
TensorFlow tutorials and best practices.

Effective TensorFlow 2 Table of Contents Part I: TensorFlow 2 Fundamentals TensorFlow 2 Basics Broadcasting the good and the ugly Take advantage of th

Vahid Kazemi 8.7k Dec 31, 2022
Unofficial PyTorch Implementation for HifiFace (https://arxiv.org/abs/2106.09965)

HifiFace — Unofficial Pytorch Implementation Image source: HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping (figure 1, pg. 1)

MINDs Lab 218 Jan 04, 2023
Source code for CVPR2022 paper "Abandoning the Bayer-Filter to See in the Dark"

Abandoning the Bayer-Filter to See in the Dark (CVPR 2022) Paper: https://arxiv.org/abs/2203.04042 (Arxiv version) This code includes the training and

74 Dec 15, 2022
This is the official github repository of the Met dataset

The Met dataset This is the official github repository of the Met dataset. The official webpage of the dataset can be found here. What is it? This cod

Nikolaos-Antonios Ypsilantis 35 Dec 17, 2022
basic tutorial on pytorch

Quick Tutorial on PyTorch PyTorch Basics Linear Regression Logistic Regression Artificial Neural Networks Convolutional Neural Networks Recurrent Neur

7 Sep 15, 2022
Code/data of the paper "Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction" (BMVC2021)

Hand-Object Contact Prediction (BMVC2021) This repository contains the code and data for the paper "Hand-Object Contact Prediction via Motion-Based Ps

Takuma Yagi 13 Nov 07, 2022