Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Last update: Dec 28, 2022

Related tags

Overview

Demysitifing Local Vision Transformer, arxiv

This is the official PyTorch implementation of our paper. We simply replace local self attention by (dynamic) depth-wise convolution with lower computational cost. The performance is on par with the Swin Transformer.

Besides, the main contribution of our paper is the theorical and detailed comparison between depth-wise convolution and local self attention from three aspects: sparse connectivity, weight sharing and dynamic weight. By this paper, we want community to rethinking the local self attention and depth-wise convolution, and the basic model architeture designing rules.

Codes and models for object detection and semantic segmentation are avaliable in Detection and Segmentation.

We also give MLP based Swin Transformer models and Inhomogenous dynamic convolution in the ablation studies. These codes and models will coming soon.

Reference

@article{han2021demystifying,
  title={Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight},
  author={Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong},
  journal={arXiv preprint arXiv:2106.04263},
  year={2021}
}

1. Requirements

torch>=1.5.0, torchvision, timm, pyyaml; apex-amp

data perpare: ImageNet dataset with the following structure:

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

2. Trainning

For tiny model, we train with batch-size 128 on 8 GPUs. When trainning base model, we use batch-size 64 on 16 GPUs with OpenMPI to keep the total batch-size unchanged. (With the same trainning setting, the base model couldn't train with AMP due to the anomalous gradient values.)

Please change the data path in sh scripts first.

For tiny model:

bash scripts/run_dwnet_tiny_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_tiny_patch4_window7_224.sh

For base model, use multi node with OpenMPI:

bash scripts/run_dwnet_base_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_base_patch4_window7_224.sh

3. Evaluation

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/change_to_config_file --resume /path/to/model --data-path /path/to/imagenet --eval

4. Models

Models are provided by training on ImageNet with resolution 224.

Model	#params	FLOPs	Top1 Acc	Download
dwnet_tiny	24M	3.8G	81.2	github
dynamic_dwnet_tiny	51M	3.8G	81.8	github
dwnet_base	74M	12.9G	83.2	github
dynamic_dwnet_base	162M	13.0G	83.2	github

Detection (see Detection for details):

Backbone	Pretrain	Lr Schd	box mAP	mask mAP	#params	FLOPs	config	model
DWNet-T	ImageNet-1K	3x	49.9	43.4	82M	730G	config	github
DWNet-B	ImageNet-1K	3x	51.0	44.1	132M	924G	config	github
Dynamic-DWNet-T	ImageNet-1K	3x	50.5	43.7	108M	730G	config	github
Dynamic-DWNet-B	ImageNet-1K	3x	51.2	44.4	219M	924G	config	github

Segmentation (see Segmentation for details):

Backbone	Pretrain	Lr Schd	mIoU	#params	FLOPs	config	model
DWNet-T	ImageNet-1K	160K	45.5	56M	928G	config	github
DWNet-B	ImageNet-1K	160K	48.3	108M	1129G	config	github
Dynamic-DWNet-T	ImageNet-1K	160K	45.7	83M	928G	config	github
Dynamic-DWNet-B	ImageNet-1K	160K	48.0	195M	1129G	config	github

LICENSE

This repo is under the MIT license. Some codes are borrow from Swin Transformer.

Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.

Learning Associative Inference Using Fast Weight Memory This repository contains the offical code for the paper Learning Associative Inference Using F

18 Oct 12, 2022

Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision https://arxiv.org/abs/2003.00393 Abstract Active learning (AL) aims to min

29 Nov 21, 2022

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

115 Jan 4, 2023

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

selfcontact This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] It includes the main function

68 Dec 6, 2022

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

SMPLify-XMC This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright Lic

83 Dec 14, 2022

Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

PGT Code for paper PGT: A Progressive Method for Training Models on Long Videos. Install Run pip install -r requirements.txt. Run python setup.py buil

27 Mar 30, 2022

This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

6 Jul 18, 2022

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

49 Dec 17, 2022

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Hurdles to Progress in Long-form Question Answering This repository contains the official scripts and datasets accompanying our NAACL 2021 paper, "Hur

41 Nov 8, 2022

Comments

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

opened by TrellixVulnTeam 0

Possible bug?

class DynamicDWConv(nn.Module):
    def __init__(self, dim, kernel_size, bias=True, stride=1, padding=1, groups=1, reduction=4):
        super().__init__()
        self.dim = dim
        self.kernel_size = kernel_size
        self.stride = stride 
        self.padding = padding 
        self.groups = groups 

        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.conv1 = nn.Conv2d(dim, dim // reduction, 1, bias=False)
        self.bn = nn.BatchNorm2d(dim // reduction)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(dim // reduction, dim * kernel_size * kernel_size, 1)
        if bias:
            self.bias = nn.Parameter(torch.zeros(dim))
        else:
            self.bias = None

    def forward(self, x):
        b, c, h, w = x.shape
        weight = self.conv2(self.relu(self.bn(self.conv1(self.pool(x)))))
        weight = weight.view(b * self.dim, 1, self.kernel_size, self.kernel_size)
        x = F.conv2d(x.reshape(1, -1, h, w), weight, self.bias.repeat(b), stride=self.stride, padding=self.padding, groups=b * self.groups)
        x = x.view(b, c, x.shape[-2], x.shape[-1])
        return x

This function seems to give error when groups is not equal to dim.

opened by yxchng 0

Releases(prerelease)

prerelease(Aug 16, 2021)

Add pretrain models
Source code(tar.gz)
Source code(zip)
cascade_mask_rcnn_dwnet_base.pth(502.52 MB)
cascade_mask_rcnn_dwnet_tiny.pth(311.28 MB)
cascade_mask_rcnn_dynamic_dwnet_base.pth(835.44 MB)
cascade_mask_rcnn_dynamic_dwnet_tiny.pth(414.16 MB)
dwnet_base_224.pth(284.22 MB)
dwnet_tiny_224.pth(92.47 MB)
dynamic_dwnet_base_224.pth(617.14 MB)
dynamic_dwnet_tiny_224.pth(195.35 MB)
upernet_dwnet_base.pth(412.15 MB)
upernet_dwnet_tiny.pth(213.30 MB)
upernet_dynamic_dwnet_base.pth(745.06 MB)
upernet_dynamic_dwnet_tiny.pth(316.19 MB)

Owner

Attention for Vision and Visualization

GitHub Repository

A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

201 Nov 09, 2022

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines. We've created a system in which you can easily select and

57 Nov 12, 2022

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

Polypharmacy - DDI - Synergy Survey The Survey Paper This repository accompanies our survey paper A Unified View of Relational Deep Learning for Polyp

79 Jan 05, 2023

Pre-trained models for a Cascaded-FCN in caffe and tensorflow that segments

Cascaded-FCN This repository contains the pre-trained models for a Cascaded-FCN in caffe and tensorflow that segments the liver and its lesions out of

300 Nov 22, 2022

Machine Learning Models were applied to predict the mass of the brain based on gender, age ranges, and head size.

Brain Weight in Humans Variations of head sizes and brain weights in humans Kaggle dataset obtained from this link by Anubhab Swain. Image obtained fr

1 Feb 02, 2022

A Multi-modal Model Chinese Spell Checker Released on ACL2021.

ReaLiSe ReaLiSe is a multi-modal Chinese spell checking model. This the office code for the paper Read, Listen, and See: Leveraging Multimodal Informa

106 Dec 29, 2022

Tzer: TVM Implementation of "Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation (OOPSLA'22)“.

Artifact • Reproduce Bugs • Quick Start • Installation • Extend Tzer Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation This is the s

12 Dec 29, 2022

Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

9.6k Jan 02, 2023

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

Hierarchical Attention Networks for Document Classification This is an implementation of the paper Hierarchical Attention Networks for Document Classi

83 Dec 05, 2022

PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"

LiVLR-VideoQA We propose a Lightweight Visual-Linguistic Reasoning framework (LiVLR) for VideoQA. The overview of LiVLR: Evaluation on MSRVTT-QA Datas

7 Dec 30, 2022

Neuron Merging: Compensating for Pruned Neurons (NeurIPS 2020)

Neuron Merging: Compensating for Pruned Neurons Pytorch implementation of Neuron Merging: Compensating for Pruned Neurons, accepted at 34th Conference

33 Dec 30, 2022

Code to reproduce experiments in the paper "Explainability Requires Interactivity".

Explainability Requires Interactivity This repository contains the code to train all custom models used in the paper Explainability Requires Interacti

5 Apr 07, 2022

[SIGIR22] Official PyTorch implementation for "CORE: Simple and Effective Session-based Recommendation within Consistent Representation Space".

CORE This is the official PyTorch implementation for the paper: Yupeng Hou, Binbin Hu, Zhiqiang Zhang, Wayne Xin Zhao. CORE: Simple and Effective Sess

26 Dec 19, 2022

Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Ceph.

Project Aquarium Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Cep

73 Jul 21, 2022

Multi-Objective Loss Balancing for Physics-Informed Deep Learning

Multi-Objective Loss Balancing for Physics-Informed Deep Learning Code for ReLoBRaLo. Abstract Physics Informed Neural Networks (PINN) are algorithms

16 Dec 12, 2022

StarGAN - Official PyTorch Implementation (CVPR 2018)

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

5.1k Jan 04, 2023

The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 Spotlight Paper) Zhisheng

45 Dec 26, 2022

Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Related tags

Overview

Demysitifing Local Vision Transformer, arxiv

Reference

1. Requirements

2. Trainning

3. Evaluation

4. Models

LICENSE

You might also like...

Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.

Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Comments

CVE-2007-4559 Patch

Patching CVE-2007-4559

Possible bug?

Releases(prerelease)

prerelease(Aug 16, 2021)

Owner

A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

Pre-trained models for a Cascaded-FCN in caffe and tensorflow that segments

Machine Learning Models were applied to predict the mass of the brain based on gender, age ranges, and head size.

A Multi-modal Model Chinese Spell Checker Released on ACL2021.

Tzer: TVM Implementation of "Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation (OOPSLA'22)“.

Deep learning library featuring a higher-level API for TensorFlow.

TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"

PyTorch code of paper "LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering"

Neuron Merging: Compensating for Pruned Neurons (NeurIPS 2020)

Code to reproduce experiments in the paper "Explainability Requires Interactivity".

[SIGIR22] Official PyTorch implementation for "CORE: Simple and Effective Session-based Recommendation within Consistent Representation Space".

Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Ceph.

Multi-Objective Loss Balancing for Physics-Informed Deep Learning

StarGAN - Official PyTorch Implementation (CVPR 2018)

The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Interactive Image Segmentation via Backpropagating Refinement Scheme

LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping

Code for paper "ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation"