MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Last update: Nov 28, 2022

Overview

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition (arxiv)

This is a PyTorch implementation of our paper. We present Vision Permutator, a conceptually simple and data-efficient MLP-like architecture for visual recognition. We show that Vision Permutators are formidable competitors to convolutional neural networks (CNNs) and vision transformers.

We hope this work encourages researchers to rethink how spatial information is encoded and facilitates the development of MLP-like models.

Basic structure of the proposed Permute-MLP layer. The layer contains three branches that encode features along the height, width, and channel dimensions, respectively. The outputs of the three branches are combined by element-wise addition, followed by a fully-connected layer for feature fusion.
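The three-branch structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's implementation. It assumes a channels-last input of shape (B, H, W, C), that the channels are split into `segments` groups, and that H = W = `segments`, so every branch is a `dim`-to-`dim` linear map. The class name `PermuteMLPSketch` and its arguments are hypothetical.

```python
import torch
import torch.nn as nn


class PermuteMLPSketch(nn.Module):
    """Illustrative Permute-MLP: height, width and channel branches, summed, then fused."""

    def __init__(self, dim: int, segments: int):
        super().__init__()
        # Assumption: dim is divisible by `segments` and H == W == segments,
        # so (channels per segment) * H == dim for the spatial branches.
        assert dim % segments == 0
        self.segments = segments
        self.mlp_h = nn.Linear(dim, dim)  # height branch
        self.mlp_w = nn.Linear(dim, dim)  # width branch
        self.mlp_c = nn.Linear(dim, dim)  # channel branch
        self.proj = nn.Linear(dim, dim)   # fully-connected fusion layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, H, W, C = x.shape
        S = self.segments
        g = C // S  # channels per segment
        assert H == S and W == S, "this sketch assumes H == W == segments"

        # Height branch: move H next to the per-segment channels and mix them jointly.
        h = x.reshape(B, H, W, S, g).permute(0, 3, 2, 1, 4).reshape(B, S, W, H * g)
        h = self.mlp_h(h).reshape(B, S, W, H, g).permute(0, 3, 2, 1, 4).reshape(B, H, W, C)

        # Width branch: same idea with W in place of H.
        w = x.reshape(B, H, W, S, g).permute(0, 1, 3, 2, 4).reshape(B, H, S, W * g)
        w = self.mlp_w(w).reshape(B, H, S, W, g).permute(0, 1, 3, 2, 4).reshape(B, H, W, C)

        # Channel branch: plain linear projection over C.
        c = self.mlp_c(x)

        # Element-wise addition of the three branches, then feature fusion.
        return self.proj(h + w + c)


x = torch.randn(2, 7, 7, 56)
layer = PermuteMLPSketch(dim=56, segments=7)
out = layer(x)
print(out.shape)  # torch.Size([2, 7, 7, 56])
```

The output keeps the input shape, so a layer like this can be stacked or wrapped in a residual connection.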

Our code is based on pytorch-image-models, Token Labeling, and T2T-ViT.

Comparison with Recent MLP-like Models

Model	Parameters	Throughput	Image resolution	Top-1 Acc.	Download
EAMLP-14	30M	711 img/s	224	78.9%	-
gMLP-S	20M	-	224	79.6%	-
ResMLP-S24	30M	715 img/s	224	79.4%	-
ViP-Small/7 (ours)	25M	719 img/s	224	81.5%	link
EAMLP-19	55M	464 img/s	224	79.4%	-
Mixer-B/16	59M	-	224	78.5%	-
ViP-Medium/7 (ours)	55M	418 img/s	224	82.7%	link
gMLP-B	73M	-	224	81.6%	-
ResMLP-B24	116M	231 img/s	224	81.0%	-
ViP-Large/7 (ours)	88M	298 img/s	224	83.2%	link

Throughput is measured on a single machine with a V100 GPU (32GB) and a batch size of 32.
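A throughput figure in img/s can be obtained by timing repeated forward passes over a fixed batch. The sketch below shows one way to do that. It is not the script used to produce the numbers above, and the function name, the `step` and `sync` callables, and the warm-up and iteration counts are all assumptions.

```python
import time


def measure_throughput(step, batch_size=32, warmup=5, iters=20, sync=None):
    """Return images per second for `step`, a callable that processes one batch.

    `sync` is an optional callable that blocks until queued device work is done.
    """
    # Warm-up iterations are excluded from the timing.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # wait for pending work before stopping the clock
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed
```

With PyTorch on a GPU, `step` would typically wrap `model(images)` inside `torch.no_grad()`, and `sync` would be `torch.cuda.synchronize`, since CUDA kernels run asynchronously.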

Training ViP-Small/7 for 300 epochs on ImageNet takes less than 30 hours on a node with 8 A100 GPUs.

Requirements

torch>=1.4.0
torchvision>=0.5.0
pyyaml
timm==0.4.5
apex (only if you use '--apex-amp')

Data preparation: ImageNet with the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
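Before launching a long run, it can help to confirm that the dataset matches the layout above. The following is a small sketch using only the standard library. The function name and the checks it performs are assumptions and are not part of this repository.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    root = Path(root)
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each class is expected to be a sub-folder such as n01440764.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for class_dir in class_dirs:
            has_jpeg = any(
                p.suffix.lower() in {".jpeg", ".jpg"} for p in class_dir.iterdir()
            )
            if not has_jpeg:
                problems.append(f"no JPEG images in {class_dir}")
    return problems


if __name__ == "__main__":
    # Placeholder path; replace with the real dataset root.
    for problem in check_imagenet_layout("/path/to/imagenet"):
        print(problem)
```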

Validation

Replace /path/to/imagenet/val with your ImageNet validation set path and /path/to/checkpoint with the checkpoint path:

CUDA_VISIBLE_DEVICES=0 bash eval.sh /path/to/imagenet/val /path/to/checkpoint

Training

Command line for training on 8 GPUs (V100)

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model vip_s7 -b 256 -j 8 --opt adamw --epochs 300 --sched cosine --apex-amp --img-size 224 --drop-path 0.1 --lr 2e-3 --weight-decay 0.05 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 20
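The schedule flags in the command above (--sched cosine, --lr 2e-3, --warmup-lr 1e-6, --warmup-epochs 20, --epochs 300) describe a warm-up phase followed by cosine decay. The sketch below computes a simplified per-epoch learning rate to show the shape of that schedule. It assumes linear warm-up and decay to a minimum of 0 over the post-warm-up epochs; timm's scheduler may differ in details such as the minimum learning rate and how warm-up epochs are counted. The function name is hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=2e-3, warmup_lr=1e-6, warmup_epochs=20,
                total_epochs=300, min_lr=0.0):
    """Simplified linear-warm-up plus cosine-decay learning rate for one epoch."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 10, 20, 160, 300):
    print(epoch, lr_at_epoch(epoch))
```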

Reference

You may want to cite:

@misc{hou2021vision,
    title={Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition},
    author={Qibin Hou and Zihang Jiang and Li Yuan and Ming-Ming Cheng and Shuicheng Yan and Jiashi Feng},
    year={2021},
    eprint={2106.12368},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

License

This repository is released under the MIT License as found in the LICENSE file. For commercial use, please contact the authors.

Owner

Qibin (Andrew) Hou

