MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Overview

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition (arxiv)

This is a Pytorch implementation of our paper. We present Vision Permutator, a conceptually simple and data efficient MLP-like architecture for visual recognition. We show that our Vision Permutators are formidable competitors to convolutional neural networks (CNNs) and vision transformers.

We hope this work could encourage researchers to rethink the way of encoding spatial information and facilitate the development of MLP-like models.

Compare

Basic structure of the proposed Permute-MLP layer. The proposed Permute-MLP layer contains three branches that are responsible for encoding features along the height, width, and channel dimensions, respectively. The outputs from the three branches are then combined using element-wise addition, followed by a fully-connected layer for feature fusion.

Our code is based on the pytorch-image-models, Token Labeling, T2T-ViT

Comparison with Recent MLP-like Models

Model Parameters Throughput Image resolution Top 1 Acc. Download Logs
EAMLP-14 30M 711 img/s 224 78.9%
gMLP-S 20M - 224 79.6%
ResMLP-S24 30M 715 img/s 224 79.4%
ViP-Small/7 (ours) 25M 719 img/s 224 81.5% link
EAMLP-19 55M 464 img/s 224 79.4%
Mixer-B/16 59M - 224 78.5%
ViP-Medium/7 (ours) 55M 418 img/s 224 82.7% link
gMLP-B 73M - 224 81.6%
ResMLP-B24 116M 231 img/s 224 81.0%
ViP-Large/7 88M 298 img/s 224 83.2% link

The throughput is measured on a single machine with V100 GPU (32GB) with batch size set to 32.

Training ViP-Small/7 takes less than 30h on ImageNet for 300 epochs on a node with 8 A100 GPUs.

Requirements

torch>=1.4.0
torchvision>=0.5.0
pyyaml
timm==0.4.5
apex if you use 'apex amp'

data prepare: ImageNet with the following folder structure, you can extract imagenet by this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Validation

Replace DATA_DIR with your imagenet validation set path and MODEL_DIR with the checkpoint path

CUDA_VISIBLE_DEVICES=0 bash eval.sh /path/to/imagenet/val /path/to/checkpoint

Training

Command line for training on 8 GPUs (V100)

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model vip_s7 -b 256 -j 8 --opt adamw --epochs 300 --sched cosine --apex-amp --img-size 224 --drop-path 0.1 --lr 2e-3 --weight-decay 0.05 --remode pixel --reprob 0.25 --aa rand-m9-mstd0.5-inc1 --smoothing 0.1 --mixup 0.8 --cutmix 1.0 --warmup-lr 1e-6 --warmup-epochs 20

Reference

You may want to cite:

@misc{hou2021vision,
    title={Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition},
    author={Qibin Hou and Zihang Jiang and Li Yuan and Ming-Ming Cheng and Shuicheng Yan and Jiashi Feng},
    year={2021},
    eprint={2106.12368},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

License

This repository is released under the MIT License as found in the LICENSE file. For commercial use, please contact with the authors.

Owner
Qibin (Andrew) Hou
Research fellow at NUS.
Qibin (Andrew) Hou
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

Christopher Ley 1 Jan 06, 2022
SegNet-Basic with Keras

SegNet-Basic: What is Segnet? Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-wise Image Segmentation Segnet = (Encoder + Decoder)

Yad Konrad 81 Jun 30, 2022
This repository is for Competition for ML_data class

This repository is for Competition for ML_data class. Based on mmsegmentatoin,mainly using swin transformer to completed the competition.

jianlong 2 Oct 23, 2022
GeDML is an easy-to-use generalized deep metric learning library

GeDML is an easy-to-use generalized deep metric learning library

Borui Zhang 32 Dec 05, 2022
A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

Library | Paper | Slack We released two versions of OAG-BERT in CogDL package. OAG-BERT is a heterogeneous entity-augmented academic language model wh

THUDM 58 Dec 17, 2022
Python and Julia in harmony.

PythonCall & JuliaCall Bringing Python® and Julia together in seamless harmony: Call Python code from Julia and Julia code from Python via a symmetric

Christopher Rowley 414 Jan 07, 2023
Neural Koopman Lyapunov Control

Neural-Koopman-Lyapunov-Control Code for our paper: Neural Koopman Lyapunov Control Requirements dReal4: v4.19.02.1 PyTorch: 1.2.0 The learning framew

Vrushabh Zinage 6 Dec 24, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which outperforms the paper's (Hessel et al. 2017) results on 40% of tested games while using 20x less dat

Dominik Schmidt 31 Dec 21, 2022
Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on

Karush Suri 8 Nov 07, 2022
Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

MetaSDF: Meta-learning Signed Distance Functions Project Page | Paper | Data Vincent Sitzmann*, Eric Ryan Chan*, Richard Tucker, Noah Snavely Gordon W

Vincent Sitzmann 100 Jan 01, 2023
Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark

Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark Yong

19 Dec 17, 2022
An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).

Global-Wheat-Detection An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wh

Chuxin Wang 11 Sep 25, 2022
A Transformer-Based Siamese Network for Change Detection

ChangeFormer: A Transformer-Based Siamese Network for Change Detection (Under review at IGARSS-2022) Wele Gedara Chaminda Bandara, Vishal M. Patel Her

Wele Gedara Chaminda Bandara 214 Dec 29, 2022
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

[ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training The Unreasonable Effectiveness of

VITA 44 Dec 23, 2022
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu,

Layer6 Labs 37 Dec 18, 2022
A Pytorch implement of paper "Anomaly detection in dynamic graphs via transformer" (TADDY).

TADDY: Anomaly detection in dynamic graphs via transformer This repo covers an reference implementation for the paper "Anomaly detection in dynamic gr

Yue Tan 21 Nov 24, 2022
A general framework for inferring CNNs efficiently. Reduce the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy.

GFNet-Pytorch (NeurIPS 2020) This repo contains the official code and pre-trained models for the glance and focus network (GFNet). Glance and Focus: a

Rainforest Wang 169 Oct 28, 2022
Credit fraud detection in Python using a Jupyter Notebook

Credit-Fraud-Detection - Credit fraud detection in Python using a Jupyter Notebook , using three classification models (Random Forest, Gaussian Naive Bayes, Logistic Regression) from the sklearn libr

Ali Akram 4 Dec 28, 2021
Code for IntraQ, PyTorch implementation of our paper under review

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization paper Requirements Python = 3.7.10 Pytorch == 1.7

1 Nov 19, 2021