[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Overview

Convolutional MLP

ConvMLP: Hierarchical Convolutional MLPs for Vision

Preprint link: ConvMLP: Hierarchical Convolutional MLPs for Vision

By Jiachen Li[1,2], Ali Hassani[1]*, Steven Walton[1]*, and Humphrey Shi[1,2,3]

In association with SHI Lab @ University of Oregon[1] and University of Illinois Urbana-Champaign[2], and Picsart AI Research (PAIR)[3]

Comparison

Abstract

MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach comparable results to convolutional and transformer-based methods. However, most adopt spatial MLPs which take fixed dimension inputs, therefore making it difficult to apply them to downstream tasks, such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks and fully connected layers bear heavy computation. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise, co-design of convolution layers, and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4 GMACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and semantic segmentation further show that visual representation learned by ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters.

Model

How to run

Getting Started

Our base model is in pure PyTorch and Torchvision. No extra packages are required. Please refer to PyTorch's Getting Started page for detailed instructions.

You can start off with src.convmlp, which contains the three variants: convmlp_s, convmlp_m, convmlp_l:

from src.convmlp import convmlp_l, convmlp_s

model = convmlp_l(pretrained=True, progress=True)
model_sm = convmlp_s(num_classes=10)

Image Classification

timm is recommended for image classification training and required for the training script provided in this repository:

./dist_classification.sh $NUM_GPUS -c $CONFIG_FILE /path/to/dataset

You can use our training configurations provided in configs/classification:

./dist_classification.sh 8 -c configs/classification/convmlp_s_imagenet.yml /path/to/ImageNet
./dist_classification.sh 8 -c configs/classification/convmlp_m_imagenet.yml /path/to/ImageNet
./dist_classification.sh 8 -c configs/classification/convmlp_l_imagenet.yml /path/to/ImageNet

Object Detection

mmdetection is recommended for object detection training and required for the training script provided in this repository:

./dist_detection.sh $CONFIG_FILE $NUM_GPUS /path/to/dataset

You can use our training configurations provided in configs/detection:

./dist_detection.sh configs/detection/retinanet_convmlp_s_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/retinanet_convmlp_m_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/retinanet_convmlp_l_fpn_1x_coco.py 8 /path/to/COCO

Object Detection & Instance Segmentation

mmdetection is recommended for training Mask R-CNN and required for the training script provided in this repository (same as above).

You can use our training configurations provided in configs/detection:

./dist_detection.sh configs/detection/maskrcnn_convmlp_s_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/maskrcnn_convmlp_m_fpn_1x_coco.py 8 /path/to/COCO
./dist_detection.sh configs/detection/maskrcnn_convmlp_l_fpn_1x_coco.py 8 /path/to/COCO

Semantic Segmentation

mmsegmentation is recommended for semantic segmentation training and required for the training script provided in this repository:

./dist_segmentation.sh $CONFIG_FILE $NUM_GPUS /path/to/dataset

You can use our training configurations provided in configs/segmentation:

./dist_segmentation.sh configs/segmentation/fpn_convmlp_s_512x512_40k_ade20k.py 8 /path/to/ADE20k
./dist_segmentation.sh configs/segmentation/fpn_convmlp_m_512x512_40k_ade20k.py 8 /path/to/ADE20k
./dist_segmentation.sh configs/segmentation/fpn_convmlp_l_512x512_40k_ade20k.py 8 /path/to/ADE20k

Results

Image Classification

Feature maps from ResNet50, MLP-Mixer-B/16, our Pure-MLP Baseline and ConvMLP-M are presented in the image below. It can be observed that representations learned by ConvMLP involve more low-level features like edges or textures compared to the rest. Feature map visualization

Dataset Model Top-1 Accuracy # Params MACs
ImageNet ConvMLP-S 76.8% 9.0M 2.4G
ConvMLP-M 79.0% 17.4M 3.9G
ConvMLP-L 80.2% 42.7M 9.9G

If importing the classification models, you can pass pretrained=True to download and set these checkpoints. The same holds for the training script (classification.py and dist_classification.sh): pass --pretrained. The segmentation/detection training scripts also download the pretrained backbone if you pass the correct config files.

Downstream tasks

You can observe the summarized results from applying our model to object detection, instance and semantic segmentation, compared to ResNet, in the image below.

Object Detection

Dataset Model Backbone # Params APb APb50 APb75 Checkpoint
MS COCO Mask R-CNN ConvMLP-S 28.7M 38.4 59.8 41.8 Download
ConvMLP-M 37.1M 40.6 61.7 44.5 Download
ConvMLP-L 62.2M 41.7 62.8 45.5 Download
RetinaNet ConvMLP-S 18.7M 37.2 56.4 39.8 Download
ConvMLP-M 27.1M 39.4 58.7 42.0 Download
ConvMLP-L 52.9M 40.2 59.3 43.3 Download

Instance Segmentation

Dataset Model Backbone # Params APm APm50 APm75 Checkpoint
MS COCO Mask R-CNN ConvMLP-S 28.7M 35.7 56.7 38.2 Download
ConvMLP-M 37.1M 37.2 58.8 39.8 Download
ConvMLP-L 62.2M 38.2 59.9 41.1 Download

Semantic Segmentation

Dataset Model Backbone # Params mIoU Checkpoint
ADE20k Semantic FPN ConvMLP-S 12.8M 35.8 Download
ConvMLP-M 21.1M 38.6 Download
ConvMLP-L 46.3M 40.0 Download

Transfer

Dataset Model Top-1 Accuracy # Params
CIFAR-10 ConvMLP-S 98.0% 8.51M
ConvMLP-M 98.6% 16.90M
ConvMLP-L 98.6% 41.97M
CIFAR-100 ConvMLP-S 87.4% 8.56M
ConvMLP-M 89.1% 16.95M
ConvMLP-L 88.6% 42.04M
Flowers-102 ConvMLP-S 99.5% 8.56M
ConvMLP-M 99.5% 16.95M
ConvMLP-L 99.5% 42.04M

Citation

@article{li2021convmlp,
      title={ConvMLP: Hierarchical Convolutional MLPs for Vision}, 
      author={Jiachen Li and Ali Hassani and Steven Walton and Humphrey Shi},
      year={2021},
      eprint={2109.04454},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
SHI Lab
Research in Synergetic & Holistic Intelligence, with current focus on Computer Vision, Machine Learning, and AI Systems & Applications
SHI Lab
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 157 Dec 11, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
SOLOv2 on onnx & tensorRT

SOLOv2.tensorRT: NOTE: code based on WXinlong/SOLO add support to TensorRT inference onnxruntime tensorRT full_dims and dynamic shape postprocess with

47 Nov 26, 2022
Cognition-aware Cognate Detection

Cognition-aware Cognate Detection The repository which contains our code for our EACL 2021 paper titled, "Cognition-aware Cognate Detection". This wor

Prashant K. Sharma 1 Feb 01, 2022
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion Yinghao Aaron Li, Ali Zare, Nima Mesgarani We pres

Aaron (Yinghao) Li 282 Jan 01, 2023
This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

Robotics and Autonomous Systems Group 96 Dec 15, 2022
Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER (WIP) Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEER is an e

Alipay 6 Dec 17, 2022
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss This repository contains the TensorFlow implementation of the paper UnF

Simon Meister 270 Nov 06, 2022
A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

A Higher Performance Pytorch Implementation of DeepLab V3 Plus Introduction This repo is an (re-)implementation of Encoder-Decoder with Atrous Separab

linhua 326 Nov 22, 2022
Simulator for FRC 2022 challenge: Rapid React

rrsim Simulator for FRC 2022 challenge: Rapid React out-1.mp4 Usage In order to run the simulator use the following: python3 rrsim.py [config_path] wh

1 Jan 18, 2022
[Link]deep_portfolo - Use Reforcemet earg ad Supervsed learg to Optmze portfolo allocato []

rl_portfolio This Repository uses Reinforcement Learning and Supervised learning to Optimize portfolio allocation. The goal is to make profitable agen

Deepender Singla 165 Dec 02, 2022
Code repo for "FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation" (ICCV 2021)

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation (ICCV 2021) This repository contains the implementation of th

Yuhang Zang 21 Dec 17, 2022
DLFlow is a deep learning framework.

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

DiDi 152 Oct 27, 2022
FID calculation with proper image resizing and quantization steps

clean-fid: Fixing Inconsistencies in FID Project | Paper The FID calculation involves many steps that can produce inconsistencies in the final metric.

Gaurav Parmar 606 Jan 06, 2023
PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation.

DosGAN-PyTorch PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation

40 Nov 30, 2022
Evaluation framework for testing segmentation networks in PyTorch

Evaluation framework for testing segmentation networks in PyTorch. What segmentation network to choose for next Kaggle competition? This benchmark knows the answer!

Eugene Khvedchenya 37 Apr 27, 2022
ICRA 2021 - Robust Place Recognition using an Imaging Lidar

Robust Place Recognition using an Imaging Lidar A place recognition package using high-resolution imaging lidar. For best performance, a lidar equippe

Tixiao Shan 293 Dec 27, 2022
GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

GAN-STEM-Conv2MultiSlice GAN method to help covert lower resolution STEM images generated by convolution methods to higher resolution STEM images gene

UW-Madison Computational Materials Group 2 Feb 10, 2021
CVPR 2021

Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-image Translation [Paper] | [Poster] | [Codes] Yahui Liu1,3, Enver Sangineto1,

Yahui Liu 37 Sep 12, 2022
Author Disambiguation using Knowledge Graph Embeddings with Literals

Author Name Disambiguation with Knowledge Graph Embeddings using Literals This is the repository for the master thesis project on Knowledge Graph Embe

12 Oct 19, 2022