PyTorch reimplementation of the Mixer (MLP-Mixer: An all-MLP Architecture for Vision)

Last update: Dec 08, 2022

Overview

MLP-Mixer

A PyTorch reimplementation of Google's MLP-Mixer repository (not yet updated on the master branch), which was released with the paper MLP-Mixer: An all-MLP Architecture for Vision by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy.

In the paper, the authors report performance close to the state of the art (SotA) on image classification benchmarks using only multi-layer perceptrons (MLPs), without using CNNs or Transformers.

MLP-Mixer (Mixer for short) consists of per-patch linear embeddings, Mixer layers, and a classifier head. Each Mixer layer contains one token-mixing MLP and one channel-mixing MLP, each consisting of two fully-connected layers and a GELU nonlinearity. Other components include skip connections, dropout, and a linear classifier head.
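The structure described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code shipped in this repository: the class names, the toy layer sizes, and the defaults are assumptions made for the example. The layer normalization before each MLP and the global average pooling before the head follow the paper's description and are not mentioned in the text above. The per-patch linear embedding is written as a convolution whose kernel size and stride both equal the patch size, which is equivalent to applying one linear layer to every non-overlapping patch.

```python
import torch
from torch import nn


class MlpBlock(nn.Module):
    """Two fully-connected layers with a GELU nonlinearity in between."""

    def __init__(self, dim, hidden_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class MixerLayer(nn.Module):
    """One token-mixing MLP and one channel-mixing MLP, each with a skip connection."""

    def __init__(self, num_tokens, channels, tokens_hidden, channels_hidden, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)  # assumption: taken from the paper
        self.token_mix = MlpBlock(num_tokens, tokens_hidden, dropout)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mix = MlpBlock(channels, channels_hidden, dropout)

    def forward(self, x):
        # x has shape (batch, tokens, channels).
        y = self.norm1(x).transpose(1, 2)          # (batch, channels, tokens)
        x = x + self.token_mix(y).transpose(1, 2)  # mix information across patches
        x = x + self.channel_mix(self.norm2(x))    # mix information across channels
        return x


class MlpMixer(nn.Module):
    """Per-patch linear embedding, a stack of Mixer layers, and a linear head."""

    def __init__(self, image_size=224, patch_size=16, channels=64, num_layers=2,
                 tokens_hidden=32, channels_hidden=128, num_classes=10):
        super().__init__()
        num_tokens = (image_size // patch_size) ** 2
        self.embed = nn.Conv2d(3, channels, kernel_size=patch_size, stride=patch_size)
        self.layers = nn.Sequential(*[
            MixerLayer(num_tokens, channels, tokens_hidden, channels_hidden)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, images):
        x = self.embed(images)            # (batch, channels, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (batch, tokens, channels)
        x = self.layers(x)
        x = self.norm(x).mean(dim=1)      # average over tokens
        return self.head(x)
```

For example, `MlpMixer(image_size=32)(torch.randn(2, 3, 32, 32))` returns a tensor of logits with shape `(2, 10)`. The sizes here are deliberately small and do not correspond to the Mixer-B_16 or Mixer-L_16 configurations.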

Usage

1. Download Pre-trained model (Google's Official Checkpoint)

Available models: Mixer-B_16, Mixer-L_16
- ImageNet pre-trained models
  - Mixer-B_16, Mixer-L_16
- ImageNet-21k pre-trained models
  - Mixer-B_16, Mixer-L_16

# ImageNet pre-trained (replace {MODEL_NAME} with Mixer-B_16 or Mixer-L_16)
wget https://storage.googleapis.com/mixer_models/imagenet1k/{MODEL_NAME}.npz

# ImageNet-21k pre-trained
wget https://storage.googleapis.com/mixer_models/imagenet21k/{MODEL_NAME}.npz
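If you would rather fetch a checkpoint from Python than with wget, the same URLs can be assembled programmatically. The sketch below is only an illustration: the function names `checkpoint_url` and `download_checkpoint` are hypothetical helpers and not part of this repository. The URL pattern and the model names are the ones listed above.

```python
import os
import urllib.request

BASE_URL = "https://storage.googleapis.com/mixer_models"
MODEL_NAMES = ("Mixer-B_16", "Mixer-L_16")
UPSTREAMS = ("imagenet1k", "imagenet21k")


def checkpoint_url(model_name, upstream="imagenet1k"):
    """Build the download URL for one of the listed checkpoints."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    if upstream not in UPSTREAMS:
        raise ValueError(f"unknown upstream dataset: {upstream!r}")
    return f"{BASE_URL}/{upstream}/{model_name}.npz"


def download_checkpoint(model_name, upstream="imagenet1k", directory="checkpoint"):
    """Download a checkpoint into `directory` and return the local file path."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, f"{model_name}.npz")
    urllib.request.urlretrieve(checkpoint_url(model_name, upstream), target)
    return target
```

The default `directory="checkpoint"` is chosen so that the file ends up at the path the fine-tuning command expects (`checkpoint/Mixer-B_16.npz`). Plain wget saves into the current directory instead.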

2. Fine-tuning

python3 train.py --name cifar10-100_500 --model_type Mixer-B_16 --pretrained_dir checkpoint/Mixer-B_16.npz
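Before starting a long fine-tuning run, it can help to confirm that the downloaded file is a readable `.npz` archive. The following is a small sketch using NumPy; `summarize_checkpoint` is a hypothetical helper, not something provided by this repository, and it makes no assumption about the names of the arrays stored in the official checkpoints.

```python
import numpy as np


def summarize_checkpoint(path):
    """Return a {array_name: shape} mapping for every array in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```

Calling `summarize_checkpoint("checkpoint/Mixer-B_16.npz")` should list the stored arrays and their shapes. An exception at this point usually indicates an incomplete download or a wrong path.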

Reproducing Mixer results

upstream	model	dataset	acc(official)
ImageNet	Mixer-B/16	cifar10	96.72
ImageNet	Mixer-L/16	cifar10	96.59
ImageNet-21k	Mixer-B/16	cifar10	96.82
ImageNet-21k	Mixer-L/16	cifar10	96.34

Reference

Google's Vision Transformer and MLP-Mixer

Citations

@article{tolstikhin2021,
  title={MLP-Mixer: An all-MLP Architecture for Vision},
  author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
  journal={arXiv preprint arXiv:2105.01601},
  year={2021}
}

Owner

Eunkwang Jeon
