MLP-Mixer-Pytorch

PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision, with support for loading the official ImageNet pre-trained parameters.

Usage

import torch
import numpy as np
from mlp_mixer import MlpMixer

pretrain_model='./pretrain_models/imagenet21k_Mixer-B_16.npz'

model = MlpMixer(num_classes=10, 
                 num_blocks=12, 
                 patch_size=16, 
                 hidden_dim=768, 
                 tokens_mlp_dim=384, 
                 channels_mlp_dim=3072, 
                 image_size=224
                 )

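# Load the official ImageNet pre-trained weights (.npz checkpoint):
model.load_from(np.load(pretrain_model))
print('Finished loading the pre-trained model!')

num_param = sum(p.numel() for p in model.parameters()) / 1e6
print('Total params.: %f M' % num_param)

# Dummy input batch (assumed shape: N x 3 x 224 x 224); replace with real images.
img = torch.randn(1, 3, 224, 224)
pred = model(img)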
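
Before passing a checkpoint to the model, it can help to list the arrays it contains and their shapes. The following is a minimal sketch, not part of this repo: summarize_npz is a hypothetical helper, and the array names in the stand-in archive are invented for illustration. The official checkpoints use their own key names.

```python
import io

import numpy as np


def summarize_npz(npz):
    """Return {array_name: shape} for every array in an opened .npz archive."""
    return {name: npz[name].shape for name in npz.files}


# Stand-in checkpoint built in memory; the names below are hypothetical.
buffer = io.BytesIO()
np.savez(buffer, **{"demo/kernel": np.zeros((768, 10)), "demo/bias": np.zeros(10)})
buffer.seek(0)

with np.load(buffer) as ckpt:
    shapes = summarize_npz(ckpt)

for name, shape in sorted(shapes.items()):
    print(name, shape)
```

With a downloaded file, the same helper could be applied to np.load(pretrain_model) to check that the shapes agree with the chosen hidden_dim, tokens_mlp_dim, and channels_mlp_dim.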

Fine-tuning

Download the official pre-trained models from https://console.cloud.google.com/storage/mixer_models/.

Hyper-parameter settings for better fine-tuning:

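# param_list: the parameters to optimize (e.g., model.parameters()).
optim = torch.optim.SGD(param_list,
                        lr=5e-4,
                        weight_decay=1e-7,
                        momentum=0.9,
                        nesterov=True
                        )
# n_iters_all: total number of training iterations.
# WarmupCosineLrScheduler is not defined in this snippet and must be provided separately.
lr_schdlr = WarmupCosineLrScheduler(optim,
                                    n_iters_all,
                                    warmup_iter=0
                                    )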
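
WarmupCosineLrScheduler is not defined in the snippet above. As a rough illustration of what a warmup-plus-cosine schedule computes, here is a minimal sketch in plain Python. The function warmup_cosine_factor and its exact warmup rule are assumptions for illustration, not the scheduler used by this repo.

```python
import math


def warmup_cosine_factor(it, n_iters_all, warmup_iter=0):
    """Multiplier applied to the base learning rate at iteration `it`."""
    if warmup_iter > 0 and it < warmup_iter:
        # Linear warmup: ramps from 1/warmup_iter up to 1.
        return (it + 1) / warmup_iter
    # Cosine decay from 1 down to 0 over the remaining iterations.
    remaining = max(1, n_iters_all - warmup_iter)
    progress = min(1.0, (it - warmup_iter) / remaining)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


base_lr = 5e-4       # same base learning rate as the SGD settings above
n_iters_all = 1000   # hypothetical total number of iterations
lrs = [base_lr * warmup_cosine_factor(i, n_iters_all) for i in range(n_iters_all + 1)]
```

With warmup_iter=0, as in the settings above, the learning rate starts at its base value and decays along a half cosine to zero. A factor function like this could be wrapped with torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=...) and stepped once per iteration.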

Fine-tuning MLP-Mixer from the pre-trained weights can bring substantial improvements (e.g., +10% accuracy on a small dataset).

Note that patch_size can also be changed (e.g., patch_size=8) for inputs with different resolutions, but a smaller patch_size does not always improve performance.
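
To see what changing patch_size does to the model's input, the short sketch below counts the patches (tokens) produced for a square image. The helper num_patches is hypothetical, and the divisibility check is an assumption about how patches are extracted rather than a documented requirement of this repo.

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


for patch_size in (32, 16, 8):
    print(patch_size, num_patches(224, patch_size))
```

Halving patch_size quadruples the number of tokens (196 at patch_size=16 versus 784 at patch_size=8 for a 224x224 input). Because the token-mixing MLP operates across the token dimension, its weight shapes depend on this count, so pre-trained token-mixing weights may not match a model built with a different patch_size.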

Citation

@misc{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision}, 
      author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
      year={2021},
      eprint={2105.01601},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

The implementation is based on the original paper and the official JAX/Flax repo: https://github.com/google-research/vision_transformer.
It also draws on the PyTorch re-implementation repo: https://github.com/d-li14/mlp-mixer.pytorch.

Owner

Qiushi Yang
