PyTorch implementation of Pay Attention to MLPs

Last update: Dec 13, 2022

Overview

gMLP

PyTorch implementation of Pay Attention to MLPs.

Quickstart

Clone this repository.

git clone https://github.com/jaketae/g-mlp.git

Navigate to the cloned directory. You can then use the barebones gMLP model via

>>> from g_mlp import gMLP
>>> model = gMLP()

By default, the model comes with the following parameters:

gMLP(
    d_model=256,
    d_ffn=512,
    seq_len=256,
    num_layers=6,
)
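The sketch below shows what a gMLP backbone built from these four parameters could look like. It follows the block described in the paper: a layer norm, a channel projection to d_ffn, a spatial gating unit, and a projection back to d_model, with a residual connection around the block. It is written from scratch and is not the repository's code. The names `SpatialGatingUnit`, `GMLPBlock`, and `build_gmlp` are hypothetical, and details such as the GELU activation and the initialization scale are assumptions.

```python
import torch
import torch.nn as nn


class SpatialGatingUnit(nn.Module):
    """Gates one half of the channels with a token-mixing projection of the other half."""

    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Spatial projection: a seq_len x seq_len map that mixes information across tokens.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Assumed paper-style init: near-zero weights and unit bias, so the gate starts near 1.
        nn.init.normal_(self.spatial_proj.weight, std=1e-3)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, seq_len, d_ffn) -> two halves of shape (batch, seq_len, d_ffn // 2)
        u, v = z.chunk(2, dim=-1)
        v = self.norm(v)
        # Transpose so the linear layer acts on the token axis, then transpose back.
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v


class GMLPBlock(nn.Module):
    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.act = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the residual connection keeps this shape.
        z = self.act(self.proj_in(self.norm(x)))
        return x + self.proj_out(self.sgu(z))


def build_gmlp(d_model=256, d_ffn=512, seq_len=256, num_layers=6) -> nn.Module:
    """Stack num_layers blocks; the defaults mirror the ones listed above."""
    return nn.Sequential(*[GMLPBlock(d_model, d_ffn, seq_len) for _ in range(num_layers)])
```

For example, `build_gmlp()(torch.randn(8, 256, 256)).shape` is `torch.Size([8, 256, 256])`. Because the spatial projection in this sketch is a fixed seq_len × seq_len layer, inputs must contain exactly seq_len tokens, which is why seq_len is a constructor argument rather than something inferred at run time.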

Usage

The repository also contains gMLP models specifically for language modeling and image classification.

NLP

gMLPForLanguageModeling shares the default parameters of gMLP and adds num_tokens=10000, the size of the token embedding table.

>>> import torch
>>> from g_mlp import gMLPForLanguageModeling
>>> model = gMLPForLanguageModeling()
>>> tokens = torch.randint(0, 10000, (8, 256))
>>> model(tokens).shape
torch.Size([8, 256, 256])
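The output shape above, `[8, 256, 256]`, reads as (batch, seq_len, d_model). Since num_tokens is 10000, it appears the model returns hidden states rather than per-token logits over the vocabulary. The sketch below traces that shape flow with a plain `nn.Embedding`. It uses `nn.Identity` as a stand-in for the backbone, which is reasonable because a gMLP stack preserves the (batch, seq_len, d_model) shape. It also adds a hypothetical `lm_head` (not part of the repository) for readers who need logits.

```python
import torch
import torch.nn as nn

num_tokens, d_model, seq_len = 10000, 256, 256

embed = nn.Embedding(num_tokens, d_model)   # token id -> d_model-dimensional vector
backbone = nn.Identity()                    # stand-in for a shape-preserving gMLP stack
lm_head = nn.Linear(d_model, num_tokens)    # hypothetical head, not in the repository

tokens = torch.randint(0, num_tokens, (8, seq_len))   # (batch, seq_len) integer ids
hidden = backbone(embed(tokens))                      # (batch, seq_len, d_model)
logits = lm_head(hidden)                              # (batch, seq_len, num_tokens)
```

Here `hidden.shape` is `torch.Size([8, 256, 256])`, matching the example above, and `logits.shape` is `torch.Size([8, 256, 10000])`.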

Computer Vision

gMLPForImageClassification is a ViT-style version of gMLP that adds a patch-creation (patch embedding) layer and a final classification head.

>>> import torch
>>> from g_mlp import gMLPForImageClassification
>>> model = gMLPForImageClassification()
>>> images = torch.randn(8, 3, 256, 256)
>>> model(images).shape
torch.Size([8, 1000])
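A common way to build a ViT-style patch layer is a strided convolution whose kernel size equals its stride. The sketch below assumes a hypothetical `patch_size=16`. With 256×256 inputs that gives (256 / 16)² = 256 patches, which happens to match the default `seq_len=256`. The head mean-pools over patches and projects to 1000 classes. This is an illustration, not the repository's implementation. The class name `ImageClassifierSketch` is hypothetical, and the backbone is an `nn.Identity` stand-in for a gMLP stack.

```python
import torch
import torch.nn as nn


class ImageClassifierSketch(nn.Module):
    def __init__(self, image_size=256, patch_size=16, in_channels=3,
                 d_model=256, num_classes=1000):
        super().__init__()
        assert image_size % patch_size == 0, "image must tile evenly into patches"
        self.seq_len = (image_size // patch_size) ** 2
        # Non-overlapping patches: kernel_size == stride == patch_size.
        self.patchify = nn.Conv2d(in_channels, d_model,
                                  kernel_size=patch_size, stride=patch_size)
        self.backbone = nn.Identity()  # stand-in for a gMLP stack over seq_len tokens
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, C, H, W) -> (batch, d_model, H / p, W / p)
        x = self.patchify(images)
        # Flatten the patch grid into a token axis: (batch, seq_len, d_model)
        x = x.flatten(2).transpose(1, 2)
        x = self.backbone(x)
        # Mean-pool over patches, then classify: (batch, num_classes)
        return self.head(self.norm(x).mean(dim=1))
```

Mean pooling is one simple choice of readout. A learned class token, as in ViT, would be an alternative.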

Summary

The authors of the paper present gMLP, an attention-free, all-MLP architecture built on spatial gating units. gMLP performs comparably to Transformer models such as BERT and ViT on downstream language and vision tasks. The authors also show that gMLP scales with more data and more parameters, suggesting that self-attention is not a necessary component of performant models.
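For reference, the spatial gating unit can be written as below, following the paper's formulation. Here X ∈ ℝ^{n×d} holds n token embeddings (n = seq_len, d = d_model), U and V are channel projections, σ is an activation such as GELU, and Z is split along the channel dimension into halves Z₁ and Z₂. The repository's code may differ in details such as the width of the split.

```latex
\begin{aligned}
Z &= \sigma(XU), && U \in \mathbb{R}^{d \times d_{\mathrm{ffn}}} \\
f_{W,b}(Z_2) &= W \,\mathrm{norm}(Z_2) + b, && W \in \mathbb{R}^{n \times n},\; b \in \mathbb{R}^{n} \\
s(Z) &= Z_1 \odot f_{W,b}(Z_2) \\
Y &= s(Z)\, V, && V \in \mathbb{R}^{\frac{d_{\mathrm{ffn}}}{2} \times d}
\end{aligned}
```

Because W multiplies from the left, it mixes information across the n tokens; this is the only place tokens interact. The paper initializes W near zero and b to ones, so f_{W,b}(Z₂) ≈ 1 and s(Z) ≈ Z₁ early in training, which makes each block start out behaving like a regular feed-forward layer. Each block also adds a residual connection around this computation.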

Owner

Jake Tae
