DeepMind Perceiver (in PyTorch)

Disclaimer: This is not official and I'm not affiliated with DeepMind.

This is my implementation of the model described in the paper "Perceiver: General Perception with Iterative Attention". You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in models/mnist.pkl or load with perceiver.load_mnist_model(). It gets 96.02% accuracy on the test data.

Getting started

To run this you need PyTorch installed:

pip3 install torch

From perceiver you can import Perceiver or PerceiverLogits.

You can then use it as follows (or see examples.ipynb):

from perceiver import Perceiver

# Example values for MNIST: 1 grayscale channel, 28x28 pixels.
input_channels = 1
input_shape = (28, 28)

model = Perceiver(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in each dimension? E.g. (28, 28) for MNIST.
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks for each cross-attention with the input?
    dropout=0.1, # <- Dropout
    layers=8, # <- This will become two unique layer-blocks: layer 1 and layers 2-8 (using weight sharing).
)

The above model outputs the latents after the final layer. If you want logits instead, use the following model:

from perceiver import PerceiverLogits

# Example values for MNIST: 1 grayscale channel, 28x28 pixels, 10 digit classes.
input_channels = 1
input_shape = (28, 28)
output_features = 10

model = PerceiverLogits(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in each dimension? E.g. (28, 28) for MNIST.
    output_features, # <- How many different classes? E.g. 10 for MNIST.
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks for each cross-attention with the input?
    dropout=0.1, # <- Dropout
    layers=8, # <- This will become two unique layer-blocks: layer 1 and layers 2-8 (using weight sharing).
)
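
As a rough illustration of why the small latent array keeps attention cheap, the sketch below counts query-key pairs for the example configuration (28x28 input, 64 latents, 6 latent blocks, 8 layers). It assumes each layer is one cross-attention followed by latent_blocks self-attention blocks, as the comments above describe. The function name attention_pairs is made up for this example and is not part of the perceiver package, and heads, d_model and batch size are ignored.

```python
from math import prod


def attention_pairs(input_shape, latents, latent_blocks, layers):
    """Count query-key pairs per forward pass (hypothetical helper, not in perceiver)."""
    n_inputs = prod(input_shape)            # 28 * 28 = 784 pixels for MNIST
    cross = latents * n_inputs              # every latent attends to every input element
    latent_self = latents * latents         # latents attend to each other
    per_layer = cross + latent_blocks * latent_self
    return {
        "cross": cross,
        "latent_self": latent_self,
        "per_layer": per_layer,
        "total": layers * per_layer,
        # For comparison: one self-attention block applied directly to the inputs.
        "full_self_attention": n_inputs * n_inputs,
    }


counts = attention_pairs(input_shape=(28, 28), latents=64, latent_blocks=6, layers=8)
print(counts)
```

With these numbers, all 8 layers together (598,016 pairs) need fewer query-key pairs than a single self-attention block over the 784 pixels (614,656 pairs).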

To use my pre-trained MNIST model (not very good):

from perceiver import load_mnist_model

model = load_mnist_model()

TODO:

Positional embedding generalized to n dimensions (with fourier features)
Train other models (like CIFAR-100 or something not in the image domain)
Type hints
Unit tests for components of model
Package
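
For the first item, here is a minimal sketch of Fourier-feature positional encoding over an n-dimensional grid. It follows the scheme as I understand it from the Perceiver paper: coordinates scaled to [-1, 1], sin and cos at frequencies spaced linearly between 1 and each axis's Nyquist frequency, plus the raw coordinate. It is not the code used in this repo. It is written in plain Python for clarity (a real version would be vectorized with torch tensors), and the names fourier_position_features and linspace are made up for the example.

```python
import math
from itertools import product


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop (inclusive)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def fourier_position_features(shape, bands):
    """One feature vector per grid position, in row-major order.

    Each axis contributes 1 + 2 * bands values: the raw coordinate,
    then `bands` sines and `bands` cosines.
    """
    # Coordinates along each axis, scaled to [-1, 1].
    axes = [linspace(-1.0, 1.0, size) for size in shape]
    # Frequencies per axis: 1 up to the Nyquist frequency (size / 2).
    freqs = [linspace(1.0, size / 2, bands) for size in shape]

    features = []
    for coords in product(*axes):           # works for any number of dimensions
        vec = []
        for x, axis_freqs in zip(coords, freqs):
            vec.append(x)
            vec.extend(math.sin(math.pi * f * x) for f in axis_freqs)
            vec.extend(math.cos(math.pi * f * x) for f in axis_freqs)
        features.append(vec)
    return features


features = fourier_position_features((28, 28), bands=4)
print(len(features), len(features[0]))      # 784 positions, 18 features each
```

Because the loop runs over itertools.product of the axes, the same function handles 1-D sequences, 2-D images or 3-D volumes; only the length of shape changes.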

Owner

Louis Arge
