Implementation of Perceiver, General Perception with Iterative Attention in TensorFlow

Last update: Oct 15, 2022

Overview

Perceiver

This Python package implements Perceiver: General Perception with Iterative Attention by Andrew Jaegle et al. in TensorFlow. The model builds on Transformers, but the data enters only through the cross-attention mechanism (see figure), which allows it to scale to hundreds of thousands of inputs, like ConvNets. This also partly addresses the quadratic compute and memory bottleneck of Transformers.

Yannic Kilcher's video was very helpful.

Installation

Run the following to install:

pip install perceiver

Developing `perceiver`

To install perceiver along with the tools you need to develop and test it, run the following in your virtualenv:

git clone https://github.com/Rishit-dagli/Perceiver.git
# or clone your own fork

cd Perceiver
pip install -e ".[dev]"

A bit about Perceiver

The Perceiver model aims to deal with arbitrary configurations of different modalities using a single transformer-based architecture. Transformers are flexible and make few assumptions about their inputs, but they also scale quadratically with the number of inputs in both memory and computation. This model proposes a mechanism that makes it possible to deal with high-dimensional inputs while retaining the expressivity and flexibility to deal with arbitrary input configurations.
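To make the quadratic scaling concrete, here is a back-of-the-envelope sketch in plain Python. It counts the entries of a single all-to-all attention score matrix when every pixel is treated as one token. The helper name `attention_scores` and the float32 storage assumption are illustrative and do not come from the package.

```python
def attention_scores(num_tokens: int) -> int:
    """Entries in one all-to-all attention score matrix (one head, one layer)."""
    return num_tokens * num_tokens


BYTES_PER_FLOAT32 = 4  # assumption: scores are stored as float32

for name, side in [("MNIST 28x28", 28), ("ImageNet 224x224", 224)]:
    tokens = side * side  # one token per pixel
    scores = attention_scores(tokens)
    gib = scores * BYTES_PER_FLOAT32 / 2**30
    print(f"{name}: {tokens} tokens, {scores:,} scores, {gib:.2f} GiB")
```

For a 224x224 image this comes to roughly 2.5 billion scores, or about 9.4 GiB in float32, for a single head in a single layer.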

The idea here is to introduce a small set of latent units that forms an attention bottleneck through which the inputs must pass. This avoids the quadratic scaling problem of the all-to-all attention of a classical transformer. The model can be seen as performing a fully end-to-end clustering of the inputs, with the latent units as the cluster centres, leveraging a highly asymmetric cross-attention layer. For spatial information, the authors compensate for the lack of explicit grid structure in the model by associating Fourier feature position encodings with each input element.
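The bottleneck can be sketched in a few lines of NumPy. This is a minimal single-head illustration of asymmetric cross-attention, not the package's implementation: it leaves out multiple heads, layer normalization, residual connections and the latent self-attention stack, and every name in it (`cross_attention`, `w_q`, `w_k`, `w_v`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention: latents are the queries, inputs give keys and values."""
    q = latents @ w_q                        # (M, d)
    k = inputs @ w_k                         # (N, d)
    v = inputs @ w_v                         # (N, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (M, N) instead of (N, N)
    return softmax(scores, axis=-1) @ v, scores.shape


rng = np.random.default_rng(0)
num_inputs, input_dim = 32 * 32, 3  # a small 32x32 RGB image, flattened to tokens
num_latents, latent_dim, head_dim = 256, 512, 64

inputs = rng.normal(size=(num_inputs, input_dim))
latents = rng.normal(size=(num_latents, latent_dim))
w_q = rng.normal(size=(latent_dim, head_dim)) * 0.02
w_k = rng.normal(size=(input_dim, head_dim)) * 0.02
w_v = rng.normal(size=(input_dim, head_dim)) * 0.02

out, score_shape = cross_attention(latents, inputs, w_q, w_k, w_v)
print(out.shape, score_shape)  # (256, 64) (256, 1024)
```

The score matrix has one row per latent and one column per input, so its size grows linearly with the number of inputs. For a 224x224 image (50,176 tokens) and 256 latents that is 12,845,056 entries, 196 times fewer than all-to-all attention over the same tokens.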

Usage

from perceiver import Perceiver
import tensorflow as tf

model = Perceiver(
    input_channels = 3,          # number of channels for each token of the input
    input_axis = 2,              # number of axes of the input data (2 for images, 3 for video)
    num_freq_bands = 6,          # number of freq bands, with original value (2 * K + 1)
    max_freq = 10.,              # maximum frequency, hyperparameter depending on how fine the data is
    depth = 6,                   # depth of net
    num_latents = 256,           # number of latents
    latent_dim = 512,            # latent dimension
    cross_heads = 1,             # number of heads for cross attention. paper said 1
    latent_heads = 8,            # number of heads for latent self attention, 8
    cross_dim_head = 64,
    latent_dim_head = 64,
    num_classes = 1000,          # output number of classes
    attn_dropout = 0.,
    ff_dropout = 0.,
)

img = tf.random.normal([1, 224, 224, 3]) # random stand-in for one ImageNet-sized image
model(img) # (1, 1000)
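The `num_freq_bands` and `max_freq` arguments control the Fourier feature position encoding. The sketch below follows the description in the paper: coordinates in [-1, 1], K frequencies spaced linearly between 1 and `max_freq / 2`, sine and cosine of each, plus the raw coordinate. It is written in NumPy under those assumptions and is not necessarily the exact code used inside this package; the function name `fourier_features` is hypothetical.

```python
import numpy as np


def fourier_features(shape, max_freq=10.0, num_freq_bands=6):
    """Position encoding for a grid of the given shape, e.g. (H, W) for an image."""
    # Coordinates in [-1, 1] along each input axis.
    axes = [np.linspace(-1.0, 1.0, n) for n in shape]
    pos = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (*shape, n_axes)

    # K frequencies spaced linearly from 1 up to max_freq / 2.
    freqs = np.linspace(1.0, max_freq / 2.0, num_freq_bands)    # (K,)
    angles = pos[..., None] * freqs * np.pi                     # (*shape, n_axes, K)

    # Per axis: K sines, K cosines and the raw coordinate, i.e. 2K + 1 values.
    enc = np.concatenate([np.sin(angles), np.cos(angles), pos[..., None]], axis=-1)
    return enc.reshape(*shape, -1)                              # (*shape, n_axes * (2K + 1))


enc = fourier_features((224, 224))
print(enc.shape)  # (224, 224, 26)
```

With two axes and six bands, each pixel gets 2 * (2 * 6 + 1) = 26 position channels. Under this scheme they would be concatenated with the 3 input channels before the cross-attention step.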

About the notebooks

`perceiver_example`

This notebook installs the perceiver package and shows an example of running it on a single ImageNet-sized image ([1, 224, 224, 3]) with 1000 classes to demonstrate how the model works.

Want to Contribute 🙋‍♂️ ?

Awesome! If you want to contribute to this project, you're always welcome! See Contributing Guidelines. You can also take a look at open issues for getting more information about current or upcoming tasks.

Want to discuss? 💬

Have questions or doubts, or want to share your opinions and views? You're always welcome to start a discussion.

Citations

@misc{jaegle2021perceiver,
    title   = {Perceiver: General Perception with Iterative Attention},
    author  = {Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira},
    year    = {2021},
    eprint  = {2103.03206},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

error with tf2.4.1

Hello Rishit,

thank you for your Perceiver implementation! I have two notes, I am not very familiar with tf2 though. You define and call a tf.keras.Sequential model here https://github.com/Rishit-dagli/Perceiver/blob/4d3b9b0514da4fb623d178e3e70df1836ebad5ba/perceiver/perceiver.py#L106 For my version of tf at least this throws an error, I think it should be defined once in __init__ and then just called in call.

And just above it, you compute data but then you don't pass it to self.model. Is that correct?
bug

opened by abred 3
Training code

Hi there,

I've tried to set up a standard MNIST training over the last few days using the Perceiver code provided here. So far, I've not been able to come up with any solution where the model actually learns anything. A major problem so far has been the way the model is written with no support for model.fit() and the whole functional API.

Do you happen to have any training example code for your model which you could provide here in this repo? MNIST as the default starting point would be nice, but anything would do the job as well :)
question

opened by tpetri94 2
Create a FeedForward layer

Create a simple FeedForward layer as a tf.keras.layers.Layer. It should essentially contain a Dense layer with the modified GELU activation (#2); optionally, I could also include a dropout layer and another Dense layer whose number of neurons equals the dimension.

opened by Rishit-dagli 0
Implement a PreNorm layer

Create a normalization layer as a tf.keras.layers.Layer. It should essentially figure out the right axis and apply layer normalization along it.

opened by Rishit-dagli 0
Don't pin TensorFlow version to a specific number

Hello,

In setup.py you should change "tensorflow~=2.4.0" to "tensorflow>2.4.0" to ensure any version above the minimal one is used.
bug

opened by ebursztein 0

Releases(v0.1.2)

v0.1.2(Apr 26, 2021)

Fixed an error that occurred when the model was decorated with @tf.function: it tried to create a variable on calls after the first one (#20).

Many thanks 🙏 to @abred for pointing this out
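
The general Keras pattern behind this kind of fix can be sketched as follows: sub-layers are created once in `__init__`, and `call` only applies them, so retracing under `@tf.function` never tries to create new variables. The class `FeedForwardBlock` is a hypothetical stand-in and is not the package's actual code.

```python
import tensorflow as tf


class FeedForwardBlock(tf.keras.layers.Layer):
    """Hypothetical block used only to illustrate the pattern."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        # Sub-layers, and therefore their variables, are created once here.
        self.net = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden_dim, activation="relu"),
            tf.keras.layers.Dense(dim),
        ])

    def call(self, x):
        # call() only applies the existing layers, so it is safe to trace repeatedly.
        return self.net(x)


block = FeedForwardBlock(dim=8, hidden_dim=16)


@tf.function
def forward(x):
    return block(x)


y1 = forward(tf.random.normal([2, 8]))
y2 = forward(tf.random.normal([4, 8]))  # a new input shape triggers a retrace
print(y1.shape, y2.shape)
```

Building a new `tf.keras.Sequential` inside `call` instead would create fresh layers on every trace, which is the situation `@tf.function` rejects after the first call.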
v0.1.1(Apr 13, 2021)

This release adds an example to demonstrate the use of this package.
v0.1.0(Apr 13, 2021)

This is the initial release of Perceiver and implements Perceiver Model as a tf.keras.Model class.

Owner

Rishit Dagli

High School, TEDx, 2xTED-Ed speaker | International Speaker | Microsoft Student Ambassador | Mentor, @TFUGMumbai | Organize @KotlinMumbai

GitHub Repository https://rishit-dagli.github.io/Perceiver/

