Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Last update: Dec 28, 2022

Overview

Hi, good to see you here! 👋

Thanks for checking out the code for Non-Parametric Transformers (NPTs).

This codebase will allow you to reproduce experiments from the paper as well as use NPTs for your own research.

Abstract

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
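As a rough illustration of what "self-attention between datapoints" means, the sketch below represents a dataset as n datapoints, each with d attributes embedded in e dimensions. It flattens every datapoint to a vector of length d*e and lets each datapoint attend to every datapoint in the dataset, including itself. This is a minimal single-head sketch: there are no learned query/key/value projections, no multi-head split and no residual connections or normalisation. The function name attention_between_datapoints and the toy data are hypothetical and are not taken from the NPT codebase.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_between_datapoints(X, temperature=1.0):
    """Hypothetical sketch: projection-free self-attention across datapoints.

    X is a list of n datapoints; each datapoint is a list of d attribute
    embeddings, and each embedding is a list of e floats (an n x d x e array).
    Returns (outputs, weights): outputs is n x (d*e), weights is n x n.
    """
    # Flatten each datapoint: n x d x e -> n x (d*e).
    rows = [[v for attr in point for v in attr] for point in X]
    dim = len(rows[0])
    scale = temperature * math.sqrt(dim)
    weights, outputs = [], []
    for q in rows:
        # Scaled dot-product score of this datapoint against every datapoint.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in rows]
        w = softmax(scores)
        weights.append(w)
        # Queries, keys and values are all the raw rows here, so the output
        # is an attention-weighted average over the whole dataset.
        outputs.append(
            [sum(w[j] * rows[j][c] for j in range(len(rows))) for c in range(dim)]
        )
    return outputs, weights


if __name__ == "__main__":
    # Toy dataset: 3 datapoints, 2 attributes (feature, target), e = 2.
    X = [
        [[1.0, 0.0], [2.0, 0.0]],  # feature "a" with a known target
        [[0.0, 1.0], [0.0, 3.0]],  # feature "b" with a known target
        [[1.0, 0.0], [0.0, 0.0]],  # query with feature "a", target zeroed out
    ]
    outputs, weights = attention_between_datapoints(X)
    print([round(w, 3) for w in weights[2]])  # → [0.384, 0.233, 0.384]
```

Every output row mixes information from all rows, so what the model computes for one datapoint can depend on the others. In the toy example, the query shares its feature with the first datapoint and therefore puts more weight on it than on the second.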

Installation

Set up and activate the Python environment by executing:

conda env create -f environment.yml
conda activate npt

For now, we recommend installing CUDA <= 10.2. See the issue with CUDA >= 11.0 here.

If you are running on a system without a GPU, use the commands above with environment_no_gpu.yml instead of environment.yml.
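If you script the installation, one way to choose between the two environment files is a simple heuristic: treat the machine as GPU-capable when NVIDIA's nvidia-smi tool is on the PATH. This is an assumption for illustration only. The helper pick_environment_file is hypothetical, is not part of the repository, and does not check the CUDA version recommended above.

```python
import shutil


def pick_environment_file(which=shutil.which):
    """Hypothetical helper: choose a conda environment file for this machine.

    Heuristic only: assumes an NVIDIA GPU is usable if `nvidia-smi` is on PATH.
    `which` is injectable so the choice can be tested on machines without a GPU.
    """
    if which("nvidia-smi"):
        return "environment.yml"
    return "environment_no_gpu.yml"


if __name__ == "__main__":
    # Print the matching conda command rather than running it.
    print(f"conda env create -f {pick_environment_file()}")
```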

Examples

We now give some basic examples of running NPT.

NPT downloads all supported datasets automatically, so you don't need to worry about that.

We use wandb to log experimental results, which lets us conveniently track run progress online. If you do not want to use wandb, run wandb off in the shell where you execute NPT.

For example, run the following to explore NPT with the default configuration on the Breast Cancer dataset:

python run.py --data_set breast-cancer

As another example, a run on the poker-hand dataset may look like this:

python run.py --data_set poker-hand \
--exp_batch_size 4096 \
--exp_print_every_nth_forward 100
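If you want to launch several runs from Python (for example, a small sweep), one option is to assemble the same command line programmatically. The sketch below is hypothetical: build_npt_command, npt_env and launch are not part of the NPT codebase. It only uses the flags shown above, and it turns wandb off by setting the environment variable WANDB_MODE=disabled, which recent wandb versions honour as an alternative to running wandb off.

```python
import os
import subprocess
import sys


def build_npt_command(data_set, python=sys.executable, **overrides):
    """Hypothetical helper: assemble `python run.py --data_set <name> --<flag> <value> ...`.

    Keyword overrides become `--<name> <value>` pairs; run `python run.py --help`
    to see which flags your checkout actually accepts.
    """
    cmd = [python, "run.py", "--data_set", data_set]
    for name, value in overrides.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def npt_env(use_wandb=True):
    """Copy the current environment, turning wandb off via WANDB_MODE if requested."""
    env = dict(os.environ)
    if not use_wandb:
        env["WANDB_MODE"] = "disabled"
    return env


def launch(cmd, use_wandb=True):
    """Run the command (from the repository root) and raise if it fails."""
    return subprocess.run(cmd, env=npt_env(use_wandb), check=True)


if __name__ == "__main__":
    cmd = build_npt_command(
        "poker-hand", exp_batch_size=4096, exp_print_every_nth_forward=100
    )
    print(" ".join(cmd[1:]))
    # → run.py --data_set poker-hand --exp_batch_size 4096 --exp_print_every_nth_forward 100
```

Using sys.executable by default means the run uses the interpreter of the currently active environment (for example, the activated npt conda environment) rather than whichever python is first on the PATH.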

You can find all possible config arguments and their descriptions in NPT/configs.py or by running python run.py --help.

In scripts/ we provide the runs and hyperparameter configurations for the experiments presented in the paper.

We hope you enjoy using the code; please feel free to reach out with any questions 😊

Citation

If you find this code helpful for your work, please cite our paper as:

@article{kossen2021self,
  title={Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning},
  author={Kossen, Jannik and Band, Neil and Gomez, Aidan N. and Lyle, Clare and Rainforth, Tom and Gal, Yarin},
  journal={arXiv:2106.02584},
  year={2021}
}

Owner

OATML
