Graph neural network message passing reframed as a Transformer with local attention

Last update: Dec 28, 2022

Overview

Adjacent Attention Network

An implementation of a simple transformer that is equivalent to a graph neural network in which message passing is done with multi-head attention at each successive layer. Since Graph Attention Network is already taken, I decided to name it Adjacent Attention Network instead. The design is more transformer-centric: instead of using Kipf and Welling's normalization trick (scaling the adjacency matrix on both sides by the inverse square root of the degree matrix), the adjacency matrix is simply translated into the proper attention mask at each layer.
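To make the contrast concrete, here is a minimal single-head sketch comparing Kipf and Welling's normalized propagation with attention restricted by the adjacency matrix. The function names are hypothetical, learned query/key/value projections are omitted, and adding self-loops to the mask is an assumption, not necessarily what the library does.

```python
import torch

def gcn_propagate(x, adj):
    # Kipf & Welling style: add self-loops, then scale by D^{-1/2} (A + I) D^{-1/2}
    a_hat = adj.float() + torch.eye(adj.shape[-1])
    deg_inv_sqrt = a_hat.sum(dim=-1).rsqrt()
    norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
    return norm @ x

def masked_attention_propagate(x, adj):
    # Same graph, but the adjacency matrix (plus assumed self-loops) becomes a
    # boolean attention mask; weights come from a softmax over neighbors only.
    n, d = x.shape
    mask = adj.bool() | torch.eye(n, dtype=torch.bool)
    scores = (x @ x.t()) / d ** 0.5
    scores = scores.masked_fill(~mask, float('-inf'))  # non-neighbors get zero weight
    attn = scores.softmax(dim=-1)
    return attn @ x
```

The fixed degree-based weights of the first function are replaced by data-dependent attention weights in the second; the graph structure only decides which entries are allowed to be nonzero.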

This repository is for my own exploration into the graph neural network field. My gut tells me the transformer architecture can generalize graph neural networks and outperform them.

Install

$ pip install adjacent-attention-network

Usage

Basically a transformer in which each node attends to its neighbors, as defined by the adjacency matrix. Complexity is O(n * max_neighbors), where max_neighbors is the largest number of neighbors any single node has in the adjacency matrix.
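A minimal sketch of where the O(n * max_neighbors) cost comes from: each node's neighbors are gathered into a padded (n, max_neighbors) index tensor, and attention scores are computed only over those slots. The helper names are hypothetical, self-loops are assumed, and this is a single-head illustration rather than the library's implementation.

```python
import torch

def neighbor_indices(adj):
    # adj: (n, n) bool adjacency matrix.
    # Returns (idx, valid), both (n, max_neighbors): for each node, the
    # indices of its neighbors, padded up to the largest neighbor count.
    max_neighbors = int(adj.sum(dim=-1).max())
    # topk on the 0/1 matrix puts every neighbor (value 1) ahead of padding (value 0)
    vals, idx = adj.float().topk(max_neighbors, dim=-1)
    valid = vals > 0.5
    return idx, valid

def local_attention(x, adj):
    # x: (n, d) node features; single head, no learned projections
    n, d = x.shape
    adj = adj.bool() | torch.eye(n, dtype=torch.bool)  # assume self-loops
    idx, valid = neighbor_indices(adj)
    k = x[idx]                                            # (n, max_neighbors, d) gathered neighbors
    scores = (x[:, None, :] * k).sum(dim=-1) / d ** 0.5   # only n * max_neighbors scores
    scores = scores.masked_fill(~valid, float('-inf'))    # hide padded slots
    attn = scores.softmax(dim=-1)
    return (attn[..., None] * k).sum(dim=1)               # (n, d)
```

Rows with fewer neighbors than max_neighbors are padded and the padded slots are masked out, so the result matches dense masked attention while never materializing an (n, n) score matrix.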

The following example has a complexity on the order of 1024 * 100, since each node is connected to roughly 10% of the 1024 nodes.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4
)

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1) < 0.1
nodes   = torch.randn(1, 1024, 512)
mask    = torch.ones(1, 1024).bool()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)

If the neighbor counts contain outliers, the above will lead to wasteful computation, since many nodes will be attending to padding. You can use the following stop-gap measure to account for these outliers.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_neighbors_cutoff = 100
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

# an outlier: suppose node 0 is, for some reason, connected to all other nodes
adj_mat[:, 0] = True

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)
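One plausible way to enforce a neighbor cutoff, sketched here as an assumption rather than the library's documented behavior, is to randomly subsample the neighbors of over-connected nodes: add small noise to the adjacency matrix and keep the top `cutoff` entries per row. `capped_neighbor_indices`, `padded_width` and `noise_scale` are hypothetical names.

```python
import torch

def capped_neighbor_indices(adj, cutoff, noise_scale=0.01):
    # Hypothetical helper. adj: (n, n) bool adjacency matrix.
    # Returns (idx, valid), both (n, min(cutoff, n)).
    k = min(cutoff, adj.shape[-1])
    # Neighbors sit near 1 and non-neighbors near 0; the small noise breaks
    # ties randomly, so over-connected nodes keep a random subset of `cutoff`
    # neighbors while nodes under the cutoff keep all of theirs.
    noise = torch.empty(adj.shape).uniform_(-noise_scale, noise_scale)
    vals, idx = (adj.float() + noise).topk(k, dim=-1)
    valid = vals > 0.5  # padded slots (non-neighbors) are masked out
    return idx, valid

def padded_width(adj):
    # width every node is padded to when there is no cutoff
    return int(adj.sum(dim=-1).max())
```

With node 0 connected to everyone, `padded_width` equals the number of nodes, so every node pays for n attention slots. Capping the width at `cutoff` bounds the cost at n * cutoff, at the price of dropping some edges of the outlier node.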

For non-local attention, I've decided to use a trick from the Set Transformer paper, the Induced Set Attention Block (ISAB). Through the lens of the graph neural network literature, this is analogous to having global nodes for non-local message passing.

import torch
from adjacent_attention_network import AdjacentAttentionNetwork

model = AdjacentAttentionNetwork(
    dim = 512,
    depth = 6,
    heads = 4,
    num_global_nodes = 5
).cuda()

adj_mat = torch.empty(1, 1024, 1024).uniform_(0, 1).cuda() < 0.1
nodes   = torch.randn(1, 1024, 512).cuda()
mask    = torch.ones(1, 1024).bool().cuda()

model(nodes, adj_mat, mask = mask) # (1, 1024, 512)
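The two-step pattern behind ISAB can be sketched with PyTorch's `nn.MultiheadAttention`: a small set of learned global nodes first attends to all nodes, then every node attends back to the global nodes, so the cost is O(n * num_global_nodes) instead of O(n^2). This is a simplified sketch under assumptions (the full Set Transformer block also has residual connections, layer norm and a feedforward), and `InducedSetAttention` is a hypothetical name, not the library's class.

```python
import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    # Simplified ISAB-style block: m learned global nodes gather information
    # from all n nodes, then every node reads it back from the m global nodes.
    def __init__(self, dim, heads, num_global_nodes):
        super().__init__()
        self.global_nodes = nn.Parameter(torch.randn(num_global_nodes, dim))
        self.gather = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scatter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, mask=None):
        # x: (batch, n, dim); mask: (batch, n) bool, True = real node
        b = x.shape[0]
        g = self.global_nodes.expand(b, -1, -1)          # (batch, m, dim)
        pad = None if mask is None else ~mask            # True = ignore this key
        h, _ = self.gather(g, x, x, key_padding_mask=pad)  # global nodes read all real nodes
        out, _ = self.scatter(x, h, h)                   # every node reads the global nodes
        return out
```

Because information only flows node to global node to node, padded nodes never influence real ones, and any two nodes can exchange messages in one block regardless of the adjacency matrix.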



Releases(0.0.12)

0.0.12(Dec 24, 2022)

0.0.11(Dec 14, 2020)

0.0.10(Dec 14, 2020)

0.0.9(Dec 14, 2020)

0.0.8(Dec 14, 2020)

0.0.7(Dec 14, 2020)

0.0.6(Dec 14, 2020)

0.0.5(Dec 14, 2020)

0.0.4(Dec 14, 2020)

0.0.3(Dec 14, 2020)

0.0.2(Dec 14, 2020)

0.0.1(Dec 14, 2020)

Owner

Phil Wang
