Scalable Graph Neural Networks for Heterogeneous Graphs

Related tags

Deep LearningNARS
Overview

Neighbor Averaging over Relation Subgraphs (NARS)

NARS is an algorithm for node classification on heterogeneous graphs, based on scalable neighbor averaging techniques that have been previously used in e.g. SIGN to heterogeneous scenarios by generating neighbor-averaged features on sampled relation induced subgraphs.

For more details, please check out our paper:

Scalable Graph Neural Networks for Heterogeneous Graphs

Setup

Dependencies

  • torch==1.5.1+cu101
  • dgl-cu101==0.4.3.post2
  • ogb==1.2.1
  • dglke==0.1.0

Docker

We have prepared a dockerfile for building a container with clean environment and all required dependencies. Please checkout instructions in docker.

Data Preparation

Download and pre-process OAG dataset (optional)

If you plan to evaluate on OAG dataset, you need to follow instructions in oag_dataset to download and pre-process dataset.

Generate input for featureless node types

In academic graph datasets (ACM, MAG, OAG) in which only paper nodes are associated with input features. NARS featurizes other node types with TransE relational graph embedding pre-trained on the graph structure.

Please follow instructions in graph_embed to generate embeddings for each dataset.

Sample relation subsets

NARS samples Relation Subsets (see our paper for details). Please follow the instructions in sample_relation_subsets to generate these subsets.

Or you may skip this step and use the example subsets that have added to this repository.

Run NARS Experiments

NARS are evaluated on three academic graph datasets to predict publishing venues and fields of papers.

ACM

python3 train.py --dataset acm --use-emb TransE_acm --R 2 \
    --use-relation-subsets sample_relation_subsets/examples/acm \
    --num-hidden 64 --lr 0.003 --dropout 0.7 --eval-every 1 \
    --num-epochs 100 --input-dropout

OGBN-MAG

python3 train.py --dataset mag --use-emb TransE_mag --R 5 \
    --use-relation-subset sample_relation_subsets/examples/mag \
    --eval-batch-size 50000 --num-hidden 512 --lr 0.001 --batch-s 50000 \
    --dropout 0.5 --num-epochs 1000

OAG (venue prediction)

python3 train.py --dataset oag_venue --use-emb TransE_oag_venue --R 3 \
    --use-relation-subsets sample_relation_subsets/examples/oag_venue \
    --eval-batch-size 25000 --num-hidden 256 --lr 0.001 --batch-size 1000 \
    --data-dir oag_dataset --dropout 0.5 --num-epochs 200

OAG (L1-field prediction)

python3 train.py --dataset oag_L1 --use-emb TransE_oag_L1 --R 3 \
    --use-relation-subsets sample_relation_subsets/examples/oag_L1 \
    --eval-batch-size 25000 --num-hidden 256 --lr 0.001 --batch-size 1000 \
    --data-dir oag_dataset --dropout 0.5 --num-epochs 200

Results

Here is a summary of model performance using example relation subsets:

For ACM and OGBN-MAG dataset, the task is to predict paper publishing venue.

Dataset # Params Test Accuracy
ACM 0.40M 0.9305±0.0043
OGBN-MAG 4.13M 0.5240±0.0016

For OAG dataset, there are two different node predictions tasks: predicting venue (single-label) and L1-field (multi-label). And we follow Heterogeneous Graph Transformer to evaluate using NDCG and MRR metrics.

Task # Params NDCG MRR
Venue 2.24M 0.5214±0.0010 0.3434±0.0012
L1-field 1.41M 0.86420.0022 0.8542±0.0019

Run with limited GPU memory

The above commands were tested on Tesla V100 (32 GB) and Tesla T4 (15GB). If your GPU memory isn't enough for handling large graphs, try the following:

  • add --cpu-process to the command to move preprocessing logic to CPU
  • reduce evaluation batch size with --eval-batch-size. The evaluation result won't be affected since model is fixed.
  • reduce training batch with --batch-size

Run NARS with Reduced CPU Memory Footprint

As mentioned in our paper, using a lot of relation subsets may consume too much CPU memory. To reduce CPU memory footprint, we implemented an optimization in train_partial.py which trains part of our feature aggregation weights at a time.

Using OGBN-MAG dataset as an example, the following command randomly picks 3 subsets from all 8 sampled relation subsets and trains their aggregation weights every 10 epochs.

python3 train_partial.py --dataset mag --use-emb TransE_mag --R 5 \
    --use-relation-subsets sample_relation_subsets/examples/mag \
    --eval-batch-size 50000 --num-hidden 512 --lr 0.001 --batch-size 50000 \
    --dropout 0.5 --num-epochs 1000 --sample-size 3 --resample-every 10

Citation

Please cite our paper with:

@article{yu2020scalable,
    title={Scalable Graph Neural Networks for Heterogeneous Graphs},
    author={Yu, Lingfan and Shen, Jiajun and Li, Jinyang and Lerer, Adam},
    journal={arXiv preprint arXiv:2011.09679},
    year={2020}
}

License

NARS is CC-by-NC licensed, as found in the LICENSE file.

Owner
Facebook Research
Facebook Research
[NeurIPS 2021] Galerkin Transformer: a linear attention without softmax

[NeurIPS 2021] Galerkin Transformer: linear attention without softmax Summary A non-numerical analyst oriented explanation on Toward Data Science abou

Shuhao Cao 159 Dec 20, 2022
Exemplo de implementação do padrão circuit breaker em python

fast-circuit-breaker Circuit breakers existem para permitir que uma parte do seu sistema falhe sem destruir todo seu ecossistema de serviços. Michael

James G Silva 17 Nov 10, 2022
Multiview 3D object detection on MultiviewC dataset through moft3d.

Voxelized 3D Feature Aggregation for Multiview Detection [arXiv] Multiview 3D object detection on MultiviewC dataset through VFA. Introduction We prop

Jiahao Ma 20 Dec 21, 2022
A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

University1652-Baseline [Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍] This

Zhedong Zheng 335 Jan 06, 2023
Real life contra a deep learning project built using mediapipe and openc

real-life-contra Description A python script that translates the body movement into in game control. Welcome to all new real life contra a deep learni

Programminghut 7 Jan 26, 2022
System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Validating Simulations of User Query Variants This repository contains the scripts of the experiments and evaluations, simulated queries, as well as t

IR Group at Technische Hochschule Köln 2 Nov 23, 2022
Code for paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" EMNLP 2021

The repo provides the code for paper "Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation" EMNLP 2

Yuning Mao 18 May 24, 2022
CAR-API: Cityscapes Attributes Recognition API

CAR-API: Cityscapes Attributes Recognition API This is the official api to download and fetch attributes annotations for Cityscapes Dataset. Content I

Kareem Metwaly 5 Dec 22, 2022
Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend

Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend This project acts as both a tuto

Guillaume Chevalier 103 Jul 22, 2022
Geometry-Aware Learning of Maps for Camera Localization (CVPR2018)

Geometry-Aware Learning of Maps for Camera Localization This is the PyTorch implementation of our CVPR 2018 paper "Geometry-Aware Learning of Maps for

NVIDIA Research Projects 321 Nov 26, 2022
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To

FOSSASIA 9.4k Jan 07, 2023
TensorFlow Implementation of "Show, Attend and Tell"

Show, Attend and Tell Update (December 2, 2016) TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attent

Yunjey Choi 902 Nov 29, 2022
[ArXiv 2021] One-Shot Generative Domain Adaptation

GenDA - One-Shot Generative Domain Adaptation One-Shot Generative Domain Adaptation Ceyuan Yang*, Yujun Shen*, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Z

GenForce: May Generative Force Be with You 46 Dec 19, 2022
AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-l

Facebook Research 4.6k Jan 09, 2023
Breaking the Dilemma of Medical Image-to-image Translation

Breaking the Dilemma of Medical Image-to-image Translation Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field

Kid Liet 86 Dec 21, 2022
Medical Insurance Cost Prediction using Machine earning

Medical-Insurance-Cost-Prediction-using-Machine-learning - Here in this project, I will use regression analysis to predict medical insurance cost for people in different regions, and based on several

1 Dec 27, 2021
ADSPM: Attribute-Driven Spontaneous Motion in Unpaired Image Translation

ADSPM: Attribute-Driven Spontaneous Motion in Unpaired Image Translation This repository provides a PyTorch implementation of ADSPM. Requirements Pyth

24 Jul 24, 2022
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

NÜWA - Pytorch (wip) Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be popul

Phil Wang 463 Dec 28, 2022
A series of Jupyter notebooks with Chinese comment that walk you through the fundamentals of Machine Learning and Deep Learning in python using Scikit-Learn and TensorFlow.

Hands-on-Machine-Learning 目的 这份笔记旨在帮助中文学习者以一种较快较系统的方式入门机器学习, 是在学习Hands-on Machine Learning with Scikit-Learn and TensorFlow这本书的 时候做的个人笔记: 此项目的可取之处 原书的

Baymax 1.5k Dec 21, 2022
The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

MOTIF Dataset The Malware Open-source Threat Intelligence Family (MOTIF) dataset contains 3,095 disarmed PE malware samples from 454 families, labeled

Booz Allen Hamilton 112 Dec 13, 2022