[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs

Overview

New Benchmarks for Learning on Non-Homophilous Graphs

Here are the codes and datasets accompanying the paper:
New Benchmarks for Learning on Non-Homophilous Graphs
Derek Lim (Cornell), Xiuyu Li (Cornell), Felix Hohne (Cornell), and Ser-Nam Lim (Facebook AI).
Workshop on Graph Learning Benchmarks, WWW 2021.
[PDF link]

There are codes to load our proposed datasets, compute our measure of the presence of homophily, and train various graph machine learning models in our experimental setup.

Organization

main.py contains the main experimental scripts.

dataset.py loads our datasets.

models.py contains implementations for graph machine learning models, though C&S (correct_smooth.py, cs_tune_hparams.py) is in separate files. Also, gcn-ogbn-proteins.py contains code for running GCN and GCN+JK on ogbn-proteins. Running several of the GNN models on larger datasets may require at least 24GB of VRAM.

homophily.py contains functions for computing homophily measures, including the one that we introduce in our_measure.

Datasets

Alt text

As discussed in the paper, our proposed datasets are "twitch-e", "yelp-chi", "deezer", "fb100", "pokec", "ogbn-proteins", "arxiv-year", and "snap-patents", which can be loaded by load_nc_dataset in dataset.py by passing in their respective string name. Many of these datasets are included in the data/ directory, but due to their size, yelp-chi, snap-patents, and pokec are automatically downloaded from a Google drive link when loaded from dataset.py. The arxiv-year and ogbn-proteins datasets are downloaded using OGB downloaders. load_nc_dataset returns an NCDataset, the documentation for which is also provided in dataset.py. It is functionally equivalent to OGB's Library-Agnostic Loader for Node Property Prediction, except for the fact that it returns torch tensors. See the OGB website for more specific documentation. Just like the OGB function, dataset.get_idx_split() returns fixed dataset split for training, validation, and testing.

When there are multiple graphs (as in the case of twitch-e and fb100), different ones can be loaded by passing in the sub_dataname argument to load_nc_dataset in dataset.py.

twitch-e consists of seven graphs ["DE", "ENGB", "ES", "FR", "PTBR", "RU", "TW"]. In the paper we test on DE.

fb100 consists of 100 graphs. We only include ["Amherst41", "Cornell5", "Johns Hopkins55", "Penn94", "Reed98"] in this repo, although others may be downloaded from the internet archive. In the paper we test on Penn94.

Alt text

Installation instructions

  1. Create and activate a new conda environment using python=3.8 (i.e. conda create --name non-hom python=3.8)
  2. Activate your conda environment
  3. Check CUDA version using nvidia-smi
  4. In the root directory of this repository, run bash install.sh cu110, replacing cu110 with your CUDA version (i.e. CUDA 11 -> cu110, CUDA 10.2 -> cu102, CUDA 10.1 -> cu101). We tested on Ubuntu 18.04, CUDA 11.0.

Running experiments

  1. Make sure a results folder exists in the root directory.
  2. Our experiments are in the experiments/ directory. There are bash scripts for running methods on single and multiple datasets. Please note that the experiments must be run from the root directory. For instance, to run the MixHop experiments on snap-patents, use:
bash experiments/mixhop_exp.sh snap-patents

Some datasets require specifying a second sub_dataset argument e.g. to run MixHop experiments on the twitch-e, DE sub_dataset, do:

bash experiments/mixhop_exp.sh twitch-e DE

Otherwise, run python main.py --help to see the full list of options for running experiments. As one example, to train a GAT with max jumping knowledge connections on (directed) arxiv-year with 32 hidden channels and 4 attention heads, run:

python main.py --dataset arxiv-year --method gatjk --hidden_channels 32 --gat_heads 4 --directed
Owner
Cornell University Artificial Intelligence
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Kay Savetz 60 Dec 25, 2022
πŸ€— The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

πŸ€— The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Hugging Face 15k Jan 02, 2023
A library for end-to-end learning of embedding index and retrieval model

Poeem Poeem is a library for efficient approximate nearest neighbor (ANN) search, which has been widely adopted in industrial recommendation, advertis

54 Dec 21, 2022
Multiple implementations for abstractive text summurization , using google colab

Text Summarization models if you are able to endorse me on Arxiv, i would be more than glad https://arxiv.org/auth/endorse?x=FRBB89 thanks This repo i

463 Dec 26, 2022
A simple version of DeTR

DeTR-Lite A simple version of DeTR Before you enjoy this DeTR-Lite The purpose of this project is to allow you to learn the basic knowledge of DeTR. P

Jianhua Yang 11 Jun 13, 2022
Torchrecipes provides a set of reproduci-able, re-usable, ready-to-run RECIPES for training different types of models, across multiple domains, on PyTorch Lightning.

Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research techniques without significant engineering overhead.Specifica

Meta Research 193 Dec 28, 2022
Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .

Simple bots or Simbots is a library designed to create simple chat bots using the power of python. This library utilises Intent, Entity, Relation and

14 Dec 15, 2021
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

Abel 211 Dec 28, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 03, 2021
Local cross-platform machine translation GUI, based on CTranslate2

DesktopTranslator Local cross-platform machine translation GUI, based on CTranslate2 Download Windows Installer You can either download a ready-made W

Yasmin Moslem 29 Jan 05, 2023
Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

VK.com 847 Dec 19, 2022
Python library to make development of portfolio analysis faster and easier

Trafalgar Python library to make development of portfolio analysis faster and easier Installation πŸ”₯ For the moment, Trafalgar is still in beta develo

Santosh Passoubady 641 Jan 01, 2023
πŸ¦† Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

Explosion 1.5k Dec 25, 2022
An open source framework for seq2seq models in PyTorch.

pytorch-seq2seq Documentation This is a framework for sequence-to-sequence (seq2seq) models implemented in PyTorch. The framework has modularized and

International Business Machines 1.4k Jan 02, 2023
A python gui program to generate reddit text to speech videos from the id of any post.

Reddit text to speech generator A python gui program to generate reddit text to speech videos from the id of any post. Current functionality Generate

Aadvik 17 Dec 19, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Official Stanford NLP Python Library for Many Human Languages

Official Stanford NLP Python Library for Many Human Languages

Stanford NLP 6.4k Jan 02, 2023
NL-Augmenter 🦎 β†’ 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 β†’ 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
pyupbit 라이브러리λ₯Ό ν™œμš©ν•˜μ—¬ upbitμ—μ„œ λΉ„νŠΈμ½”μΈμ„ μžλ™λ§€λ§€ν•˜λŠ” μ½”λ“œμž…λ‹ˆλ‹€. μ‘°μ½”λ”© 유튜브 μ±„λ„μ—μ„œ μžμ„Έν•œ κ°•μ˜ μ˜μƒμ„ 보싀 수 μžˆμŠ΅λ‹ˆλ‹€.

파이썬 λΉ„νŠΈμ½”μΈ 투자 μžλ™ν™” κ°•μ˜ μ½”λ“œ by 유튜브 μ‘°μ½”λ”© 채널 pyupbit 라이브러리λ₯Ό ν™œμš©ν•˜μ—¬ upbit κ±°λž˜μ†Œμ—μ„œ λΉ„νŠΈμ½”μΈ μžλ™λ§€λ§€λ₯Ό ν•˜λŠ” μ½”λ“œμž…λ‹ˆλ‹€. 파일 ꡬ성 test.py : μž”κ³  쑰회 (1κ°•) backtest.py : λ°±ν…ŒμŠ€νŒ… μ½”λ“œ (2κ°•) bestK.p

μ‘°μ½”λ”© JoCoding 186 Dec 29, 2022
Kinky furry assitant based on GPT2

KinkyFurs-V0 Kinky furry assistant based on GPT2 How to run python3 V0.py then, open web browser and go to localhost:8080 Requirements: Flask trans

Sparki 1 Jun 11, 2022