Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Last update: Jan 08, 2023

Overview

This repository contains a TensorFlow implementation of "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" by Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh (accepted as an oral presentation at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2019).

Paper link: https://arxiv.org/pdf/1905.07953.pdf

Requirements

Install the clustering toolkit METIS and its Python interface.

Download and install METIS (Serial Graph Partitioning and Fill-reducing Matrix Ordering) from the official website: http://glaros.dtc.umn.edu/gkhome/metis/metis/download

1) Download metis-5.1.0.tar.gz from the page above and unpack it
2) cd metis-5.1.0
3) make config shared=1 prefix=~/.local/
4) make install
5) export METIS_DLL=~/.local/lib/libmetis.so
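
Before moving on, it can help to confirm that the METIS_DLL variable from step 5 points at a real file. The following is a minimal sketch, not part of this repository; the helper name metis_dll_status is hypothetical, and it only checks that the path exists, not that the library loads.

```python
import os


def metis_dll_status(environ=os.environ):
    """Return 'unset', 'missing', or 'ok' for the METIS_DLL setting.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    raw = environ.get("METIS_DLL")
    if not raw:
        return "unset"
    # Expand a leading '~' in case the value still contains it literally.
    path = os.path.expanduser(raw)
    return "ok" if os.path.isfile(path) else "missing"


if __name__ == "__main__":
    print("METIS_DLL:", metis_dll_status())
```

If this reports "unset" or "missing", revisit steps 3 to 5 before installing the Python packages.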

Install the required Python packages:

 pip install -r requirements.txt

Quick test to check that METIS is installed correctly:

>>> import networkx as nx
>>> import metis
>>> G = metis.example_networkx()
>>> (edgecuts, parts) = metis.part_graph(G, 3)
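
To make the result of the quick test easier to read, the partition assignment can be grouped into clusters. The sketch below assumes that parts is a list with one partition id per node, in the same order as the node list; the helper group_by_part and the sample values are hypothetical and stand in for real METIS output so the snippet runs without METIS installed.

```python
from collections import defaultdict


def group_by_part(nodes, parts):
    """Map each partition id to the list of nodes assigned to it."""
    clusters = defaultdict(list)
    for node, part in zip(nodes, parts):
        clusters[part].append(node)
    return dict(clusters)


# Hypothetical assignment for a 6-node graph split into 3 parts.
nodes = ["a", "b", "c", "d", "e", "f"]
parts = [0, 0, 1, 2, 1, 2]
clusters = group_by_part(nodes, parts)
print(clusters)  # {0: ['a', 'b'], 1: ['c', 'e'], 2: ['d', 'f']}
```

With real output, nodes would be list(G.nodes()) and parts the second element returned by metis.part_graph, under the ordering assumption stated above.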

We follow GraphSAGE's input format and its code for pre-processing the data.
This repository includes scripts for reproducing our experimental results on PPI and Reddit. Both datasets can be downloaded from this website.

Run Experiments

Once METIS and networkx are set up and the datasets are ready, you can try the scripts.
We assume the data files are stored under the './data/{data-name}/' directory.

For example, the path of PPI data files should be: data/ppi/ppi-{G.json, feats.npy, class_map.json, id_map.json}
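
The layout described above can be checked before launching a run. This is a minimal sketch, not part of the repository: the helper names expected_files and missing_files are hypothetical, the four file suffixes are taken from the PPI example path, and it is an assumption that other datasets follow the same '{data-name}-{suffix}' naming.

```python
import os

# Suffixes taken from the PPI example path in this README.
SUFFIXES = ("G.json", "feats.npy", "class_map.json", "id_map.json")


def expected_files(data_name, root="./data"):
    """List the file paths expected for a dataset under root/data_name/."""
    return [
        os.path.join(root, data_name, f"{data_name}-{suffix}")
        for suffix in SUFFIXES
    ]


def missing_files(data_name, root="./data"):
    """Return the expected files that are not present on disk."""
    return [p for p in expected_files(data_name, root) if not os.path.isfile(p)]


if __name__ == "__main__":
    for path in missing_files("ppi"):
        print("missing:", path)
```

An empty result from missing_files("ppi") means all four PPI files are in place.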
For PPI data, you may run the following script to reproduce the results in our paper:

./run_ppi.sh

For reference, with a V100 GPU, the running time per epoch on PPI is about 1 second.

The test F1 score will be around 0.9935, depending on the initialization.

For Reddit data (you need to change the data_prefix path in the .sh script to point to the data):

./run_reddit.sh

In the experiment section of the paper, we show how to generate the Amazon2M dataset. There is an external implementation that generates the Amazon2M data following the same procedure as in the paper (code and data).

The table below shows state-of-the-art performance from recent papers.

Method	PPI	Reddit
FastGCN (code)	N/A	93.7
GraphSAGE (code)	61.2	95.4
VR-GCN (code)	97.8	96.3
GAT (code)	97.3	N/A
GaAN	98.71	96.36
GeniePath	98.5	N/A
Cluster-GCN	99.36	96.60

If you use any of these materials, please cite the following paper:

@inproceedings{clustergcn,
  title = {Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks},
  author = {Wei-Lin Chiang and Xuanqing Liu and Si Si and Yang Li and Samy Bengio and Cho-Jui Hsieh},
  booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},
  year = {2019},
  url = {https://arxiv.org/pdf/1905.07953.pdf},
}

Owner

Jingwei Zheng
