SelfGNN

Overview

A PyTorch implementation of the paper "SelfGNN: Self-supervised Graph Neural Networks without explicit negative sampling", which appears in The International Workshop on Self-Supervised Learning for the Web (SSL'21) @ the Web Conference 2021 (WWW'21).

Note

This is ongoing work, and the repository is subject to continuous updates.

Requirements

  • Python 3.6+
  • PyTorch 1.6+
  • PyTorch Geometric 1.6+
  • Numpy 1.17.2+
  • Networkx 2.3+
  • SciPy 1.5.4+
  • (Optional) Optuna 2.8.0+, if you wish to tune the hyper-parameters of SelfGNN for any dataset
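
For instance, assuming PyTorch with the appropriate CUDA support is already installed, the remaining dependencies can be installed from PyPI along the following lines (a sketch, not a pinned environment; note that PyTorch Geometric may require additional platform-specific wheels, see its installation guide):

$ pip install torch-geometric numpy networkx scipy
$ pip install optuna   # optional, only needed for src/tune.py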

Example usage

$ python src/train.py

This trains SelfGNN on the default dataset (cora) with the default settings; see "Possible options for training SelfGNN" below for the full list of flags.

💥 Updates

Update 3

Added a hyper-parameter tuning utility based on Optuna.

Usage:

$ python src/tune.py

Update 2

Contrary to what we claimed in the paper, recent studies argue, and empirically show, that Batch Norm does not introduce implicit negative samples; instead, it mainly compensates for improper initialization. We have carried out new, similar experiments, shown in the table below, that seem to confirm this argument (BN: Batch Norm, LN: Layer Norm, -: no norm). For this experiment we use a GCN encoder and split data augmentation. Although BN does not provide implicit negative samples, the empirical evaluation shows that it leads to better performance, and putting it in the encoder alone is almost sufficient. LN, on the other hand, is not consistent; furthermore, the model tends to prefer BN over LN in any of the modules. A command sketch reproducing the first configuration follows the table.

Module                            Dataset
Encoder   Projector   Predictor   Photo        Computer     Pubmed
BN        BN          BN          94.05±0.23   88.83±0.17   77.76±0.57
BN        BN          -           94.2±0.17    88.78±0.20   75.48±0.70
BN        -           BN          94.01±0.20   88.65±0.16   78.66±0.52
BN        -           -           93.9±0.18    88.82±0.16   78.53±0.47
LN        LN          LN          81.42±2.43   64.10±3.29   74.06±1.07
LN        LN          -           84.1±1.58    68.18±3.21   74.26±0.55
LN        -           LN          92.39±0.38   77.18±1.23   73.84±0.73
LN        -           -           91.93±0.40   73.90±1.16   74.11±0.73
-         BN          BN          90.01±0.09   77.83±0.12   79.21±0.27
-         BN          -           90.12±0.07   76.43±0.08   75.10±0.15
-         LN          LN          45.34±2.47   40.56±1.48   56.29±0.77
-         LN          -           52.92±3.37   40.23±1.46   60.76±0.81
-         -           BN          91.13±0.13   81.79±0.11   79.34±0.21
-         -           LN          50.64±2.84   47.62±2.27   64.18±1.08
-         -           -           50.35±2.73   43.68±1.80   63.91±0.92
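
As a concrete illustration, the first row of the table (BN in the encoder, projector, and predictor) roughly corresponds to an invocation like the one below on the Photo dataset, using the flag semantics documented under "Possible options for training SelfGNN". The remaining hyper-parameters of the experiment are not listed here, so treat this as a sketch rather than an exact reproduction:

$ python src/train.py --name photo --model gcn --aug split --norms batch batch batch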

Update 1

  • Both the paper and the source code have been updated following the discussion on this issue
  • An ablation study on the impact of Batch Norm has been added following reviewer feedback from SSL'21
    • The findings show that SelfGNN without batch normalization is not stable, and its performance often drops significantly
    • Layer Normalization behaves similarly to having no Batch Norm

Possible options for training SelfGNN

The following options can be passed to src/train.py

--root: or -r: A path to a root directory to put all the datasets. Default is ./data

--name: or -n: The name of the dataset. Default is cora. See the supported dataset names below.

--model: or -m: The type of GNN architecture to use. Currently three architectures are supported (gcn, gat, sage). Default is gcn.

--aug: or -a: The name of the data augmentation technique. Currently (ppr, heat, katz, split, zscore, ldp, paste) are supported. Default is split.

--layers: or -l: One or more integer values specifying the number of units for each GNN layer. Default is 512 128

--norms: or -nm: The normalization scheme for each module. Default is batch. That is, with a single input, Batch Norm will be used in the prediction head. Specifying two inputs, e.g. --norms batch layer, makes the model use Batch Norm in the GNN encoder and Layer Norm in the prediction head. Specifying three inputs, e.g. --norms no batch layer, activates the projection head, and normalization is applied as: no norm for the GNN encoder, Batch Norm for the projection head, and Layer Norm for the prediction head. See the examples below.
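
For instance (a sketch illustrating the semantics above):

$ python src/train.py --norms batch            # Batch Norm in the prediction head
$ python src/train.py --norms batch layer      # Batch Norm in the encoder, Layer Norm in the prediction head
$ python src/train.py --norms no batch layer   # projection head enabled: no norm / Batch Norm / Layer Norm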

--heads: or -hd: One or more values specifying the number of heads for each GAT layer. Applicable for --model gat. Default is 8 1

--lr: or -lr: Learning rate, a value in [0, 1]. Default is 0.0001

--dropout: or -do: Dropout rate, a value in [0, 1]. Default is 0.2

--epochs: or -e: The number of epochs. Default is 1000.

--cache-step: or -cs: The step size for caching the model. That is, the model will be persisted every --cache-step epochs. Default is 100.

--init-parts: or -ip: The number of initial partitions, for the improved version of SelfGNN that uses clustering. Default is 1.

--final-parts: or -fp: The number of final partitions, for the improved version of SelfGNN that uses clustering. Default is 1.
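
Putting several of these options together, a full invocation might look like the following (the values are illustrative, drawn from the defaults and examples above, not tuned settings):

$ python src/train.py --root ./data --name pubmed --model gat --aug ppr --layers 512 128 --heads 8 1 --lr 0.0001 --dropout 0.2 --epochs 1000 --cache-step 100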

Supported dataset names

Name        Nodes    Edges     Features   Classes   Description
Cora        2,708    5,278     1,433      7         Citation network
Citeseer    3,327    4,552     3,703      6         Citation network
Pubmed      19,717   44,324    500        3         Citation network
Photo       7,487    119,043   745        8         Co-purchased products network
Computers   13,381   245,778   767        10        Co-purchased products network
CS          18,333   81,894    6,805      15        Collaboration network
Physics     34,493   247,962   8,415      5         Collaboration network

Any dataset from the PyTorch Geometric library can be used; however, SelfGNN has been tested only on the datasets above.

Citing

If you find this research helpful, please cite it as

@misc{kefato2021selfsupervised,
      title={Self-supervised Graph Neural Networks without explicit negative sampling}, 
      author={Zekarias T. Kefato and Sarunas Girdzijauskas},
      year={2021},
      eprint={2103.14958},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}