PyGCL: A PyTorch Library for Graph Contrastive Learning

Overview

logo

PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library, which features modularized GCL components from published papers, standardized evaluation, and experiment management.

Made with Python PyPI version Documentation Status GitHub stars GitHub forks Total lines visitors


What is Graph Contrastive Learning?

Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. A typical GCL algorithm firstly constructs multiple graph views via stochastic augmentation of the input and then learns representations by contrasting positive samples against negative ones.

👉 For a general introduction of GCL, please refer to our paper and blog. Also, this repo tracks newly published GCL papers.

Install

Prerequisites

PyGCL needs the following packages to be installed beforehand:

  • Python 3.8+
  • PyTorch 1.9+
  • PyTorch-Geometric 1.7
  • DGL 0.7+
  • Scikit-learn 0.24+
  • Numpy
  • tqdm
  • NetworkX

Installation via PyPI

To install PyGCL with pip, simply run:

pip install PyGCL

Then, you can import GCL from your current environment.

A note regarding DGL

Currently the DGL team maintains two versions, dgl for CPU support and dgl-cu*** for CUDA support. Since pip treats them as different packages, it is hard for PyGCL to check for the version requirement of dgl. We have removed such dependency checks for dgl in our setup configuration and require the users to install a proper version by themselves.

Package Overview

Our PyGCL implements four main components of graph contrastive learning algorithms:

  • Graph augmentation: transforms input graphs into congruent graph views.
  • Contrasting architectures and modes: generate positive and negative pairs according to node and graph embeddings.
  • Contrastive objectives: computes the likelihood score for positive and negative pairs.
  • Negative mining strategies: improves the negative sample set by considering the relative similarity (the hardness) of negative sample.

We also implement utilities for training models, evaluating model performance, and managing experiments.

Implementations and Examples

For a quick start, please check out the examples folder. We currently implemented the following methods:

  • DGI (P. Veličković et al., Deep Graph Infomax, ICLR, 2019) [Example1, Example2]
  • InfoGraph (F.-Y. Sun et al., InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization, ICLR, 2020) [Example]
  • MVGRL (K. Hassani et al., Contrastive Multi-View Representation Learning on Graphs, ICML, 2020) [Example1, Example2]
  • GRACE (Y. Zhu et al., Deep Graph Contrastive Representation Learning, [email protected], 2020) [Example]
  • GraphCL (Y. You et al., Graph Contrastive Learning with Augmentations, NeurIPS, 2020) [Example]
  • SupCon (P. Khosla et al., Supervised Contrastive Learning, NeurIPS, 2020) [Example]
  • HardMixing (Y. Kalantidis et al., Hard Negative Mixing for Contrastive Learning, NeurIPS, 2020)
  • DCL (C.-Y. Chuang et al., Debiased Contrastive Learning, NeurIPS, 2020)
  • HCL (J. Robinson et al., Contrastive Learning with Hard Negative Samples, ICLR, 2021)
  • Ring (M. Wu et al., Conditional Negative Sampling for Contrastive Learning of Visual Representations, ICLR, 2021)
  • Exemplar (N. Zhao et al., What Makes Instance Discrimination Good for Transfer Learning?, ICLR, 2021)
  • BGRL (S. Thakoor et al., Bootstrapped Representation Learning on Graphs, arXiv, 2021) [Example1, Example2]
  • G-BT (P. Bielak et al., Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs, arXiv, 2021) [Example]
  • VICReg (A. Bardes et al., VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, arXiv, 2021)

Building Your Own GCL Algorithms

Besides try the above examples for node and graph classification tasks, you can also build your own graph contrastive learning algorithms straightforwardly.

Graph Augmentation

In GCL.augmentors, PyGCL provides the Augmentor base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:

Augmentation Class name
Edge Adding (EA) EdgeAdding
Edge Removing (ER) EdgeRemoving
Feature Masking (FM) FeatureMasking
Feature Dropout (FD) FeatureDropout
Edge Attribute Masking (EAR) EdgeAttrMasking
Personalized PageRank (PPR) PPRDiffusion
Markov Diffusion Kernel (MDK) MarkovDiffusion
Node Dropping (ND) NodeDropping
Node Shuffling (NS) NodeShuffling
Subgraphs induced by Random Walks (RWS) RWSampling
Ego-net Sampling (ES) Identity

Call these augmentation functions by feeding with a Graph in a tuple form of node features, edge index, and edge features (x, edge_index, edge_attrs) will produce corresponding augmented graphs.

Composite Augmentations

PyGCL supports composing arbitrary numbers of augmentations together. To compose a list of augmentation instances augmentors, you need to use the Compose class:

import GCL.augmentors as A

aug = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])

You can also use the RandomChoice class to randomly draw a few augmentations each time:

import GCL.augmentors as A

aug = A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10),
                      A.NodeDropping(pn=0.1),
                      A.FeatureMasking(pf=0.1),
                      A.EdgeRemoving(pe=0.1)],
                     num_choices=1)

Customizing Your Own Augmentation

You can write your own augmentation functions by inheriting the base Augmentor class and defining the augment function.

Contrasting Architectures and Modes

Existing GCL architectures could be grouped into two lines: negative-sample-based methods and negative-sample-free ones.

  • Negative-sample-based approaches can either have one single branch or two branches. In single-branch contrasting, we only need to construct one graph view and perform contrastive learning within this view. In dual-branch models, we generate two graph views and perform contrastive learning within and across views.
  • Negative-sample-free approaches eschew the need of explicit negative samples. Currently, PyGCL supports the bootstrap-style contrastive learning as well contrastive learning within embeddings (such as Barlow Twins and VICReg).
Contrastive architectures Supported contrastive modes Need negative samples Class name Examples
Single-branch contrasting G2L only SingleBranchContrast DGI, InfoGraph
Dual-branch contrasting L2L, G2G, and G2L DualBranchContrast GRACE
Bootstrapped contrasting L2L, G2G, and G2L BootstrapContrast BGRL
Within-embedding contrasting L2L and G2G WithinEmbedContrast GBT

Moreover, you can use add_extra_mask if you want to add positives or remove negatives. This function performs bitwise ADD to extra positive masks specified by extra_pos_mask and bitwise OR to extra negative masks specified by extra_neg_mask. It is helpful, for example, when you have supervision signals from labels and want to train the model in a semi-supervised manner.

Internally, PyGCL calls Sampler classes in GCL.models that receive embeddings and produce positive/negative masks. PyGCL implements three contrasting modes: (a) Local-Local (L2L), (b) Global-Global (G2G), and (c) Global-Local (G2L) modes. L2L and G2G modes contrast embeddings at the same scale and the latter G2L one performs cross-scale contrasting. To implement your own GCL model, you may also use these provided sampler models:

Contrastive modes Class name
Same-scale contrasting (L2L and G2G) SameScaleSampler
Cross-scale contrasting (G2L) CrossScaleSampler
  • For L2L and G2G, embedding pairs of the same node/graph in different views constitute positive pairs. You can refer to GRACE and GraphCL for examples.
  • For G2L, node-graph embedding pairs form positives. Note that for single-graph datasets, the G2L mode requires explicit negative sampling (otherwise no negatives for contrasting). You can refer to DGI for an example.
  • Some models (e.g., GRACE) add extra intra-view negative samples. You may manually call sampler.add_intraview_negs to enlarge the negative sample set.
  • Note that the bootstrapping latent model involves some special model design (asymmetric online/offline encoders and momentum weight updates). You may refer to BGRL for details.

Contrastive Objectives

In GCL.losses, PyGCL implements the following contrastive objectives:

Contrastive objectives Class name
InfoNCE loss InfoNCE
Jensen-Shannon Divergence (JSD) loss JSD
Triplet Margin (TM) loss Triplet
Bootstrapping Latent (BL) loss BootstrapLatent
Barlow Twins (BT) loss BarlowTwins
VICReg loss VICReg

All these objectives are able to contrast any arbitrary positive and negative pairs, except for Barlow Twins and VICReg losses that perform contrastive learning within embeddings. Moreover, for InfoNCE and Triplet losses, we further provide SP variants that computes contrastive objectives given only one positive pair per sample to speed up computation and avoid excessive memory consumption.

Negative Sampling Strategies

PyGCL further implements several negative sampling strategies:

Negative sampling strategies Class name
Subsampling GCL.models.SubSampler
Hard negative mixing GCL.models.HardMixing
Conditional negative sampling GCL.models.Ring
Debiased contrastive objective GCL.losses.DebiasedInfoNCE , GCL.losses.DebiasedJSD
Hardness-biased negative sampling GCL.losses.HardnessInfoNCE, GCL.losses.HardnessJSD

The former three models serve as an additional sampling step similar to existing Sampler ones and can be used in conjunction with any objectives. The last two objectives are only for InfoNCE and JSD losses.

Utilities

PyGCL provides a variety of evaluator functions to evaluate the embedding quality:

Evaluator Class name
Logistic regression LREvaluator
Support vector machine SVMEvaluator
Random forest RFEvaluator

To use these evaluators, you first need to generate dataset splits by get_split (random split) or by from_predefined_split (according to preset splits).

Contribution

Feel free to open an issue should you find anything unexpected or create pull requests to add your own work! We are motivated to continuously make PyGCL even better.

Citation

Please cite our paper if you use this code in your own work:

@article{Zhu:2021tu,
author = {Zhu, Yanqiao and Xu, Yichen and Liu, Qiang and Wu, Shu},
title = {{An Empirical Study of Graph Contrastive Learning}},
journal = {arXiv.org},
year = {2021},
eprint = {2109.01116v1},
eprinttype = {arxiv},
eprintclass = {cs.LG},
month = sep,
}
Owner
PyGCL
A PyTorch Library for Graph Contrastive Learning
PyGCL
StableSims is an open-source project aimed at simulating MakerDAO's Dai stablecoin system

StableSims is an open-source project aimed at simulating MakerDAO's Dai stablecoin system, initially used for researching optimal incentive parameters for Liquidations 2.0.

Blockchain at Berkeley 52 Nov 21, 2022
DCSL - Generalizable Crowd Counting via Diverse Context Style Learning

DCSL Generalizable Crowd Counting via Diverse Context Style Learning Requirement

3 Jun 13, 2022
3D Human Pose Machines with Self-supervised Learning

3D Human Pose Machines with Self-supervised Learning Keze Wang, Liang Lin, Chenhan Jiang, Chen Qian, and Pengxu Wei, “3D Human Pose Machines with Self

Chenhan Jiang 398 Dec 20, 2022
Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery"

SegSwap Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery" [PDF] [Project page] If our project

xshen 41 Dec 10, 2022
Time Dependent DFT in Tamm-Dancoff Approximation

Density Function Theory Program - kspy-tddft(tda) This is an implementation of Time-Dependent Density Functional Theory(TDDFT) using the Tamm-Dancoff

Peter Borthwick 2 Nov 17, 2022
Large-scale language modeling tutorials with PyTorch

Large-scale language modeling tutorials with PyTorch 안녕하세요. 저는 TUNiB에서 머신러닝 엔지니어로 근무 중인 고현웅입니다. 이 자료는 대규모 언어모델 개발에 필요한 여러가지 기술들을 소개드리기 위해 마련하였으며 기본적으로

TUNiB 172 Dec 29, 2022
Depth image based mouse cursor visual haptic

Depth image based mouse cursor visual haptic How to run it. Install pyqt5. Install python modules pip install Pillow pip install numpy For illustrati

Xiong Jie 17 Dec 20, 2022
[ICCV21] Official implementation of the "Social NCE: Contrastive Learning of Socially-aware Motion Representations" in PyTorch.

Social-NCE + CrowdNav Website | Paper | Video | Social NCE + Trajectron | Social NCE + STGCNN This is an official implementation for Social NCE: Contr

VITA lab at EPFL 125 Dec 23, 2022
Qt-GUI implementation of the YOLOv5 algorithm (ver.6 and ver.5)

YOLOv5-GUI 🎉 YOLOv5算法(ver.6及ver.5)的Qt-GUI实现 🎉 Qt-GUI implementation of the YOLOv5 algorithm (ver.6 and ver.5). 基于YOLOv5的v5版本和v6版本及Javacr大佬的UI逻辑进行编写

EricFang 12 Dec 28, 2022
Implementing DeepMind's Fast Reinforcement Learning paper

Fast Reinforcement Learning This is a repo where I implement the algorithms in the paper, Fast reinforcement learning with generalized policy updates.

Marcus Chiam 6 Nov 28, 2022
PyTorch wrappers for using your model in audacity!

audacitorch This package contains utilities for prepping PyTorch audio models for use in Audacity. More specifically, it provides abstract classes for

Hugo Flores García 130 Dec 14, 2022
Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided curriculum Learning Approach

Get Fooled for the Right Reason Official repository for the NeurIPS 2021 paper Get Fooled for the Right Reason: Improving Adversarial Robustness throu

Sowrya Gali 1 Apr 25, 2022
Codebase of deep learning models for inferring stability of mRNA molecules

Kaggle OpenVaccine Models Codebase of deep learning models for inferring stability of mRNA molecules, corresponding to the Kaggle Open Vaccine Challen

Eternagame 40 Dec 29, 2022
Human pose estimation from video plays a critical role in various applications such as quantifying physical exercises, sign language recognition, and full-body gesture control.

Pose Detection Project Description: Human pose estimation from video plays a critical role in various applications such as quantifying physical exerci

Hassan Shahzad 2 Jan 17, 2022
Pytorch implementation for DFN: Distributed Feedback Network for Single-Image Deraining.

DFN:Distributed Feedback Network for Single-Image Deraining Abstract Recently, deep convolutional neural networks have achieved great success for sing

6 Nov 05, 2022
LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image.

This project is based on ultralytics/yolov3. LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image. Download $ git clone http

26 Dec 13, 2022
Object tracking and object detection is applied to track golf puts in real time and display stats/games.

Putting_Game Object tracking and object detection is applied to track golf puts in real time and display stats/games. Works best with the Perfect Prac

Max 1 Dec 29, 2021
Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Understanding the Generalization Benefit of Model Invariance from a Data Perspective This is the code for our NeurIPS2021 paper "Understanding the Gen

1 Jan 15, 2022
Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite.

TFLite-HITNET-Stereo-depth-estimation Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite. Stereo depth e

Ibai Gorordo 22 Oct 20, 2022