A research-oriented federated learning library and benchmark platform for graph neural networks. Accepted to the ICLR 2021 DPML and MLSys 2021 GNNSys workshops.

Overview

FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks


Datasets: http://moleculenet.ai/

Installation

After cloning this repository, run the following commands to install the dependencies.

conda create -n fedgraphnn python=3.7
conda activate fedgraphnn
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
conda install -c anaconda mpi4py grpcio
conda install scikit-learn numpy h5py setproctitle networkx
pip install -r requirements.txt 
cd FedML; git submodule init; git submodule update; cd ../;
pip install -r FedML/requirements.txt
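
Optionally, you can verify the PyTorch install with a quick check:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"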

Data Preparation

Experiments

Centralized Molecule Property Classification experiments

python experiments/centralized/moleculenet/molecule_classification_multilabel.py

Centralized Molecule Property Regression experiments

python experiments/centralized/moleculenet/molecule_regression_multivariate.py

Arguments for Centralized Training

This is a list of arguments used in centralized experiments.

--dataset --> Dataset used for training
--data_dir --> Data directory
--partition_method --> How to partition the dataset
--sage_hidden_size --> Size of the GraphSAGE hidden layer
--node_embedding_dim --> Dimensionality of the vector space the atoms will be embedded in
--sage_dropout --> Dropout used between GraphSAGE layers
--readout_hidden_dim --> Size of the readout hidden layer
--graph_embedding_dim --> Dimensionality of the vector space the molecule will be embedded in
--client_optimizer --> Optimizer (Adam or SGD)
--lr --> Learning rate (default: 0.0015)
--wd --> Weight decay (default: 0.001)
--epochs --> Number of epochs
--frequency_of_the_test --> How frequently to run evaluation
--device --> GPU device for training
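
For example, a classification run on the SIDER dataset might look like the following; the flag values, data path, and device string are illustrative (--lr and --wd show the stated defaults), so adjust them to your setup:

python experiments/centralized/moleculenet/molecule_classification_multilabel.py \
    --dataset sider \
    --data_dir ./data/sider/ \
    --client_optimizer adam \
    --lr 0.0015 \
    --wd 0.001 \
    --epochs 100 \
    --device cuda:0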

Distributed/Federated Molecule Property Classification experiments

sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256  sider "./../../../data/sider/" 0

## run in background
nohup sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256  sider "./../../../data/sider/" 0 > ./fedavg-graphsage.log 2>&1 &

Distributed/Federated Molecule Property Regression experiments

sh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv "./../../../data/freesolv/" 0

## run in background
nohup sh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv "./../../../data/freesolv/" 0 > ./fedavg-graphsage.log 2>&1 &

Arguments for Distributed/Federated Training

This is an ordered list of the positional arguments used in distributed/federated experiments. Note that this setting takes additional parameters beyond those used for centralized training.

CLIENT_NUM=$1 -> Number of clients in the distributed/federated setting
WORKER_NUM=$2 -> Number of workers
SERVER_NUM=$3 -> Number of servers
GPU_NUM_PER_SERVER=$4 -> Number of GPUs per server
MODEL=$5 -> Model name
DISTRIBUTION=$6 -> Dataset distribution: homo for IID splitting, hetero for non-IID splitting
ROUND=$7 -> Number of distributed/federated learning rounds
EPOCH=$8 -> Number of epochs to train clients' local models
BATCH_SIZE=$9 -> Batch size
LR=${10} -> Learning rate
SAGE_DIM=${11} -> Dimensionality of the GraphSAGE embedding
NODE_DIM=${12} -> Dimensionality of node embeddings
SAGE_DR=${13} -> Dropout rate applied between GraphSAGE layers
READ_DIM=${14} -> Dimensionality of the readout embedding
GRAPH_DIM=${15} -> Dimensionality of the graph embedding
DATASET=${16} -> Dataset name (please check the data folder for all available datasets)
DATA_DIR=${17} -> Dataset directory
CI=${18}
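
To make the positional mapping concrete, here is the SIDER classification example from above annotated with these names (CI is left at 0, as in the sample invocations):

# CLIENT_NUM=6  WORKER_NUM=1  SERVER_NUM=1  GPU_NUM_PER_SERVER=1  MODEL=graphsage
# DISTRIBUTION=homo  ROUND=150  EPOCH=1  BATCH_SIZE=1  LR=0.0015  SAGE_DIM=256
# NODE_DIM=256  SAGE_DR=0.3  READ_DIM=256  GRAPH_DIM=256  DATASET=sider
# DATA_DIR="./../../../data/sider/"  CI=0
sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider "./../../../data/sider/" 0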

Code Structure of FedGraphNN

  • FedML: A soft repository link generated using git submodule add https://github.com/FedML-AI/FedML.

  • data: Provides data downloading scripts and stores the downloaded datasets. Note that FedML/data also contains datasets for research, but those are used to evaluate federated optimizers (e.g., FedAvg) and platforms. FedGraphNN supports more advanced datasets and models for the federated training of graph neural networks. Currently, we provide molecular machine learning datasets.

  • data_preprocessing: Domain-specific PyTorch Data loaders for centralized and distributed training.

  • model: GNN models.

  • trainer: define your own trainer.py by inheriting the base class in FedML/fedml-core/trainer/fedavg_trainer.py; some tasks can share the same trainer (see the sketch after this list).

  • experiments/distributed:

  1. experiments is the entry point for training. It contains experiments for different platforms; we start with the distributed setting.
  2. Every experiment integrates four building blocks: FedML (federated optimizers), data_preprocessing, model, and trainer.
  3. To develop new experiments, please refer to the code at experiments/distributed/text-classification.
  • experiments/centralized:
  1. Please provide the centralized training script in this directory.
  2. This is used to get the reference model accuracy for FL.
  3. You may want to accelerate training via distributed training on multiple GPUs and machines; please refer to the code at experiments/centralized/DDP_demo.
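
As referenced above, a minimal sketch of a custom trainer. The class name, constructor, and training loop here are hypothetical; the authoritative base-class interface is in FedML/fedml-core/trainer/fedavg_trainer.py and may differ from the hooks assumed below.

import torch

# Hypothetical trainer skeleton; in practice it should inherit the base class
# from FedML/fedml-core/trainer/fedavg_trainer.py, whose exact hooks may differ.
class MoleculeNetTrainer:
    def __init__(self, model, device, args):
        self.model = model
        self.device = device
        self.args = args

    def train(self, train_loader):
        # One local-training pass over a client's molecule data.
        self.model.to(self.device)
        self.model.train()
        criterion = torch.nn.BCEWithLogitsLoss()  # multi-label classification
        optimizer = torch.optim.Adam(self.model.parameters(),
                                     lr=self.args.lr,
                                     weight_decay=self.args.wd)
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            optimizer.zero_grad()
            loss = criterion(self.model(inputs), labels)
            loss.backward()
            optimizer.step()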

Update FedML Submodule

cd FedML
git checkout master && git pull
cd ..
git add FedML
git commit -m "updating submodule FedML to latest"
git push

Citation

Please cite our FedGraphNN paper if it helps your research. You can describe us in your paper like this: "We develop our experiments based on FedGraphNN".

@misc{he2021fedgraphnn,
      title={FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks}, 
      author={Chaoyang He and Keshav Balasubramanian and Emir Ceyani and Yu Rong and Peilin Zhao and Junzhou Huang and Murali Annavaram and Salman Avestimehr},
      year={2021},
      eprint={2104.07145},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}