SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

Last update: Dec 24, 2022

Related tags

Overview

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

A novel graph neural network (GNN) based model (termed SlideGraph+) to predict HER2 status directly from whole-slide images of routine Haematoxylin and Eosin (H&E) slides. This pipeline generates node-level and WSI-level predictions by using a graph representation to capture the biological geometric structure of the cellular architecture at the entire WSI level. A pre-processing function is used to do adaptive spatial agglomerative clustering to group spatially neighbouring regions with high degree of feature similarity and construct a WSI-level graph based on clusters.

Data

The repository can be used for constructing WSI-level graphs, training SlideGraph and predicting HER2 status on WSI-level graphs. The training data used in this study was downloaded from TCGA using https://portal.gdc.cancer.gov/projects/TCGA-BRCA.

Workflow of predicting HER2 status from H&E images

GNN network architecture

Environment

Please refer to requirements.txt

Repository Structure

Below are the main executable scripts in the repository:

features_to_graph.py: Construct WSI-level graph

platt.py: Normalise classifier output scores to a probability value

GNN_pr.py: Graph neural network architecture

train.py: Main training and inference script

Training the classification model

Data format

For training, each WSI has to have a WSI-level graph. In order to do that, it is required to generate x,y coordinates, feature vectors for local regions in the WSIs. x,y coordinates can be cental points of patches, centroid of nuclei and so on. Feature varies. It can be nuclear composition features (e.g.,counts of different types of nuclei in the patch), morphological features, receptor expression features, deep features (or neuralfeature embdeddings from a pre-trained neural network) and so on.

Each WSI should be fitted with one npz file which contains three arrays: x_coordinate, y_coordinate and corresponding region-level feature vector. Please refer to feature.npz in the example folder.

Graph construction

After npz files are ready, run features_to_graph.py to group spatially neighbouring regions with high degree of feature similarity and construct a graph based on clusters for each WSI.

Set path to the feature directories (feature_path)
Set path where graphs will be saved (output_path)
Modify hyperparameters, including similarity parameters (lambda_d, lambda_f), hierachical clustering distance threshold (lamda_h) and node connection distance threshold (distance_thres)

Training

After getting graphs of all WSIs,

Set path to the graph directories (bdir) in train.py
Set path to the clinical data (clin_path) in train.py
Modify hyperparameters, including learning_rate, weight_decay in train.py

Train the classification model and do 5-fold stratified cross validation using

python train.py

In each fold, top 10 best models (on validation dataset) and the model from the last epoch are tested on the testing dataset. Averaged classification performance among 5 folds are presented in the end.

Heatmap of node-level prediction scores

Heatmaps of node-level prediction scores and zoomed-in regions which have different levels of HER2 prediction score. Boundary colour of each zoomed-in region represents its contribution to HER2 positivity (prediction score).

License

The source code SlideGraph as hosted on GitHub is released under the GNU General Public License (Version 3).

The full text of the licence is included in LICENSE.md.

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

Related tags

Overview

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

Data

Workflow of predicting HER2 status from H&E images

GNN network architecture

Environment

Repository Structure

Training the classification model

Data format

Graph construction

Training

Heatmap of node-level prediction scores

License

Owner

PFFDTD is an open-source FDTD simulator for 3D room acoustics

FedMM: Saddle Point Optimization for Federated Adversarial Domain Adaptation

Dynamic wallpaper generator.

DiSECt: Differentiable Simulator for Robotic Cutting

labelpix is a graphical image labeling interface for drawing bounding boxes

Contains source code for the winning solution of the xView3 challenge

Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021)

[RSS 2021] An End-to-End Differentiable Framework for Contact-Aware Robot Design

HairCLIP: Design Your Hair by Text and Reference Image

Generating Videos with Scene Dynamics

Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Pytorch implementation of One-Shot Affordance Detection

Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Rule Based Classification Project For Python

FridaHookAppTool - Frida Hook App Tool With Python

Code for the ECCV2020 paper "A Differentiable Recurrent Surface for Asynchronous Event-Based Data"

This repository contains all source code, pre-trained models related to the paper "An Empirical Study on GANs with Margin Cosine Loss and Relativistic Discriminator"

A brand new hub for Scene Graph Generation methods based on MMdetection (2021). The pipeline of from detection, scene graph generation to downstream tasks (e.g., image cpationing) is supported. Pytorch version implementation of HetH (ECCV 2020) and TopicSG (ICCV 2021) is included.

A Temporal Extension Library for PyTorch Geometric

An intelligent, flexible grammar of machine learning.