Twin-deep neural network for semi-supervised learning of materials properties

Related tags

Deep Learningtsdnn
Overview

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction

Citation:

Semi-supervised teacher-student deep neural network for materials discovery” by Daniel Gleaves, Edirisuriya M. Dilanga Siriwardane,Yong Zhao, and JianjunHu.

Machine learning and evolution laboratory

Department of Computer Science and Engineering

University of South Carolina


This software package implements the Meta Pseudo Labels (MPL) semi-supervised learning method with Crystal Graph Convolutional Neural Networks (CGCNN) with that takes an arbitary crystal structure to predict material synthesizability and whether it has positive or negative formation energy

The package provides two major functions:

  • Train a semi-supervised TSDNN classification model with a customized dataset.
  • Predict material synthesizability and formation energy of new crystals with a pre-trained TSDNN model.

The following paper describes the details of the CGCNN architecture, a graph neural network model for materials property prediction: CGCNN paper

The following paper describes the details of the semi-supervised learning framework that we used in our model: Meta Pseudo Labels

Table of Contents

Prerequisites

This package requires:

If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

conda upgrade conda
conda create -n tsdnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge

*Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.

This creates a conda environment for running TSDNN. Before using TSDNN, activate the environment by:

conda activate tsdnn

Usage

Define a customized dataset

To input crystal structures to TSDNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target label for each crystal (not needed for predicting, but you need to put some random numbers in data_test.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. data_labeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column recodes the known value of the target label.

  2. data_unlabeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed).

  3. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  4. ID.cif: a CIF file that recodes the crystal structure, where ID is the unique ID for the crystal.

(4.) data_predict: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed). This is the file that will be used if you want to classify materials with predict.py.

The structure of the root_dir should be:

root_dir
├── data_labeled.csv
├── data_unlabeled.csv
├── data_test.csv
├── data_positive.csv (optional- for positive and unlabeled dataset generation)
├── data_unlabeled_full.csv (optional- for positive and unlabeled dataset generation, data_unlabeled.csv will be overwritten)
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There is an example of customized a dataset in: data/example.

Train a TSDNN model

Before training a new TSDNN model, you will need to:

Then, in directory synth-tsdnn, you can train a TSDNN model for your customized dataset by:

python main.py root_dir

If you want to use the PU learning dataset generation, you can train a model using the --uds flag with the number of PU iterations to perform.

python main.py --uds 5 root_dir

You can set the number of training, validation, and test data with labels --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, --test-ratio instead. Note that the ratio flags cannot be used with the size flags simultaneously. For instance, data/example has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/example

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/example

After training, you will get 5 files in synth-tsdnn directory.

  • checkpoints/teacher_best.pth.tar: stores the TSDNN teacher model with the best validation accuracy.
  • checkpoints/student_best.pth.tar" stores the TSDNN student model with the best validation accuracy.
  • checkpoints/t_checkpoint.pth.tar: stores the TSDNN teacher model at the last epoch.
  • checkpoints/s_checkpoint.pth.tar: stores the TSDNN student model at the last epoch.
  • results/validation/test_results.csv: stores the ID and predicted value for each crystal in training set.

Predict material properties with a pre-trained TSDNN model

Before predicting the material properties, you will need to:

  • Define a customized dataset at root_dir for all the crystal structures that you want to predict.
  • Obtain a pre-trained TSDNN model (example found in checkpoints/pre-trained/pre-train.pth.tar).

Then, in directory synth-tsdnn, you can predict the properties of the crystals in root_dir:

python predict.py checkpoints/pre-trained/pre-trained.pth.tar data/root_dir

After predicting, you will get one file in synth-tsdnn directory:

  • predictions.csv: stores the ID and predicted value for each crystal in test set.

Data

To reproduce our paper, you can download the corresponding datasets following the instruction. Each dataset discussed can be found in data/datasets/

Authors

This software was primarily written by Daniel Gleaves who was advised by Prof. Jianjun Hu. This software builds upon work by Tian Xie, Hieu Pham, and Jungdae Kim.

Acknowledgements

Research reported in this work was supported in part by NSF under grants 1940099 and 1905775. The views, perspective,and content do not necessarily represent the official views of NSF. This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program.

License

TSDNN is released under the MIT License.

Owner
MLEG
MLEG
Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation The code repository for "Audio-Visual Generalized Few-Shot Learning with

Kaiaicy 3 Jun 27, 2022
Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

Code accompanying "Dynamic Neural Relational Inference" This codebase accompanies the paper "Dynamic Neural Relational Inference" from CVPR 2020. This

Colin Graber 48 Dec 23, 2022
The Unsupervised Reinforcement Learning Benchmark (URLB)

The Unsupervised Reinforcement Learning Benchmark (URLB) URLB provides a set of leading algorithms for unsupervised reinforcement learning where agent

259 Dec 26, 2022
An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

Enrico Fini 48 Sep 27, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022
sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

Jonathan Shobrook 305 Dec 21, 2022
NEG loss implemented in pytorch

Pytorch Negative Sampling Loss Negative Sampling Loss implemented in PyTorch. Usage neg_loss = NEG_loss(num_classes, embedding_size) optimizer =

Daniil Gavrilov 123 Sep 13, 2022
A simple API wrapper for Discord interactions.

Your ultimate Discord interactions library for discord.py. About | Installation | Examples | Discord | PyPI About What is discord-py-interactions? dis

james 641 Jan 03, 2023
Using Tensorflow Object Detection API to detect Waymo open dataset

Waymo-2D-Object-Detection Using Tensorflow Object Detection API to detect Waymo open dataset Result CenterNet Training Loss SSD ResNet Training Loss C

76 Dec 12, 2022
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ra

EnliteAI GmbH 222 Dec 24, 2022
Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Codes-for-Algorithms Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Tracy (Shengmin) Tao 1 Apr 12, 2022
A Data Annotation Tool for Semantic Segmentation, Object Detection and Lane Line Detection.(In Development Stage)

Data-Annotation-Tool How to Run this Tool? To run this software, follow the steps: git clone https://github.com/Autonomous-Car-Project/Data-Annotation

TiVRA AI 13 Aug 18, 2022
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

DLR-RM 4.7k Jan 01, 2023
Contextual Attention Localization for Offline Handwritten Text Recognition

CALText This repository contains the source code for CALText model introduced in "CALText: Contextual Attention Localization for Offline Handwritten T

0 Feb 17, 2022
Machine Learning University: Accelerated Computer Vision Class

Machine Learning University: Accelerated Computer Vision Class This repository contains slides, notebooks, and datasets for the Machine Learning Unive

AWS Samples 1.3k Dec 28, 2022
Duke Machine Learning Winter School: Computer Vision 2022

mlwscv2002 Welcome to the Duke Machine Learning Winter School: Computer Vision 2022! The MLWS-CV includes 3 hands-on training sessions on implementing

Duke + Data Science (+DS) 9 May 25, 2022
Density-aware Single Image De-raining using a Multi-stream Dense Network (CVPR 2018)

DID-MDN Density-aware Single Image De-raining using a Multi-stream Dense Network He Zhang, Vishal M. Patel [Paper Link] (CVPR'18) We present a novel d

He Zhang 224 Dec 12, 2022
Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366 39 Dec 29, 2022
Deep Hedging Demo - An Example of Using Machine Learning for Derivative Pricing.

Deep Hedging Demo Pricing Derivatives using Machine Learning 1) Jupyter version: Run ./colab/deep_hedging_colab.ipynb on Colab. 2) Gui version: Run py

Yu Man Tam 102 Jan 06, 2023
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. Website • Key Features • How To Use • Docs •

Pytorch Lightning 21.1k Dec 29, 2022