A real world application of a Recurrent Neural Network on a binary classification of time series data

Overview

What is this

This is a real world application of a Recurrent Neural Network on a binary classification of time series data. This project includes data cleanup, model creation, fitting, and testing/reporting and was designed and analysed in less than 24 hours.

Challenge and input

Three input files were provided for this challenge:

  • aigua.csv
  • aire.csv
  • amoni.csv (amoni_pred.csv is the same thing with integers rather than booleans)

The objective is to train a Machine Learning classifier that can predict dangerous drift on amoni.

Analysis procedure

Gretl has benn used to analyze the data.

Ideally, fuzzing techniques would be applied that would remove the input noise on amoni from the correlation with aigua.csv and aire.csv. After many hours of analysis I decided that the input files aire.csv and aigua.csv did not provide enough valuable data.

After much analysis of the amoni.csv file, I identified a technique that was able to remove most of the noise.

The technique has been implemented into the run.py file. This file cleanups up the data on amoni_pred.csv. It groups data by time intervals and gets the mean. It removes values that are too small. It clips the domain of the values. It removes noise by selecting the minimum values in a window slice. And (optionally) it corrects the dangerous drift values.

Generating the model

Once the file amoni_pred_base.csv has been created after cleaning up the input, we can move on to generating the model. Models are created and trained by the pred.py file. This file creates a Neural Network architecture with Recurrent Neural Networks (RNN). To be more precise, this NN has been tested with SimpleRNN and Long Short Term Memory (LSTM) layers. LSTM were chosed because they were seen to converge faster and provide better results and flexibility.

The input has been split on train/test sets. In order to test the network on fully unknown intervals, the test window time is non overlapping with the train window.

In order to allow prediction of a value, a window time slice is fed on to the LSTM layers. This window only includes past values and does not provide a lookahead cheat opportunity. The model is trained with checkpoints tracking testing accuracy. Loss and accuracy graphs are automatically generated for the training and testing sets.

Testing the models

After the models have been generated, the file test.py predicts the drift and dangerous values on the input data, It also provides accuracy metrics and saves the resulting file output.csv. This file can then be analysed with Gretl.

Performance

Our models are capable of achieving:

  • ~ 75% Accuracy on dangerous drifts with minimal time delays
  • ~ 80% Accuracy on drifts with minimal time delays

Moreover, with the set of corrections of the dangerous drift input values explained in previous sections, our model can achieve:

  • ~ 87% Accuracy on dangerous drifts with minimal time delays

Future Work / Improvements

Many improvements are possible on this architecture. First of all, fine tuning of the hyper parameters (clean up data set values, NN depth, type of layers, etc) should all be considered. Furthermore, more data should be collected, because the current data set only provides information for ~ 8 drifts. On top of that, more advanced noise analysis techniques should be applied, like fuzzing, exponential smoothing etc.

Other possible techniques

Yes, Isolation Forests are probably a better idea. But LSTM layers are cool :)

Show me some pictures

In blue, expected dangerous drift predictions. In orange the prediction by the presented model.

Screenshot1

Furthermore, with the patched dangerous drift patch:

Screenshot2

Owner
Josep Maria Salvia Hornos
Studying Business Management & Computer Science :D
Josep Maria Salvia Hornos
Runtime type annotations for the shape, dtype etc. of PyTorch Tensors.

torchtyping Type annotations for a tensor's shape, dtype, names, ... Turn this: def batch_outer_product(x: torch.Tensor, y: torch.Tensor) - torch.Ten

Patrick Kidger 1.2k Jan 03, 2023
This code is a near-infrared spectrum modeling method based on PCA and pls

Nirs-Pls-Corn This code is a near-infrared spectrum modeling method based on PCA and pls 近红外光谱分析技术属于交叉领域,需要化学、计算机科学、生物科学等多领域的合作。为此,在(北邮邮电大学杨辉华老师团队)指导下

Fu Pengyou 6 Dec 17, 2022
Code for "Optimizing risk-based breast cancer screening policies with reinforcement learning"

Tempo: Optimizing risk-based breast cancer screening policies with reinforcement learning Introduction This repository was used to develop Tempo, as d

Adam Yala 12 Oct 11, 2022
Official page of Patchwork (RA-L'21 w/ IROS'21)

Patchwork Official page of "Patchwork: Concentric Zone-based Region-wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor

Hyungtae Lim 254 Jan 05, 2023
Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

FFD Source Code Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face M

88 Nov 22, 2022
Convert Table data to approximate values with GUI

Table_Editor Convert Table data to approximate values with GUIs... usage - Import methods for extension Tables. Imported method supposed to have only

CLJ 1 Jan 10, 2022
Remote sensing change detection using PaddlePaddle

Change Detection Laboratory Developing and benchmarking deep learning-based remo

Lin Manhui 15 Sep 23, 2022
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

GD-Thief Red Team tool for exfiltrating files from a target's Google Drive that you(the attacker) has access to, via the Google Drive API. This includ

Antonio Piazza 39 Dec 27, 2022
Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers.

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers. It contains purchases, recurring

Ayodeji Yekeen 1 Jan 01, 2022
Codebase for "ProtoAttend: Attention-Based Prototypical Learning."

Codebase for "ProtoAttend: Attention-Based Prototypical Learning." Authors: Sercan O. Arik and Tomas Pfister Paper: Sercan O. Arik and Tomas Pfister,

47 2 May 17, 2022
RL algorithm PPO and IRL algorithm AIRL written with Tensorflow.

RL algorithm PPO and IRL algorithm AIRL written with Tensorflow. They have a parallel sampling feature in order to increase computation speed (especially in high-performance computing (HPC)).

Fangjian Li 3 Dec 28, 2021
SoGCN: Second-Order Graph Convolutional Networks

SoGCN: Second-Order Graph Convolutional Networks This is the authors' implementation of paper "SoGCN: Second-Order Graph Convolutional Networks" in Py

Yuehao 7 Aug 16, 2022
ARAE-Tensorflow for Discrete Sequences (Adversarially Regularized Autoencoder)

ARAE Tensorflow Code Code for the paper Adversarially Regularized Autoencoders for Generating Discrete Structures by Zhao, Kim, Zhang, Rush and LeCun

19 Nov 12, 2021
Unofficial Implementation of Oboe (SIGCOMM'18').

Oboe-Reproduce This is the unofficial implementation of the paper "Oboe: Auto-tuning video ABR algorithms to network conditions, Zahaib Akhtar, Yun Se

Tianchi Huang 13 Nov 04, 2022
The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

869 Jan 07, 2023
CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery

CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery This paper (CoANet) has been published in IEEE TIP 2021. This code i

Jie Mei 53 Dec 03, 2022
Pytorch-diffusion - A basic PyTorch implementation of 'Denoising Diffusion Probabilistic Models'

PyTorch implementation of 'Denoising Diffusion Probabilistic Models' This reposi

Arthur Juliani 76 Jan 07, 2023
Learning cell communication from spatial graphs of cells

ncem Features Repository for the manuscript Fischer, D. S., Schaar, A. C. and Theis, F. Learning cell communication from spatial graphs of cells. 2021

Theis Lab 77 Dec 30, 2022
The Body Part Regression (BPR) model translates the anatomy in a radiologic volume into a machine-interpretable form.

Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Computing (MIC). Please make sure that your usage of this code is in compl

MIC-DKFZ 40 Dec 18, 2022