A real world application of a Recurrent Neural Network on a binary classification of time series data

Overview

What is this

This is a real world application of a Recurrent Neural Network on a binary classification of time series data. This project includes data cleanup, model creation, fitting, and testing/reporting and was designed and analysed in less than 24 hours.

Challenge and input

Three input files were provided for this challenge:

  • aigua.csv
  • aire.csv
  • amoni.csv (amoni_pred.csv is the same thing with integers rather than booleans)

The objective is to train a Machine Learning classifier that can predict dangerous drift on amoni.

Analysis procedure

Gretl has benn used to analyze the data.

Ideally, fuzzing techniques would be applied that would remove the input noise on amoni from the correlation with aigua.csv and aire.csv. After many hours of analysis I decided that the input files aire.csv and aigua.csv did not provide enough valuable data.

After much analysis of the amoni.csv file, I identified a technique that was able to remove most of the noise.

The technique has been implemented into the run.py file. This file cleanups up the data on amoni_pred.csv. It groups data by time intervals and gets the mean. It removes values that are too small. It clips the domain of the values. It removes noise by selecting the minimum values in a window slice. And (optionally) it corrects the dangerous drift values.

Generating the model

Once the file amoni_pred_base.csv has been created after cleaning up the input, we can move on to generating the model. Models are created and trained by the pred.py file. This file creates a Neural Network architecture with Recurrent Neural Networks (RNN). To be more precise, this NN has been tested with SimpleRNN and Long Short Term Memory (LSTM) layers. LSTM were chosed because they were seen to converge faster and provide better results and flexibility.

The input has been split on train/test sets. In order to test the network on fully unknown intervals, the test window time is non overlapping with the train window.

In order to allow prediction of a value, a window time slice is fed on to the LSTM layers. This window only includes past values and does not provide a lookahead cheat opportunity. The model is trained with checkpoints tracking testing accuracy. Loss and accuracy graphs are automatically generated for the training and testing sets.

Testing the models

After the models have been generated, the file test.py predicts the drift and dangerous values on the input data, It also provides accuracy metrics and saves the resulting file output.csv. This file can then be analysed with Gretl.

Performance

Our models are capable of achieving:

  • ~ 75% Accuracy on dangerous drifts with minimal time delays
  • ~ 80% Accuracy on drifts with minimal time delays

Moreover, with the set of corrections of the dangerous drift input values explained in previous sections, our model can achieve:

  • ~ 87% Accuracy on dangerous drifts with minimal time delays

Future Work / Improvements

Many improvements are possible on this architecture. First of all, fine tuning of the hyper parameters (clean up data set values, NN depth, type of layers, etc) should all be considered. Furthermore, more data should be collected, because the current data set only provides information for ~ 8 drifts. On top of that, more advanced noise analysis techniques should be applied, like fuzzing, exponential smoothing etc.

Other possible techniques

Yes, Isolation Forests are probably a better idea. But LSTM layers are cool :)

Show me some pictures

In blue, expected dangerous drift predictions. In orange the prediction by the presented model.

Screenshot1

Furthermore, with the patched dangerous drift patch:

Screenshot2

Owner
Josep Maria Salvia Hornos
Studying Business Management & Computer Science :D
Josep Maria Salvia Hornos
Official implementation of ETH-XGaze dataset baseline

ETH-XGaze baseline Official implementation of ETH-XGaze dataset baseline. ETH-XGaze dataset ETH-XGaze dataset is a gaze estimation dataset consisting

Xucong Zhang 134 Jan 03, 2023
CLIP + VQGAN / PixelDraw

clipit Yet Another VQGAN-CLIP Codebase This started as a fork of @nerdyrodent's VQGAN-CLIP code which was based on the notebooks of @RiversWithWings a

dribnet 276 Dec 12, 2022
Effect of Deep Transfer and Multi task Learning on Sperm Abnormality Detection

Effect of Deep Transfer and Multi task Learning on Sperm Abnormality Detection Introduction This repository includes codes and models of "Effect of De

Amir Abbasi 5 Sep 05, 2022
[CVPR2021] De-rendering the World's Revolutionary Artefacts

De-rendering the World's Revolutionary Artefacts Project Page | Video | Paper In CVPR 2021 Shangzhe Wu1,4, Ameesh Makadia4, Jiajun Wu2, Noah Snavely4,

49 Nov 06, 2022
SciFive: a text-text transformer model for biomedical literature

SciFive SciFive provided a Text-Text framework for biomedical language and natural language in NLP. Under the T5's framework and desrbibed in the pape

Long Phan 54 Dec 24, 2022
RoadMap and preparation material for Machine Learning and Data Science - From beginner to expert.

ML-and-DataScience-preparation This repository has the goal to create a learning and preparation roadMap for Machine Learning Engineers and Data Scien

33 Dec 29, 2022
A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

Orchard Dataset This repository contains the code used for generating the Orchard Dataset, as seen in the Multi-Hierarchical Reasoning in Sequences: S

Bill Pung 1 Jun 05, 2022
Prototype python implementation of the ome-ngff table spec

Prototype python implementation of the ome-ngff table spec

Kevin Yamauchi 8 Nov 20, 2022
Provide partial dates and retain the date precision through processing

Prefix date parser This is a helper class to parse dates with varied degrees of precision. For example, a data source might state a date as 2001, 2001

Friedrich Lindenberg 13 Dec 14, 2022
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 410 Jan 03, 2023
Robotics environments

Robotics environments Details and documentation on these robotics environments are available in OpenAI's blog post and the accompanying technical repo

Farama Foundation 121 Dec 28, 2022
2 Jul 19, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 687 Jan 07, 2023
🔥 Real-time Super Resolution enhancement (4x) with content loss and relativistic adversarial optimization 🔥

🔥 Real-time Super Resolution enhancement (4x) with content loss and relativistic adversarial optimization 🔥

Rishik Mourya 48 Dec 20, 2022
Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?

Adversrial Machine Learning Benchmarks This code belongs to the papers: Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness? Det

Adversarial Machine Learning 9 Nov 27, 2022
Face and Body Tracking for VRM 3D models on the web.

Kalidoface 3D - Face and Full-Body tracking for Vtubing on the web! A sequal to Kalidoface which supports Live2D avatars, Kalidoface 3D is a web app t

Rich 257 Jan 02, 2023
Generate Contextual Directory Wordlist For Target Org

PathPermutor Generate Contextual Directory Wordlist For Target Org This script generates contextual wordlist for any target org based on the set of UR

8 Jun 23, 2021
Embeds a story into a music playlist by sorting the playlist so that the order of the music follows a narrative arc.

playlist-story-builder This project attempts to embed a story into a music playlist by sorting the playlist so that the order of the music follows a n

Dylan R. Ashley 0 Oct 28, 2021
Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes (CVPR 2021 Oral)

Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces Official code release for NGLOD. For technical details, please refer t

659 Dec 27, 2022