Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

Overview

GENDIS Build Status PyPI version Read The Docs Downloads

GENetic DIscovery of Shapelets

In the time series classification domain, shapelets are small subseries that are discriminative for a certain class. It has been shown that by projecting the original dataset to a distance space, where each axis corresponds to the distance to a certain shapelet, classifiers are able to achieve state-of-the-art results on a plethora of datasets.

This repository contains an implementation of GENDIS, an algorithm that searches for a set of shapelets in a genetic fashion. The algorithm is insensitive to its parameters (such as population size, crossover and mutation probability, ...) and can quickly extract a small set of shapelets that is able to achieve predictive performances similar (or better) to that of other shapelet techniques.

Installation

We currently support Python 3.5 & Python 3.6. For installation, there are two alternatives:

  1. Clone the repository https://github.com/IBCNServices/GENDIS.git and run (python3 -m) pip -r install requirements.txt
  2. GENDIS is hosted on PyPi. You can just run (python3 -m) pip install gendis to add gendis to your dist-packages (you can use it from everywhere).

Make sure NumPy and Cython is already installed (pip install numpy and pip install Cython), since that is required for the setup script.

Tutorial & Example

1. Loading & preprocessing the datasets

In a first step, we need to construct at least a matrix with timeseries (X_train) and a vector with labels (y_train). Additionally, test data can be loaded as well in order to evaluate the pipeline in the end.

import pandas as pd
# Read in the datafiles
train_df = pd.read_csv(<DATA_FILE>)
test_df = pd.read_csv(<DATA_FILE>)
# Split into feature matrices and label vectors
X_train = train_df.drop('target', axis=1)
y_train = train_df['target']
X_test = test_df.drop('target', axis=1)
y_test = test_df['target']

2. Creating a GeneticExtractor object

Construct the object. For a list of all possible parameters, and a description, please refer to the documentation in the code

from gendis.genetic import GeneticExtractor
genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=True, 
                                     mutation_prob=0.3, crossover_prob=0.3, 
                                     wait=10, max_len=len(X_train) // 2)

3. Fit the GeneticExtractor and construct distance matrix

shapelets = genetic_extractor.fit(X_train, y_train)
distances_train = genetic_extractor.transform(X_train)
distances_test = genetic_extractor.transform(X_test)

4. Fit ML classifier on constructed distance matrix

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
lr = LogisticRegression()
lr.fit(distances_train, y_train)

print('Accuracy = {}'.format(accuracy_score(y_test, lr.predict(distances_test))))

Example notebook

A simple example is provided in this notebook

Data

All datasets in this repository are downloaded from timeseriesclassification. Please refer to them appropriately when using any dataset.

Paper experiments

In order to reproduce the results from the corresponding paper, please check out this directory.

Tests

We provide a few doctests and unit tests. To run the doctests: python3 -m doctest -v <FILE>, where <FILE> is the Python file you want to run the doctests from. To run unit tests: nose2 -v

Contributing, Citing and Contact

If you have any questions, are experiencing bugs in the GENDIS implementation, or would like to contribute, please feel free to create an issue/pull request in this repository or take contact with me at gilles(dot)vandewiele(at)ugent(dot)be

If you use GENDIS in your work, please use the following citation:

@article{vandewiele2021gendis,
  title={GENDIS: Genetic Discovery of Shapelets},
  author={Vandewiele, Gilles and Ongenae, Femke and Turck, Filip De},
  journal={Sensors},
  volume={21},
  number={4},
  pages={1059},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}
Owner
IDLab Services
Internet and Data Lab research group from Ghent University
IDLab Services
Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.** S

Sebastian Raschka 4k Dec 30, 2022
The Emergence of Individuality

The Emergence of Individuality

16 Jul 20, 2022
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Demand-Forecasting Business Problem A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Ayşe Nur Türkaslan 3 Mar 06, 2022
Covid-polygraph - a set of Machine Learning-driven fact-checking tools

Covid-polygraph, a set of Machine Learning-driven fact-checking tools that aim to address the issue of misleading information related to COVID-19.

1 Apr 22, 2022
MLR - Machine Learning Research

Machine Learning Research 1. Project Topic 1.1. Exsiting research Benmark: https://paperswithcode.com/sota ACL anthology for NLP papers: http://www.ac

Charles 69 Oct 20, 2022
A flexible CTF contest platform for coming PKU GeekGame events

Project Guiding Star: the Backend A flexible CTF contest platform for coming PKU GeekGame events Still in early development Highlights Not configurabl

PKU GeekGame 14 Dec 15, 2022
vortex particles for simulating smoke in 2d

vortex-particles-method-2d vortex particles for simulating smoke in 2d -vortexparticles_s

12 Aug 23, 2022
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

Unit8 5.2k Jan 04, 2023
This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning

This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning. It is a Web Application.

Developer Junaid 3 Aug 04, 2022
Iris species predictor app is used to classify iris species created using python's scikit-learn, fastapi, numpy and joblib packages.

Iris Species Predictor Iris species predictor app is used to classify iris species using their sepal length, sepal width, petal length and petal width

Siva Prakash 5 Apr 05, 2022
Dive into Machine Learning

Dive into Machine Learning Hi there! You might find this guide helpful if: You know Python or you're learning it 🐍 You're new to Machine Learning You

Michael Floering 11.1k Jan 03, 2023
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

Real-time water systems lab 416 Jan 06, 2023
Open MLOps - A Production-focused Open-Source Machine Learning Framework

Open MLOps - A Production-focused Open-Source Machine Learning Framework Open MLOps is a set of open-source tools carefully chosen to ease user experi

Data Revenue 590 Dec 28, 2022
Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations.

BO-GP Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations. The BO-GP codes are developed using GPy and GPyOpt. The optimizer

KTH Mechanics 8 Mar 31, 2022
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
pure-predict: Machine learning prediction in pure Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks l

Ibotta 84 Dec 29, 2022
Microsoft Machine Learning for Apache Spark

Microsoft Machine Learning for Apache Spark MMLSpark is an ecosystem of tools aimed towards expanding the distributed computing framework Apache Spark

Microsoft Azure 3.9k Dec 30, 2022