A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

Overview


matrixprofile-ts

matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keogh and Mueen research groups at UC-Riverside and the University of New Mexico. Current implementations include MASS, STMP, STAMP, STAMPI, STOMP, SCRIMP++, and FLUSS.

Read the Target blog post here.

Further academic description can be found here.

The PyPI page for matrixprofile-ts is here.


Installation

Major releases of matrixprofile-ts are available on the Python Package Index:

pip install matrixprofile-ts

Details about each release can be found here.

Quick start

>>> from matrixprofile import *
>>> import numpy as np
>>> a = np.array([0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0])
>>> matrixProfile.stomp(a,4)
(array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([4., 5., 6., 7., 0., 1., 2., 3., 0.]))

Note that SCRIMP++ is highly recommended for calculating the Matrix Profile due to its speed and anytime ability.
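To make the Quick start output easier to read, here is a brute-force sketch of what the returned pair represents: for every window of length m, the z-normalized Euclidean distance to its nearest non-overlapping neighbor, and the position of that neighbor. This is not the library's implementation. The function name `naive_matrix_profile` is hypothetical and the exclusion zone of m // 2 is an assumption made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize a window; constant windows map to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def naive_matrix_profile(ts, m):
    """O(n^2) reference: nearest-neighbor distance and index per window."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    windows = np.array([znorm(ts[i:i + m]) for i in range(n)])
    profile = np.full(n, np.inf)
    index = np.zeros(n, dtype=int)
    excl = m // 2  # assumed exclusion zone, skips trivial self-matches
    for i in range(n):
        dists = np.linalg.norm(windows - windows[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        dists[lo:hi] = np.inf
        index[i] = int(np.argmin(dists))
        profile[i] = dists[index[i]]
    return profile, index

a = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
profile, index = naive_matrix_profile(a, 4)
print(profile)  # all zeros: every window repeats exactly elsewhere
print(index)
```

Because the input repeats with period 4, every window has an exact copy elsewhere in the series, which is why the profile is zero everywhere.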

Examples

Jupyter notebooks containing various examples of how to use matrixprofile-ts can be found under docs/examples.

As a basic introduction, we can take a synthetic signal and use STOMP to calculate the corresponding Matrix Profile (this is the same synthetic signal as in the Golang Matrix Profile library). Code for this example can be found here.

[Figure: the synthetic signal and its Matrix Profile]

There are several items of note:

  • The Matrix Profile value jumps at each phase change. High Matrix Profile values are associated with "discords": time series behavior that hasn't been observed before.

  • Repeated patterns in the data (or "motifs") lead to low Matrix Profile values.
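The two observations above can be sketched with plain NumPy: the largest Matrix Profile value points at a discord and the smallest at a motif. This is a minimal illustration under stated assumptions, not the library's code. The helper `profile_brute_force`, the test signal, and the exclusion zone of m // 2 are all hypothetical.

```python
import numpy as np

def profile_brute_force(ts, m):
    """Z-normalized nearest-neighbor distance for every length-m window."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    w = np.array([ts[i:i + m] for i in range(n)])
    mu = w.mean(axis=1, keepdims=True)
    sd = w.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant windows: avoid division by zero
    w = (w - mu) / sd
    excl = m // 2      # assumed exclusion zone around each window
    profile = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(w - w[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        profile[i] = d.min()
    return profile

# Hypothetical signal: a sine wave with a flat segment injected at 120..135.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
signal[120:136] = 0.0

m = 16
profile = profile_brute_force(signal, m)
discord = int(np.argmax(profile))  # behavior with no close match elsewhere
motif = int(np.argmin(profile))    # behavior that repeats almost exactly
print("discord starts at", discord, "with distance", profile[discord])
print("motif starts at", motif, "with distance", profile[motif])
```

Windows that overlap the flat segment have no close match elsewhere, so the discord falls in that region, while the clean sine windows repeat every 20 samples and sit near zero.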

We can introduce an anomaly to the end of the time series and use STAMPI to detect it.

[Figure: the synthetic signal with an anomaly at the end, and its Matrix Profile]

The Matrix Profile has spiked in value, highlighting the (potential) presence of a new behavior. Note that Matrix Profile anomaly detection capabilities will depend on the nature of the data, as well as the selected subquery length parameter. As with any algorithm, it's important to try out different parameter values.
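The general idea behind an incremental update can be sketched as follows: each new point creates one new window, that window is compared against the earlier ones, and both sides of the profile are updated. This is an illustration of the concept only, not the library's STAMPI implementation. The class `IncrementalProfile`, its method names, and the exclusion zone of m // 2 are hypothetical.

```python
import numpy as np

def znorm(x):
    """Z-normalize a window; constant windows map to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

class IncrementalProfile:
    """Hypothetical helper that keeps a Matrix Profile current as points arrive."""

    def __init__(self, m):
        self.m = m
        self.excl = m // 2  # assumed exclusion zone
        self.ts = []
        self.windows = []   # z-normalized windows seen so far
        self.profile = []   # nearest-neighbor distance per window

    def append(self, value):
        self.ts.append(float(value))
        if len(self.ts) < self.m:
            return
        new = znorm(np.array(self.ts[-self.m:]))
        i = len(self.windows)
        best = np.inf
        for j, w in enumerate(self.windows):
            if i - j <= self.excl:
                continue  # skip trivial matches with overlapping windows
            d = float(np.linalg.norm(new - w))
            best = min(best, d)
            self.profile[j] = min(self.profile[j], d)  # older windows may gain a closer neighbor
        self.windows.append(new)
        self.profile.append(best)

inc = IncrementalProfile(m=16)
for t in range(200):
    inc.append(np.sin(2 * np.pi * t / 20))
before = max(inc.profile)  # clean periodic data: every window has a match

for _ in range(16):
    inc.append(0.0)        # hypothetical anomaly: the signal goes flat
after = inc.profile[-1]    # newest window has no close match
print(before, after)
```

Rerunning the same stream with a different `m` is a quick way to see how the subquery length changes the size of the spike.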

Algorithm Comparison

This section lists the Matrix Profile algorithms, how long each takes to run, and when to use one over another. The timings were measured on the synthetic sample data set.

For a more comprehensive runtime comparison, please review the notebook docs/examples/Algorithm Comparison.ipynb.

All time comparisons were run on a 4-core 2.8 GHz processor with 16 GB of memory. The operating system used was Ubuntu 18.04 LTS 64-bit.

| Algorithm | Time to Complete | Description |
|-----------|------------------|-------------|
| STAMP | 310 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) | STAMP is an anytime algorithm that lets you sample the data set to get an approximate solution. Our implementation provides you with the option to specify the sampling size in percent format. |
| STOMP | 79.8 ms ± 473 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) | STOMP computes an exact solution in a very efficient manner. When you have a historic time series that you would like to examine, STOMP is typically the quickest at giving an exact solution. |
| SCRIMP++ | 59 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) | SCRIMP++ merges the concepts of STAMP and STOMP to provide an anytime algorithm that enables "interactive analysis speed". It provides an exact or approximate solution in a very timely manner. Our implementation allows you to specify the maximum number of seconds you are willing to wait, in which case it returns an approximate solution. If you want the exact solution, it can provide that as well. The original authors of this algorithm suggest that SCRIMP++ can be used in all use cases. |
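The "anytime" property mentioned in the table can be illustrated with a small sampling sketch: computing distance profiles for only a random fraction of windows yields an approximation that never falls below the exact profile and tightens as the fraction grows. This is a simplified illustration, not the library's STAMP or SCRIMP++ code. The function names, the `fraction` parameter (a stand-in for the sampling percentage), and the exclusion zone of m // 2 are assumptions.

```python
import numpy as np

def znorm_windows(ts, m):
    """Return all length-m windows of ts, each z-normalized."""
    ts = np.asarray(ts, dtype=float)
    w = np.array([ts[i:i + m] for i in range(len(ts) - m + 1)])
    sd = w.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant windows: avoid division by zero
    return (w - w.mean(axis=1, keepdims=True)) / sd

def sampled_profile(ts, m, fraction, seed=0):
    """Approximate the Matrix Profile from a random fraction of distance profiles."""
    w = znorm_windows(ts, m)
    n = len(w)
    excl = m // 2  # assumed exclusion zone
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)[:max(1, int(round(fraction * n)))]
    profile = np.full(n, np.inf)
    for i in order:
        d = np.linalg.norm(w - w[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        profile = np.minimum(profile, d)       # window i may be the best neighbor of any j
        profile[i] = min(profile[i], d.min())  # and window i gets its own nearest neighbor
    return profile

rng = np.random.default_rng(42)
signal = np.sin(2 * np.pi * np.arange(300) / 25) + 0.1 * rng.standard_normal(300)

exact = sampled_profile(signal, 20, fraction=1.0)   # every distance profile
approx = sampled_profile(signal, 20, fraction=0.2)  # stop after 20% of them
print("max gap between approximate and exact:", np.max(approx - exact))
```

Stopping early trades accuracy for time, which is the trade-off the sampling size and the maximum runtime options expose.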

Matrix Profile in Other Languages

Contact

Citations

  1. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). IEEE ICDM 2016.

  2. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). IEEE ICDM 2016.

  3. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. Hoang Anh Dau and Eamonn Keogh. KDD'17, Halifax, Canada.

  4. Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar and Eamonn Keogh, ICDM 2018.

  5. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova, and Eamonn Keogh. ICDM 2017.
