Anomaly Detection with R

Overview

AnomalyDetection R package

Build Status Pending Pull-Requests Github Issues

AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in wide variety of contexts. For example, detecting anomalies in system metrics after a new software release, user engagement post an A/B test, or for problems in econometrics, financial engineering, political and social sciences.

How the package works

The underlying algorithm – referred to as Seasonal Hybrid ESD (S-H-ESD) builds upon the Generalized ESD test for detecting anomalies. Note that S-H-ESD can be used to detect both global as well as local anomalies. This is achieved by employing time series decomposition and using robust statistical metrics, viz., median together with ESD. In addition, for long time series (say, 6 months of minutely data), the algorithm employs piecewise approximation - this is rooted to the fact that trend extraction in the presence of anomalies in non-trivial - for anomaly detection.

Besides time series, the package can also be used to detect anomalies in a vector of numerical values. We have found this very useful as many times the corresponding timestamps are not available. The package provides rich visualization support. The user can specify the direction of anomalies, the window of interest (such as last day, last hour), enable/disable piecewise approximation; additionally, the x- and y-axis are annotated in a way to assist visual data analysis.

How to get started

Install the R package using the following commands on the R console:

install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)

The function AnomalyDetectionTs is called to detect one or more statistically significant anomalies in the input time series. The documentation of the function AnomalyDetectionTs, which can be seen by using the following command, details the input arguments and the output of the function AnomalyDetectionTs.

help(AnomalyDetectionTs)

The function AnomalyDetectionVec is called to detect one or more statistically significant anomalies in a vector of observations. The documentation of the function AnomalyDetectionVec, which can be seen by using the following command, details the input arguments and the output of the function AnomalyDetectionVec.

help(AnomalyDetectionVec)

A simple example

To get started, the user is recommended to use the example dataset which comes with the packages. Execute the following commands:

data(raw_data)
res = AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', plot=TRUE)
res$plot

Fig 1

From the plot, we observe that the input time series experiences both positive and negative anomalies. Furthermore, many of the anomalies in the time series are local anomalies within the bounds of the time series’ seasonality (hence, cannot be detected using the traditional approaches). The anomalies detected using the proposed technique are annotated on the plot. In case the timestamps for the plot above were not available, anomaly detection could then carried out using the AnomalyDetectionVec function; specifically, one can use the following command:

AnomalyDetectionVec(raw_data[,2], max_anoms=0.02, period=1440, direction='both', only_last=FALSE, plot=TRUE)

Often, anomaly detection is carried out on a periodic basis. For instance, at times, one may be interested in determining whether there was any anomaly yesterday. To this end, we support a flag only_last whereby one can subset the anomalies that occurred during the last day or last hour. Execute the following command:

res = AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', only_last=”day”, plot=TRUE)
res$plot

Fig 2

From the plot, we observe that only the anomalies that occurred during the last day have been annotated. Further, the prior six days are included to expose the seasonal nature of the time series but are put in the background as the window of prime interest is the last day.

Anomaly detection for long duration time series can be carried out by setting the longterm argument to T.

Copyright and License

Copyright 2015 Twitter, Inc and other contributors

Licensed under the GPLv3

You might also like...
A Python Library for Graph Outlier Detection (Anomaly Detection)
A Python Library for Graph Outlier Detection (Anomaly Detection)

PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detect

Anomaly Detection and Correlation library

luminol Overview Luminol is a light weight python library for time series data analysis. The two major functionalities it supports are anomaly detecti

Find big moving stocks before they move using machine learning and anomaly detection
Find big moving stocks before they move using machine learning and anomaly detection

Surpriver - Find High Moving Stocks before they Move Find high moving stocks before they move using anomaly detection and machine learning. Surpriver

A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Real-world Anomaly Detection in Surveillance Videos- pytorch Re-implementation

Real world Anomaly Detection in Surveillance Videos : Pytorch RE-Implementation This repository is a re-implementation of "Real-world Anomaly Detectio

Awesome anomaly detection in medical images

A curated list of awesome anomaly detection works in medical imaging, inspired by the other awesome-* initiatives.

Paper list of log-based anomaly detection

Paper list of log-based anomaly detection

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

Demo project for real time anomaly detection using kafka and python
Demo project for real time anomaly detection using kafka and python

kafkaml-anomaly-detection Project for real time anomaly detection using kafka and python It's assumed that zookeeper and kafka are running in the loca

Unofficial implementation of PatchCore anomaly detection
Unofficial implementation of PatchCore anomaly detection

PatchCore anomaly detection Unofficial implementation of PatchCore(new SOTA) anomaly detection model Original Paper : Towards Total Recall in Industri

MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift
MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

MemStream Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift . Siddharth Bhatia, Arjit Jain, Shivi

USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

Anomaly detection on SQL data warehouses and databases
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.
LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.

LogDeep is an open source deeplearning-based log analysis toolkit for automated anomaly detection.

Code for the paper "TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks"

TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks This is a Python3 / Pytorch implementation of TadGAN paper. The associated

Industrial knn-based anomaly detection for images. Visit streamlit link to check out the demo.
Industrial knn-based anomaly detection for images. Visit streamlit link to check out the demo.

Industrial KNN-based Anomaly Detection ⭐ Now has streamlit support! ⭐ Run $ streamlit run streamlit_app.py This repo aims to reproduce the results of

Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows WACV 2022 preprint:https://arxiv.org/abs/2107.1

A PyTorch implementation of
A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21

ANEMONE A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21 Dependencies python==3.6.1 dgl==

Comments
  • Anomaly Detection from Data vs Image

    Anomaly Detection from Data vs Image

    I was assigned with project to do anomaly detection on for all our company KPIs. I googled and found AnomalyDetection by Twitter. There was an idea from my colleague to do the anomaly detection on the graph images (comparing with previous week images to identify anomaly points) instead of using time-series raw data.

    I am not familiar with the Anomaly Detection, anyone here experienced and able to advice which one is better (Anomaly Detection from data or image) in term of accuracy, storage and processing time.

    opened by hscj87 0
  • ad_ts does not work with data.table

    ad_ts does not work with data.table

    I'm using a data set with different time series, I'm store it as data.table So in every iteration I filter by some condition:

    DT[var1 == x, c("date", "var2")]

    Error in rbindlist(l, use.names, fill, idcol) : Class attribute on column 1 of item 2 does not match with column 1 of item 1.

    This happen because date column is store as numeric(0), ie:

    all_anoms <- data.frame(timestamp = numeric(0), count = numeric(0)) meanwhile column date is required to be POSIXct/POSIXlt

    opened by fedemolina 0
  • Cannot remove prior installation of package ‘Rcpp’?

    Cannot remove prior installation of package ‘Rcpp’?

    Error: Failed to install 'AnomalyDetection' from GitHub: (converted from warning) cannot remove prior installation of package ‘Rcpp’

    Which version of R is supported?

    opened by esride-jts 1
  • Definition of period in AnomalyDetectionVec !!!

    Definition of period in AnomalyDetectionVec !!!

    The date of the data I have is the monthly data from January 2010, February 2010 to December 2019. I want to use AnomalyDetectionVec to find anomaly for the data. I am wondering should I set period = 12 or else??? Can someone explain more in detail on how the period perimeter work in AnomalyDetectionVec.

    opened by dbsxo2995 2
Releases(v1.0.0)
  • v1.0.0(Jan 6, 2015)

    Today, we’re announcing AnomalyDetection, our open-source R package that automatically detects anomalies like these in big data in a practical and robust way.

    https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series

    Source code(tar.gz)
    Source code(zip)
Owner
Twitter
Twitter 💙 #opensource
Twitter
Python utility to extract differences between two pandas dataframes.

Python utility to extract differences between two pandas dataframes.

Jaime Valero 8 Jan 07, 2023
This tool parses log data and allows to define analysis pipelines for anomaly detection.

logdata-anomaly-miner This tool parses log data and allows to define analysis pipelines for anomaly detection. It was designed to run the analysis wit

AECID 32 Nov 27, 2022
A highly efficient and modular implementation of Gaussian Processes in PyTorch

GPyTorch GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian

3k Jan 02, 2023
WAL enables programmable waveform analysis.

This repro introcudes the Waveform Analysis Language (WAL). The initial paper on WAL will appear at ASPDAC'22 and can be downloaded here: https://www.

Institute for Complex Systems (ICS), Johannes Kepler University Linz 40 Dec 13, 2022
Improving your data science workflows with

Make Better Defaults Author: Kjell Wooding [email protected] This is the git re

Kjell Wooding 18 Dec 23, 2022
Intake is a lightweight package for finding, investigating, loading and disseminating data.

Intake: A general interface for loading data Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps

Intake 851 Jan 01, 2023
Methylation/modified base calling separated from basecalling.

Remora Methylation/modified base calling separated from basecalling. Remora primarily provides an API to call modified bases for basecaller programs s

Oxford Nanopore Technologies 72 Jan 05, 2023
Get mutations in cluster by querying from LAPIS API

Cluster Mutation Script Get mutations appearing within user-defined clusters. Usage Clusters are defined in the clusters dict in main.py: clusters = {

neherlab 1 Oct 22, 2021
Analyzing Earth Observation (EO) data is complex and solutions often require custom tailored algorithms.

eo-grow Earth observation framework for scaled-up processing in Python. Analyzing Earth Observation (EO) data is complex and solutions often require c

Sentinel Hub 18 Dec 23, 2022
Python Project on Pro Data Analysis Track

Udacity-BikeShare-Project: Python Project on Pro Data Analysis Track Basic Data Exploration with pandas on Bikeshare Data Basic Udacity project using

Belal Mohammed 0 Nov 10, 2021
Stochastic Gradient Trees implementation in Python

Stochastic Gradient Trees - Python Stochastic Gradient Trees1 by Henry Gouk, Bernhard Pfahringer, and Eibe Frank implementation in Python. Based on th

John Koumentis 2 Nov 18, 2022
Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

13 Nov 01, 2022
Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.

Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove

Ryan McGeehan 3 Nov 04, 2022
Tokyo 2020 Paralympics, Analytics

Tokyo 2020 Paralympics, Analytics Thanks for checking out my app! It was built entirely using matplotlib and Tokyo 2020 Paralympics data. This applica

Petro Ivaniuk 1 Nov 18, 2021
Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.

Streaming Data Pipeline - Kafka + ELK Stack Streaming weather data using Apache Kafka and Elastic Stack. Data source: https://openweathermap.org/api O

Felipe Demenech Vasconcelos 2 Jan 20, 2022
Churn prediction with PySpark

It is expected to develop a machine learning model that can predict customers who will leave the company.

3 Aug 13, 2021
Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

weightedcalcs weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more. Features Plays we

Jeremy Singer-Vine 98 Dec 31, 2022
Supply a wrapper ``StockDataFrame`` based on the ``pandas.DataFrame`` with inline stock statistics/indicators support.

Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s

Cedric Zhuang 1.1k Dec 28, 2022
Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Gabriele 3 Jul 05, 2022
sportsdataverse python package

sportsdataverse-py See CHANGELOG.md for details. The goal of sportsdataverse-py is to provide the community with a python package for working with spo

Saiem Gilani 37 Dec 27, 2022