Greedy Gaussian Segmentation

Related tags

Deep LearningGGS
Overview

GGS

Greedy Gaussian Segmentation (GGS) is a Python solver for efficiently segmenting multivariate time series data. For implementation details, please see our paper at http://stanford.edu/~boyd/papers/ggs.html.


The GGS Solver takes an n-by-T data matrix and breaks the T timestamps on an n-dimensional vector into segments over which the data is well explained as independent samples from a multivariate Gaussian distribution. It does so by formulating a covariance-regularized maximum likelihood problem and solving it using a greedy heuristic, with full details described in the paper.

Download & Setup

  1. Download the source code in the terminal by running:
git clone [email protected]:davidhallac/GGS.git
  1. Confirm that the code was properly downloaded by running:
cd GGS
python helloworld.py
  1. To write your own Python function that uses ggs, simply make sure that ggs.py is in the same directory as your new file, and then add the following code to the beginning of your script:
from ggs import *

Supported Functions

The GGS package has three main functions:

bps, objectives = GGS(data, Kmax, lamb)

Finds K breakpoints in the data for a given regularization parameter lambda

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the number of breakpoints to find

lamb - regularization parameter for the regularized covariance

Returns

bps - List of lists, where element i of the larger list is the set of breakpoints found at K = i in the GGS algorithm

objectives - List of the objective values at each intermediate step (for K = 0 to Kmax)


meancovs = GGSMeanCov(data, breakpoints, lamb)

Finds the means and regularized covariances of each segment, given a set of breakpoints.

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

breakpoints - a list of breakpoint locations

lamb - regularization parameter for the regularized covariance

Returns

meancovs - a list of (mean, covariance) tuples for each segment in the data


cvResults = GGSCrossVal(data, Kmax=25, lambList = [0.1, 1, 10])

Runs 10-fold cross validation, and returns the train and test set likelihood for every (K, lambda) pair up to Kmax

Inputs

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the maximum number of breakpoints to run GGS on

lambList - a list of regularization parameters to test

Returns

cvResults - list of (lamb, ([TrainLL],[TestLL])) tuples for each regularization parameter in lambList. Here, TrainLL and TestLL are the average per-sample log-likelihood across the 10 folds of cross-validation for all K's from 0 to Kmax


Additional optional parameters (for all three functions above):

features = [] - select a certain subset of columns in the data to operate on

verbose = False - Print intermediate steps when running the algorithm

Example Usage

Running financeExample.py will yield the following plot, showing the objective (Equation 4 in the paper) vs. the number of breakpoints:

Objective vs. # of breakpoints

Once we have solved for the locations of the breakpoints, we can use the FindMeanCovs() function to find the means and covariances of each segment. In the example in helloworld.py, plotting the means, variances, and covariances of the three signals yields:

Means and covariances over time

To run cross-validation, which can be useful in determining optimal values of K and lambda, we can use the following code to load the data, run the cross-validation, and then plot the test and train likelihood:

from ggs import *
import numpy as np
import matplotlib.pyplot as plt

filename = "Returns.txt"
data = np.genfromtxt(filename,delimiter=' ')
feats = [0,3,7]

#Run cross-validaton up to Kmax = 30, at lambda = 1e-4
maxBreaks = 30
lls = GGSCrossVal(data, Kmax=maxBreaks, lambList = [1e-4], features = feats, verbose = False)

trainLikelihood = lls[0][1][0]
testLikelihood = lls[0][1][1]
plt.plot(range(maxBreaks+1), testLikelihood)
plt.plot(range(maxBreaks+1), trainLikelihood)
plt.legend(['Test LL','Train LL'], loc='best')
plt.show()

The resulting plot looks like:

Test and train likelihood

References

Greedy Gaussian Segmentation of Time Series Data -- D. Hallac, P. Nystrup, and S. Boyd

Authors

David Hallac, Peter Nystrup, and Stephen Boyd.

Owner
Stanford University Convex Optimization Group
Stanford University Convex Optimization Group
Joint Gaussian Graphical Model Estimation: A Survey

Joint Gaussian Graphical Model Estimation: A Survey Test Models Fused graphical lasso [1] Group graphical lasso [1] Graphical lasso [1] Doubly joint s

Koyejo Lab 1 Aug 10, 2022
ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees

ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees This repository is the official implementation of the empirica

Kuan-Lin (Jason) Chen 2 Oct 02, 2022
Interpretation of T cell states using reference single-cell atlases

Interpretation of T cell states using reference single-cell atlases ProjecTILs is a computational method to project scRNA-seq data into reference sing

Cancer Systems Immunology Lab 139 Jan 03, 2023
The official code of "SCROLLS: Standardized CompaRison Over Long Language Sequences".

SCROLLS This repository contains the official code of the paper: "SCROLLS: Standardized CompaRison Over Long Language Sequences". Links Official Websi

TAU NLP Group 39 Dec 23, 2022
Intrinsic Image Harmonization

Intrinsic Image Harmonization [Paper] Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng Here we provide PyTorch implementation and the

VISION @ OUC 44 Dec 21, 2022
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
Official code for the publication "HyFactor: Hydrogen-count labelled graph-based defactorization Autoencoder".

HyFactor Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source archit

Laboratoire-de-Chemoinformatique 11 Oct 10, 2022
This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Subreddit Analysis This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by

Georgios Chochlakis 1 Dec 17, 2021
NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch

NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch, train it, collect statistics, and share it among the members of the community. It is not just a visual

HTM Community 93 Sep 30, 2022
The second project in Python course on FCC

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Denise T 1 Dec 13, 2021
Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

Ramana Sundararaman 76 Dec 06, 2022
Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On

Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On [Project website] [Dataset] [Video] Abstract We propose a new g

71 Dec 24, 2022
Implementation for the paper SMPLicit: Topology-aware Generative Model for Clothed People (CVPR 2021)

SMPLicit: Topology-aware Generative Model for Clothed People [Project] [arXiv] License Software Copyright License for non-commercial scientific resear

Enric Corona 225 Dec 13, 2022
Discover hidden deepweb pages

DeepWeb Scapper Att: Demo version An simple script to scrappe deepweb to find pages. Will return if any of those exists and will save on a file. You s

Héber Júlio 77 Oct 02, 2022
Anti-UAV base on PaddleDetection

Paddle-Anti-UAV Anti-UAV base on PaddleDetection Background UAVs are very popular and we can see them in many public spaces, such as parks and playgro

Qingzhong Wang 2 Apr 20, 2022
Code and datasets for the paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction"

KnowPrompt Code and datasets for our paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction" Requireme

ZJUNLP 137 Dec 31, 2022
automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..)

Automatic-precautionary-guard automated systems to assist guarding corona Virus precautions for Closed Rooms (e.g. Halls, offices, etc..) what is this

badra 0 Jan 06, 2022
PyTorch implementation for ACL 2021 paper "Maria: A Visual Experience Powered Conversational Agent".

Maria: A Visual Experience Powered Conversational Agent This repository is the Pytorch implementation of our paper "Maria: A Visual Experience Powered

Jokie 22 Dec 12, 2022
A particular navigation route using satellite feed and can help in toll operations & traffic managemen

How about adding some info that can quanitfy the stress on a particular navigation route using satellite feed and can help in toll operations & traffic management The current analysis is on the satel

Ashish Pandey 1 Feb 14, 2022
NICE-GAN — Official PyTorch Implementation Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

NICE-GAN-pytorch - Official PyTorch implementation of NICE-GAN: Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Runfa Chen 208 Nov 25, 2022