ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

Overview

ForecastGA

A Python tool to forecast GA data using several popular time series models.

Open In Colab

Logo for ForecastGA

About

Welcome to ForecastGA

ForecastGA is a tool that combines a couple of popular libraries, Atspy and googleanalytics, with a few enhancements.

  • The models are made more intuitive to upgrade and add by having the tool logic separate from the model training and prediction.
  • When calling am.forecast_insample(), any kwargs included (e.g. learning_rate) are passed to the train method of the model.
  • Google Analytics profiles are specified by simply passing the URL (e.g. https://analytics.google.com/analytics/web/?authuser=2#/report-home/aXXXXXwXXXXXpXXXXXX).
  • You can provide a data dict with GA config options or a Pandas Series as the input data.
  • Multiple log levels.
  • Auto GPU detection (via Torch).
  • List all available models, with descriptions, by calling forecastga.print_model_info().
  • Google API info can be passed in the data dict or uploaded as a JSON file named identity.json.
  • Created a companion Google Colab notebook to easily run on GPU.
  • A handy plot function for Colab, forecastga.plot_colab(forecast_in, title="Insample Forecast", dark_mode=True) that formats nicely and also handles Dark Mode!

Models Available

  • ARIMA : Automated ARIMA Modelling
  • Prophet : Modeling Multiple Seasonality With Linear or Non-linear Growth
  • ProphetBC : Prophet Model with Box-Cox transform of the data
  • HWAAS : Exponential Smoothing With Additive Trend and Additive Seasonality
  • HWAMS : Exponential Smoothing with Additive Trend and Multiplicative Seasonality
  • NBEATS : Neural basis expansion analysis (now fixed at 20 Epochs)
  • Gluonts : RNN-based Model (now fixed at 20 Epochs)
  • TATS : Seasonal and Trend no Box Cox
  • TBAT : Trend and Box Cox
  • TBATS1 : Trend, Seasonal (one), and Box Cox
  • TBATP1 : TBATS1 but Seasonal Inference is Hardcoded by Periodicity
  • TBATS2 : TBATS1 With Two Seasonal Periods

How To Use

Find Model Info:

forecastga.print_model_info()

Initialize Model:

Google Analytics:
data = { 'client_id': '',
         'client_secret': '',
         'identity': '',
         'ga_start_date': '2018-01-01',
         'ga_end_date': '2019-12-31',
         'ga_metric': 'sessions',
         'ga_segment': 'organic traffic',
         'ga_url': 'https://analytics.google.com/analytics/web/?authuser=2#/report-home/aXXXXXwXXXXXpXXXXXX',
         'omit_values_over': 2000000
        }

model_list = ["TATS", "TBATS1", "TBATP1", "TBATS2", "ARIMA"]
am = forecastga.AutomatedModel(data , model_list=model_list, forecast_len=30 )
Pandas DataFrame:
# CSV with columns: Date and Sessions
df = pd.read_csv('ga_sessions.csv')
df.Date = pd.to_datetime(df.Date)
df = df.set_index("Date")
data = df.Sessions

model_list = ["TATS", "TBATS1", "TBATP1", "TBATS2", "ARIMA"]
am = forecastga.AutomatedModel(data , model_list=model_list, forecast_len=30 )

Forecast Insample:

forecast_in, performance = am.forecast_insample()

Forecast Outsample:

forecast_out = am.forecast_outsample()

Ensemble Performance:

all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)

Pretty Plot in Google Colab

forecastga.plot_colab(forecast_in, title="Insample Forecast", dark_mode=True)

Installation

Windows users may need to manually install the two items below via conda :

  1. conda install pystan
  2. conda install pytorch -c pytorch
  3. !pip install --upgrade git+https://github.com/jroakes/ForecastGA.git

otherwise, pip install --upgrade forecastga

This repo support GPU training. Below are a few libraries that may have to be manually installed to support.

pip install --upgrade mxnet-cu101
pip install --upgrade torch 1.7.0+cu101

Acknowledgements

  1. Majority of forecasting code taken from https://github.com/firmai/atspy and refactored heavily.
  2. Google Analytics based off of: https://github.com/debrouwere/google-analytics
  3. Thanks to richardfergie for the addition of the Prophet Box-Cox model to control negative predictions.

Contribute

The goal of this repo is to grow the list of available models to test. If you would like to contribute one please read on. Feel free to have fun naming your models.

  1. Fork the repo.
  2. In the /src/forecastga/models folder there is a model called template.py. You can use this as a template for creating your new model. All available variables are there. Forecastga ensures each model has the right data and calls only the train and forecast methods for each model. Feel free to add additional methods that your model requires.
  3. Edit the /src/forecastga/models/__init__.py file to add your model's information. Follow the format of the other entries. Forecastga relies on loc to find the model and class to find the class to use.
  4. Edit requirments.txt with any additional libraries needed to run your model. Keep in mind that this repo should support GPU training if available and some libraries have separate GPU-enabled versions.
  5. Issue a pull request.

If you enjoyed this tool consider buying me some beer at: Paypalme

Owner
JR Oakes
Hacker, SEO, NC State fan, co-organizer of Raleigh and RTP Meetups, as well as @sengineland author. Tweets are my own.
JR Oakes
Option Pricing Calculator using the Binomial Pricing Method (No Libraries Required)

Binomial Option Pricing Calculator Option Pricing Calculator using the Binomial Pricing Method (No Libraries Required) Background A derivative is a fi

sammuhrai 1 Nov 29, 2021
AWS Glue ETL Code Samples

AWS Glue ETL Code Samples This repository has samples that demonstrate various aspects of the new AWS Glue service, as well as various AWS Glue utilit

AWS Samples 1.2k Jan 03, 2023
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
A meta plugin for processing timelapse data timepoint by timepoint in napari

napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat

Robert Haase 2 Oct 13, 2022
follow-analyzer helps GitHub users analyze their following and followers relationship

follow-analyzer follow-analyzer helps GitHub users analyze their following and followers relationship by providing a report in html format which conta

Yin-Chiuan Chen 2 May 02, 2022
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets

HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o

HyperSpy 411 Dec 27, 2022
pyETT: Python library for Eleven VR Table Tennis data

pyETT: Python library for Eleven VR Table Tennis data Documentation Documentation for pyETT is located at https://pyett.readthedocs.io/. Installation

Tharsis Souza 5 Nov 19, 2022
Common bioinformatics database construction

biodb Common bioinformatics database construction 1.taxonomy (Substance classification database) Download the database wget -c https://ftp.ncbi.nlm.ni

sy520 2 Jan 04, 2022
Spectral Analysis in Python

SPECTRUM : Spectral Analysis in Python contributions: Please join https://github.com/cokelaer/spectrum contributors: https://github.com/cokelaer/spect

Thomas Cokelaer 280 Dec 16, 2022
First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we want to understand column level lineage and automate impact analysis.

dbt-osmosis First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we wan

Alexander Butler 150 Jan 06, 2023
peptides.py is a pure-Python package to compute common descriptors for protein sequences

peptides.py Physicochemical properties and indices for amino-acid sequences. 🗺️ Overview peptides.py is a pure-Python package to compute common descr

Martin Larralde 32 Dec 31, 2022
Very useful and necessary functions that simplify working with data

Additional-function-for-pandas Very useful and necessary functions that simplify working with data random_fill_nan(module_name, nan) - Replaces all sp

Alexander Goldian 2 Dec 02, 2021
Single machine, multiple cards training; mix-precision training; DALI data loader.

Template Script Category Description Category script comparison script train.py, loader.py for single-machine-multiple-cards training train_DP.py, tra

2 Jun 27, 2022
Convert tables stored as images to an usable .csv file

Convert an image of numbers to a .csv file This Python program aims to convert images of array numbers to corresponding .csv files. It uses OpenCV for

711 Dec 26, 2022
Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data.

Hatchet Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing

Lawrence Livermore National Laboratory 14 Aug 19, 2022
Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a datawarehouse to accomodate data of active user activity for music streaming applicatio

2 Oct 14, 2021
fds is a tool for Data Scientists made by DAGsHub to version control data and code at once.

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

DAGsHub 359 Dec 22, 2022
Hg002-qc-snakemake - HG002 QC Snakemake

HG002 QC Snakemake To Run Resources and data specified within snakefile (hg002QC

Juniper A. Lake 2 Feb 16, 2022
Gaussian processes in TensorFlow

Website | Documentation (release) | Documentation (develop) | Glossary Table of Contents What does GPflow do? Installation Getting Started with GPflow

GPflow 1.7k Jan 06, 2023
NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

Blaze 3.1k Jan 05, 2023