Supervised domain-agnostic prediction framework for probabilistic modelling

Overview

skpro

PyPI version Build Status License

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data points.

The package offers a variety of features and specifically allows for

  • the implementation of probabilistic prediction strategies in supervised contexts
  • comparison of frequentist and Bayesian prediction methods
  • strategy optimization through hyperparameter tuning and ensemble methods (e.g. bagging)
  • workflow automation

List of developers and contributors

Documentation

The full documentation is available here.

Installation

Installation is easy using Python's package manager:

$ pip install skpro
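
A minimal usage sketch of the fit/predict workflow. The DensityBaseline estimator and its import path are assumed from the documentation example discussed in the issues below and may differ from the actual API:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from skpro.baselines import DensityBaseline  # assumed import path

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DensityBaseline()
# predictions are probability distributions, one per test point
y_pred = model.fit(X_train, y_train).predict(X_test)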

Contributing & Citation

We welcome contributions to the skpro project. Please read our contribution guide.

If you use skpro in a scientific publication, we would appreciate citations.

Comments
  • Distributions as return objects

    Re-opening the sub-issue opened in #3 and commented upon by @murphyk

    Question: should skpro's predict methods return a vector of distribution objects? For example, using the distributions from scipy.stats which implement methods pdf, cdf, mean, var, etc.

    Pro:

    • this would be using an existing, consolidated, and well-supported interface
    • it might be easier to use
    • it might be easier to understand

    Contra:

    • mixture types are not supported
    • l2 norm is not supported (as would be needed for squared/Gneiting loss)
    • mixed distributions on the reals, especially empirical distributions (weighted sums of deltas) as returned by Bayesian packages, are not supported
    • vectors of distributions (equivalently, Cartesian products of distributions) are not supported
    • this is not the status quo
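
    For concreteness, a hedged sketch of the scipy.stats option under discussion (illustrative only; the values and variable names are hypothetical, not skpro's API):

    import numpy as np
    from scipy import stats

    # Hypothetical: predict could return one frozen scipy.stats distribution per test point
    y_pred = [stats.norm(loc=mu, scale=sigma)
              for mu, sigma in zip([1.0, 2.5], [0.3, 0.8])]

    # Each element exposes the consolidated scipy.stats interface ...
    means = np.array([d.mean() for d in y_pred])
    densities = np.array([d.pdf(0.0) for d in y_pred])

    # ... but there is no native vector-of-distributions type, no mixtures,
    # and no L2 norm / Gneiting loss, which is the crux of the contra list.
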
    help wanted 
    opened by fkiraly 11
  • documentation: np.mean(y_pred) does not work

    I'm following along with this intro example. However, this line fails:

    (numpy.mean(y_pred) * 2).shape
    

    Error below (seems to be because Distribution objects don't support the mean() function but instead insist on obscurely calling it point!)

    np.mean(y_pred)
    Traceback (most recent call last):
    
      File "<ipython-input-38-19819be87ab5>", line 1, in <module>
        np.mean(y_pred)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
        out=out, **kwargs)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
        ret = umr_sum(arr, axis, dtype, out, keepdims)
    
    TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'
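
    A hedged workaround sketch, based only on the issue text above (the point() accessor name is taken from the comment about the Distribution interface and is assumed to return per-sample point predictions; the exact call may differ):

    # Assumed workaround: ask the prediction object for point predictions
    # instead of applying numpy's mean reduction to Distribution objects.
    point_pred = y_pred.point()      # assumed: array of per-sample means
    print((point_pred * 2).shape)    # mirrors the failing line from the docs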
    
    opened by murphyk 3
  • First example: 'utils' not found

    The first example in your documentation (DensityBaseline) does not run right on my machine: it throws a 'module not found' exception at the call to 'utils'.

    This might be a Python version problem (I am using 3.6), so perhaps it's not an error in the normal sense - though I don't see any specification that the package requires a particular Python version. Apologies if I missed it; in any case, I fixed it by importing matplotlib instead, i.e.

    import matplotlib.pyplot as plt
    plt.scatter(y_test, y_pred)

    instead of:

    import utils
    utils.plot_performance(y_test, y_pred)

    opened by Thomas-M-H-Hope 2
  • problem in loading the skpro

    I have been trying to import skpro for 2 days, but I cannot; I keep getting this error:

    cannot import name 'six' from 'sklearn.externals' (C:\Users\My Book\anaconda3\lib\site-packages\sklearn\externals\__init__.py)
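
    A possible workaround, stated as an assumption rather than a confirmed fix: sklearn.externals.six was removed in newer scikit-learn releases (around 0.23), so pinning an older scikit-learn may restore the import:

    $ pip install "scikit-learn<0.23"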

    opened by honestee 1
  • (wish)list of probabilistic regressors to implement or to interface

    A wishlist of probabilistic regression methods to implement or interface. This is partly copied from the R counterpart https://github.com/mlr-org/mlr3proba/issues/32 . The number of stars at the end indicates estimated difficulty or time investment.

    GLM

    • [ ] generalized linear model(s) with regression link, e.g., Gaussian *
    • [ ] generalized linear model(s) with count link, e.g., Poisson *
    • [ ] heteroscedastic linear regression ***
    • [ ] Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***

    KRR aka Gaussian process regression

    • [ ] vanilla kernel ridge regression with fixed kernel parameters and variance *
    • [ ] kernel ridge regression with MLE for kernel parameters and regularization parameter **
    • [ ] heteroscedastic KRR or Gaussian processes ***

    CDE

    • [ ] variants of conditional density estimation (Nadaraya-Watson type) **
    • [ ] reduction to density estimation by binning of input variables, then apply unconditional density estimation **

    Tree-based

    • [ ] probabilistic regression trees **

    Neural networks

    • [ ] interface tensorflow probability - some hard-coded NN architectures **
    • [ ] generic tensorflow probability interface - some hard-coded NN architectures ***

    Bayesian toolboxes

    • [ ] generic pymc3 interface ***
    • [ ] generic pyro interface ****
    • [ ] generic Stan interface ****
    • [ ] generic JAGS interface ****
    • [ ] generic BUGS interface ****
    • [ ] generic Bayesian interface - prior-valued hyperparameters *****

    Pipeline elements for target transformation

    • [ ] distr fixed target transformation **
    • [ ] distr predictive target calibration **

    Composite techniques, reduction to deterministic regression

    • [ ] stick mean, sd, from a deterministic regressor which already has these as return types into some location/scale distr family (Gaussian, Laplace) *
    • [ ] use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) (see the sketch after this list) **
    • [ ] upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
    • [ ] generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
    • [ ] reduction via bootstrapped sampling of a deterministic regressor **
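
    A hedged sketch of the mean-plus-residual reduction referenced in the list above: all names are hypothetical and only illustrate the composition idea in plain scikit-learn terms, not skpro's interface.

    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LinearRegression

    def fit_mean_residual_gaussian(X_train, y_train, X_test):
        # model 1: mean prediction
        mean_model = LinearRegression().fit(X_train, y_train)
        # model 2: fit to absolute residuals as a scale estimate
        abs_resid = np.abs(y_train - mean_model.predict(X_train))
        scale_model = LinearRegression().fit(X_train, abs_resid)
        mu = mean_model.predict(X_test)
        # forced lower variance bound, as in the thresholder item above
        sigma = np.clip(scale_model.predict(X_test), 1e-6, None)
        # plug into a location/scale family (Gaussian here)
        return [stats.norm(loc=m, scale=s) for m, s in zip(mu, sigma)]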

    Ensembling type pipeline elements and compositors

    • [ ] simple bagging, averaging of pdf/cdf **
    • [ ] probabilistic boosting ***
    • [ ] probabilistic stacking ***

    Baselines

    • [ ] always predict a Gaussian with mean = training mean, var = training var *
    • [ ] IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
    • [ ] IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **

    Other reduction from/to probabilistic regression

    • [ ] reducing deterministic regression to probabilistic regression - take mean, median or mode **
    • [ ] reduction(s) to quantile regression, use predictive quantiles to make a distr (see the sketch after this list) ***
    • [ ] reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
    • [ ] reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
    • [ ] reduction to survival, as the sub-case of no censoring **
    • [ ] reduction to classification, by binning ***
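
    A hedged sketch of the quantile-based reduction referenced above: interpolate predicted quantiles into a piecewise-linear CDF (illustrative only; the function name is hypothetical).

    import numpy as np

    def cdf_from_quantiles(alphas, quantiles):
        # Hypothetical helper: piecewise-linear CDF through predicted quantiles.
        alphas, quantiles = np.asarray(alphas), np.asarray(quantiles)
        def cdf(x):
            return np.interp(x, quantiles, alphas, left=0.0, right=1.0)
        return cdf

    cdf = cdf_from_quantiles([0.1, 0.5, 0.9], [2.3, 3.1, 4.8])
    print(cdf(3.1))  # 0.5
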
    good first issue 
    opened by fkiraly 0
  • skpro-refactoring (version-2)

    See below some comments describing the contents of the upcoming refactoring:

    • Refactoring of the distribution classes in a more OO way (see skpro->distribution)
    • Loss functions (see metrics->distribution)
    • Estimators (see metrics->distribution)

    Some descriptive notebooks (in docs->notebooks) and a full set of unit tests (in tests) are also available.

    opened by jesellier 24
Releases

v1.0.1-beta

Owner

The Alan Turing Institute
The UK's national institute for data science and artificial intelligence.