Supervised domain-agnostic prediction framework for probabilistic modelling

Overview

skpro

PyPI version Build Status License

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data points.

The package offers a variety of features and specifically allows for

  • the implementation of probabilistic prediction strategies in supervised contexts
  • comparison of frequentist and Bayesian prediction methods
  • strategy optimization through hyperparameter tuning and ensemble methods (e.g. bagging)
  • workflow automation

List of developers and contributors

Documentation

The full documentation is available here.

Installation

Installation is easy using Python's package manager:

$ pip install skpro
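
A minimal usage sketch of the fit/predict workflow. The DensityBaseline estimator and its import path are assumed from the documentation example discussed in the issues below and may differ from the actual API:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from skpro.baselines import DensityBaseline  # assumed import path

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DensityBaseline()
# predictions are probability distributions, one per test point
y_pred = model.fit(X_train, y_train).predict(X_test)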

Contributing & Citation

We welcome contributions to the skpro project. Please read our contribution guide.

If you use skpro in a scientific publication, we would appreciate citations.

Comments
  • Distributions as return objects

    Re-opening the sub-issue opened in #3 and commented upon by @murphyk

    Question: should skpro's predict methods return a vector of distribution objects? For example, using the distributions from scipy.stats which implement methods pdf, cdf, mean, var, etc.

    Pro:

    • this would be using an existing, consolidated, and well-supported interface
    • it might be easier to use
    • it might be easier to understand

    Contra:

    • mixture types are not supported
    • l2 norm is not supported (as would be needed for squared/Gneiting loss)
    • mixed distributions on the reals, especially empirical distributions (weighted sums of deltas) as returned by Bayesian packages, are not supported
    • vectors of distributions (equivalently, Cartesian products of distributions) are not supported
    • this is not the status quo
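
    For concreteness, a hedged sketch of the scipy.stats option under discussion (illustrative only; the values and variable names are hypothetical, not skpro's API):

    import numpy as np
    from scipy import stats

    # Hypothetical: predict could return one frozen scipy.stats distribution per test point
    y_pred = [stats.norm(loc=mu, scale=sigma)
              for mu, sigma in zip([1.0, 2.5], [0.3, 0.8])]

    # Each element exposes the consolidated scipy.stats interface ...
    means = np.array([d.mean() for d in y_pred])
    densities = np.array([d.pdf(0.0) for d in y_pred])

    # ... but there is no native vector-of-distributions type, no mixtures,
    # and no L2 norm / Gneiting loss, which is the crux of the contra list.
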
    help wanted 
    opened by fkiraly 11
  • documentation: np.mean(y_pred) does not work

    I'm following along with this intro example. However, this line fails:

    (numpy.mean(y_pred) * 2).shape
    

    Error below (seems to be because Distribution objects don't support the mean() function but instead insist on obscurely calling it point!)

    np.mean(y_pred)
    Traceback (most recent call last):
    
      File "<ipython-input-38-19819be87ab5>", line 1, in <module>
        np.mean(y_pred)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
        out=out, **kwargs)
    
      File "/home/kpmurphy/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 75, in _mean
        ret = umr_sum(arr, axis, dtype, out, keepdims)
    
    TypeError: unsupported operand type(s) for +: 'Distribution' and 'Distribution'
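
    A hedged workaround sketch, based only on the issue text above (the point() accessor name is taken from the comment about the Distribution interface and is assumed to return per-sample point predictions; the exact call may differ):

    # Assumed workaround: ask the prediction object for point predictions
    # instead of applying numpy's mean reduction to Distribution objects.
    point_pred = y_pred.point()      # assumed: array of per-sample means
    print((point_pred * 2).shape)    # mirrors the failing line from the docs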
    
    opened by murphyk 3
  • First example: 'utils' not found

    The first example in your documentation (DensityBaseline) does not run right on my machine: it throws a 'module not found' exception at the call to 'utils'.

    This might be a Python version problem (I am using 3.6), so perhaps it's not an error in the normal sense - though I don't see any specification that the package requires a particular Python version. Apologies if I missed it; in any case, I fixed it by importing matplotlib instead, i.e.

    import matplotlib.pyplot as plt
    plt.scatter(y_test, y_pred)

    instead of:

    import utils
    utils.plot_performance(y_test, y_pred)

    opened by Thomas-M-H-Hope 2
  • problem in loading the skpro

    I have been trying to import skpro for 2 days, but I cannot; I keep getting this error:

    cannot import name 'six' from 'sklearn.externals' (C:\Users\My Book\anaconda3\lib\site-packages\sklearn\externals\__init__.py)
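
    A possible workaround, stated as an assumption rather than a confirmed fix: sklearn.externals.six was removed in newer scikit-learn releases (around 0.23), so pinning an older scikit-learn may restore the import:

    $ pip install "scikit-learn<0.23"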

    opened by honestee 1
  • (wish)list of probabilistic regressors to implement or to interface

    A wishlist of probabilistic regression methods to implement or interface. This is partly copied from the R counterpart https://github.com/mlr-org/mlr3proba/issues/32 . The number of stars at the end indicates estimated difficulty or time investment.

    GLM

    • [ ] generalized linear model(s) with regression link, e.g., Gaussian *
    • [ ] generalized linear model(s) with count link, e.g., Poisson *
    • [ ] heteroscedastic linear regression ***
    • [ ] Bayesian GLM where conjugate priors are available, e.g., GLM with Gaussian link ***

    KRR aka Gaussian process regression

    • [ ] vanilla kernel ridge regression with fixed kernel parameters and variance *
    • [ ] kernel ridge regression with MLE for kernel parameters and regularization parameter **
    • [ ] heteroscedastic KRR or Gaussian processes ***

    CDE

    • [ ] variants of conditional density estimation (Nadaraya-Watson type) **
    • [ ] reduction to density estimation by binning of input variables, then apply unconditional density estimation **

    Tree-based

    • [ ] probabilistic regression trees **

    Neural networks

    • [ ] interface tensorflow probability - some hard-coded NN architectures **
    • [ ] generic tensorflow probability interface - some hard-coded NN architectures ***

    Bayesian toolboxes

    • [ ] generic pymc3 interface ***
    • [ ] generic pyro interface ****
    • [ ] generic Stan interface ****
    • [ ] generic JAGS interface ****
    • [ ] generic BUGS interface ****
    • [ ] generic Bayesian interface - prior-valued hyperparameters *****

    Pipeline elements for target transformation

    • [ ] distr fixed target transformation **
    • [ ] distr predictive target calibration **

    Composite techniques, reduction to deterministic regression

    • [ ] stick mean, sd, from a deterministic regressor which already has these as return types into some location/scale distr family (Gaussian, Laplace) *
    • [ ] use model 1 for the mean, model 2 fit to residuals (squared, absolute, or log), put this in some location/scale distr family (Gaussian, Laplace) (see the sketch after this list) **
    • [ ] upper/lower thresholder for a regression prediction, to use as a pipeline element for a forced lower variance bound **
    • [ ] generic parameter prediction by elicitation, output being plugged into parameters of a distr object not necessarily scale/location ****
    • [ ] reduction via bootstrapped sampling of a deterministic regressor **
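
    A hedged sketch of the mean-plus-residual reduction referenced in the list above: all names are hypothetical and only illustrate the composition idea in plain scikit-learn terms, not skpro's interface.

    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LinearRegression

    def fit_mean_residual_gaussian(X_train, y_train, X_test):
        # model 1: mean prediction
        mean_model = LinearRegression().fit(X_train, y_train)
        # model 2: fit to absolute residuals as a scale estimate
        abs_resid = np.abs(y_train - mean_model.predict(X_train))
        scale_model = LinearRegression().fit(X_train, abs_resid)
        mu = mean_model.predict(X_test)
        # forced lower variance bound, as in the thresholder item above
        sigma = np.clip(scale_model.predict(X_test), 1e-6, None)
        # plug into a location/scale family (Gaussian here)
        return [stats.norm(loc=m, scale=s) for m, s in zip(mu, sigma)]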

    Ensembling type pipeline elements and compositors

    • [ ] simple bagging, averaging of pdf/cdf **
    • [ ] probabilistic boosting ***
    • [ ] probabilistic stacking ***

    Baselines

    • [ ] always predict a Gaussian with mean = training mean, var = training var *
    • [ ] IMPORTANT as featureless baseline: reduction to distr/density estimation to produce an unconditional probabilistic regressor **
    • [ ] IMPORTANT as deterministic style baseline: reduction to deterministic regression, mean = prediction by det.regressor, var = training sample var, distr type = Gaussian (or Laplace) **

    Other reduction from/to probabilistic regression

    • [ ] reducing deterministic regression to probabilistic regression - take mean, median or mode **
    • [ ] reduction(s) to quantile regression, use predictive quantiles to make a distr (see the sketch after this list) ***
    • [ ] reducing deterministic (quantile) regression to probabilistic regression - take quantile(s) **
    • [ ] reducing interval regression to probabilistic regression - take mean/sd, or take quantile(s) **
    • [ ] reduction to survival, as the sub-case of no censoring **
    • [ ] reduction to classification, by binning ***
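
    A hedged sketch of the quantile-based reduction referenced above: interpolate predicted quantiles into a piecewise-linear CDF (illustrative only; the function name is hypothetical).

    import numpy as np

    def cdf_from_quantiles(alphas, quantiles):
        # Hypothetical helper: piecewise-linear CDF through predicted quantiles.
        alphas, quantiles = np.asarray(alphas), np.asarray(quantiles)
        def cdf(x):
            return np.interp(x, quantiles, alphas, left=0.0, right=1.0)
        return cdf

    cdf = cdf_from_quantiles([0.1, 0.5, 0.9], [2.3, 3.1, 4.8])
    print(cdf(3.1))  # 0.5
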
    good first issue 
    opened by fkiraly 0
  • skpro-refactoring (version-2)

    See below some comments describing the contents of the upcoming refactoring:

    • Refactoring of the distribution classes in a more OO way (see skpro->distribution)
    • Loss functions (see metrics->distribution)
    • Estimators (see metrics->distribution)

    Some descriptive notebooks (in docs->notebooks) and a full set of unit tests (in tests) are also available.

    opened by jesellier 24
Releases

v1.0.1-beta

Owner

The Alan Turing Institute
The UK's national institute for data science and artificial intelligence.