Probabilistic Gradient Boosting Machines

Last update: Dec 28, 2022

Overview

PGBM

Probabilistic Gradient Boosting Machines (PGBM) is a probabilistic gradient boosting framework in Python based on PyTorch/Numba, developed by Airlab in Amsterdam. It provides the following advantages over existing frameworks:

- Probabilistic regression estimates instead of only point estimates.
- Automatic differentiation of custom loss functions.
- Native (multi-)GPU acceleration.
- Ability to optimize probabilistic estimates after training for a set of common distributions, without retraining the model.

It is aimed at users interested in solving large-scale tabular probabilistic regression problems, such as probabilistic time series forecasting. For more details, read our paper or check out the examples.
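To make "probabilistic estimate" concrete, the sketch below assumes a model has already produced a predicted mean and variance for each sample (the numbers are made up). It derives a prediction interval under a Normal assumption, then compares two candidate distributions with the same mean and variance on held-out targets, without touching the model. This is an illustration of the idea only, not PGBM's implementation; all function names are hypothetical, and negative log-likelihood is used here simply as one possible scoring rule.

```python
import math
from statistics import NormalDist


def normal_interval(mean, variance, coverage=0.9):
    """Central prediction interval of a Normal with the given mean and variance."""
    dist = NormalDist(mu=mean, sigma=math.sqrt(variance))
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def normal_nll(y, mean, variance):
    """Negative log-likelihood of y under Normal(mean, variance)."""
    return 0.5 * math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / (2.0 * variance)


def laplace_nll(y, mean, variance):
    """Negative log-likelihood of y under a Laplace with matching mean and variance."""
    # A Laplace with scale b has variance 2 * b**2, so b = sqrt(variance / 2).
    scale = math.sqrt(variance / 2.0)
    return math.log(2.0 * scale) + abs(y - mean) / scale


def pick_distribution(y_true, means, variances):
    """Return the candidate with the lowest average NLL, plus all scores."""
    candidates = {"normal": normal_nll, "laplace": laplace_nll}
    scores = {
        name: sum(nll(y, m, v) for y, m, v in zip(y_true, means, variances)) / len(y_true)
        for name, nll in candidates.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical per-sample predictions and held-out targets.
    means, variances = [10.0, 12.0, 9.0], [4.0, 1.0, 2.25]
    y_val = [11.0, 12.5, 7.0]
    print(normal_interval(means[0], variances[0]))
    print(pick_distribution(y_val, means, variances))
```

Because the candidates only reinterpret the same mean and variance, switching between them requires no retraining.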

Installation

Run pip install pgbm from a terminal within a Python (virtual) environment of your choice.

Verification

Download & run an example from the examples folder to verify the installation is correct:
- Run this example to verify the ability to train and predict on the CPU with the Torch backend.
- Run this example to verify the ability to train and predict on the GPU with the Torch backend.
- Run this example to verify the ability to train and predict on the CPU with the Numba backend.

When training on the GPU, the custom CUDA kernel is JIT-compiled when a model is initialized. The first time you train a model on the GPU can therefore take a bit longer, as PGBM needs to compile the CUDA kernel.

With the Numba backend, several functions need to be JIT-compiled, so the first time you train a model using this backend can also take a bit longer.

Running the examples requires some additional packages, such as scikit-learn or matplotlib; install these separately via pip or conda.

Dependencies

The core package has the following dependencies, which must be installed separately; installing the core package via pip does not install them automatically.

Torch backend

- CUDA Toolkit matching your PyTorch distribution (https://developer.nvidia.com/cuda-toolkit).
- PyTorch >= 1.7.0, with CUDA 11.0 for GPU acceleration (https://pytorch.org/get-started/locally/). After installing PyTorch, verify that it can find a CUDA device on your machine by checking whether torch.cuda.is_available() returns True.
- PGBM uses a custom CUDA kernel that needs to be compiled, which may require installing a suitable compiler. Installing PyTorch and the full CUDA Toolkit should be sufficient; open an issue if it still does not work after installing these dependencies.
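The PyTorch requirement above can be checked with a short script. The sketch below is a hypothetical helper, not part of PGBM: it reports whether PyTorch is importable, whether its version meets the 1.7.0 minimum, and what torch.cuda.is_available() returns. It does not check the CUDA Toolkit version or the compiler.

```python
import importlib.util
import re

MIN_TORCH = (1, 7, 0)  # minimum PyTorch version listed above


def parse_version(version):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1)."""
    # Drop the local build suffix after '+', then keep the first three numbers.
    numbers = re.findall(r"\d+", version.split("+")[0])
    parts = [int(n) for n in numbers[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def check_torch_backend():
    """Report on the PyTorch installation without failing if it is absent."""
    report = {"installed": False, "version_ok": False, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return report
    import torch

    report["installed"] = True
    report["version_ok"] = parse_version(torch.__version__) >= MIN_TORCH
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    print(check_torch_backend())
```

If cuda_available is False, training on the CPU with the Torch backend should still be possible; only GPU acceleration depends on it.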

Numba backend

Numba >= 0.53.1 (https://numba.readthedocs.io/en/stable/user/installing.html).

The Numba backend supports neither differentiable loss functions nor GPU training.
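The backend constraints above can be written down as a small decision helper. The sketch below is hypothetical and does not call PGBM: it detects which of the two libraries are importable and picks a backend name from the stated requirements. Preferring Torch when both are installed is an arbitrary choice made for this example.

```python
import importlib.util


def available_backends():
    """Return the backend libraries that can be imported in this environment."""
    return {
        name
        for name in ("torch", "numba")
        if importlib.util.find_spec(name) is not None
    }


def choose_backend(needs_gpu, needs_differentiable_loss, installed):
    """Pick 'torch' or 'numba' given the requirements and installed libraries."""
    if needs_gpu or needs_differentiable_loss:
        # Only the Torch backend covers GPU training and differentiable losses.
        if "torch" not in installed:
            raise RuntimeError("this configuration requires the Torch backend")
        return "torch"
    for name in ("torch", "numba"):  # assumed preference order
        if name in installed:
            return name
    raise RuntimeError("neither torch nor numba is installed")


if __name__ == "__main__":
    installed = available_backends()
    print("installed:", sorted(installed))
    if installed:
        print("CPU-only choice:", choose_backend(False, False, installed))
```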

Support

See the examples folder for examples, an overview of hyperparameters and a function reference. In general, PGBM works similarly to existing gradient boosting packages such as LightGBM or XGBoost, and it should be possible to use it more or less as a drop-in replacement. The main difference is that you must explicitly define a loss function and a loss metric.
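As an illustration of what a loss function and a loss metric look like in gradient boosting, the sketch below defines a mean-squared-error objective that returns the per-sample gradient and hessian, plus an RMSE metric. It uses plain Python lists so that it runs without either backend. The function names and signatures are assumptions made for this example, not PGBM's documented interface; consult the function reference in the examples folder for the exact form PGBM expects.

```python
import math


def mse_objective(yhat, y):
    """Gradient and hessian of 0.5 * (yhat - y)**2 with respect to yhat."""
    gradient = [pred - target for pred, target in zip(yhat, y)]
    hessian = [1.0 for _ in yhat]  # second derivative is constant for MSE
    return gradient, hessian


def rmse_metric(yhat, y):
    """Root mean squared error, used to monitor fit quality."""
    squared_errors = [(pred - target) ** 2 for pred, target in zip(yhat, y)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    predictions, targets = [1.0, 3.0], [0.0, 5.0]
    print(mse_objective(predictions, targets))
    print(rmse_metric(predictions, targets))
```

The objective drives tree construction through its gradient and hessian, while the metric is only evaluated to report progress, which is why the two are defined separately.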

In case further support is required, open an issue.

Reference

Olivier Sprangers, Sebastian Schelter, Maarten de Rijke. Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore.

The experiments from our paper can be replicated by running the scripts in the experiments folder. The experiment scripts download datasets when needed, except for Higgs and M5, which must be downloaded beforehand and saved to the datasets folder (Higgs) and to datasets/m5 (M5).

License

This project is licensed under the terms of the Apache 2.0 license.

Acknowledgements

This project was developed by Airlab Amsterdam.

Owner

Olivier Sprangers
