InfiniteBoost: building infinite ensembles with gradient descent

Last update: Jan 03, 2023

Overview

InfiniteBoost

Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109).
A. Rogozhnikov, T. Likhomanenko

Description

InfiniteBoost is an approach to building ensembles that combines the best sides of random forest and gradient boosting.

Trees in the ensemble account for the mistakes made by previous trees (as in gradient boosting), but because of a modified scheme for accumulating their contributions, the ensemble converges to a limit and thus avoids overfitting (just as random forest does).
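The idea can be illustrated with a minimal sketch. This is not the implementation from this repository; it is a simplified reading of the scheme described in the paper, under loudly stated assumptions: squared-error loss, scikit-learn regression trees, bootstrap sampling, a fixed capacity, and tree weights alpha_k = k. The function name fit_averaged_boosting and all of its parameters are hypothetical. Each tree is fitted to the residuals of the current ensemble (as in boosting), but the ensemble prediction is a weighted average of the trees scaled by the capacity, rather than an ever-growing sum.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_averaged_boosting(X, y, n_trees=50, capacity=1.0, max_depth=3, seed=0):
    """Hypothetical sketch: boosting with weighted averaging of trees.

    Assumptions (not taken from this repository): squared-error loss,
    weights alpha_k = k, fixed capacity, bootstrap sampling per tree.
    """
    rng = np.random.RandomState(seed)
    n_samples = len(y)
    prediction = np.zeros(n_samples)
    trees, weights = [], []
    total_weight = 0.0

    for m in range(1, n_trees + 1):
        # Negative gradient of 0.5 * (y - F)^2 is the residual, as in boosting.
        residual = y - prediction

        # Bootstrap sample, as in random forest.
        idx = rng.randint(0, n_samples, n_samples)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X[idx], residual[idx])

        alpha = float(m)
        total_weight += alpha
        step = alpha / total_weight

        # Convex update: keeps prediction equal to
        # capacity * sum(alpha_k * tree_k) / sum(alpha_k).
        prediction = (1.0 - step) * prediction + step * capacity * tree.predict(X)

        trees.append(tree)
        weights.append(alpha)

    return trees, weights, prediction
```

Because every update is a convex combination, the prediction is always the capacity times a weighted average of tree outputs, so adding more trees cannot make its scale grow without bound, unlike the plain sum used in gradient boosting.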

Left: InfiniteBoost with automated search of capacity vs gradient boosting with different learning rates (shrinkages), right: random forest vs InfiniteBoost with small capacities.

More comparison plots are available in the research notebooks and in the research/plots directory.

Reproducing research

The research is performed in Jupyter notebooks (if you're not familiar, read why Jupyter notebooks are awesome).

You can use the docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16 + basic sklearn stuff).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(and open localhost:8890 in your browser).

InfiniteBoost package

A self-written, minimalistic implementation of trees was used for the experiments against boosting.

A separate implementation, based on the trees from the scikit-learn package, was used for the comparison with random forest.

The code is written in Python 2 (it is expected to work with Python 3, but this has not been tested). Some critical functions are written in Fortran, so you need gfortran and OpenMP installed before installing the package (or simply use the docker image).

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You can use the implementation of trees from the package in your own experiments; in this case, please cite the InfiniteBoost paper.

Owner

Alex Rogozhnikov
