Bonsai: Gradient Boosted Trees + Bayesian Optimization

Overview

Bonsai: Gradient Boosted Trees + Bayesian Optimization

Bonsai is a wrapper for the XGBoost and Catboost model training pipelines that leverages Bayesian optimization for computationally efficient hyperparameter tuning.

Despite being a very small package, it has access to nearly all of the configurable parameters in XGBoost and CatBoost as well as the BayesianOptimization package allowing users to specify unique objectives, metrics, parameter search ranges, and search policies. This is made possible thanks to the strong similarities between both libraries.

$ pip install bonsai-tree

References/Dependencies:

Why use Bonsai?

Grid search and random search are the most commonly used algorithms for exploring the hyperparameter space for a wide range of machine learning models. While effective for optimizing over low dimensional hyperparameter spaces (ex: few regularization terms), these methods do not scale well to models with a large number of hyperparameters such as gradient boosted trees.

Bayesian optimization on the other hand dynamically samples from the hyperparameter space with the goal of minimizing uncertaintly about the underlying objective function. For the case of model optimization, this consists of iteratively building a prior distribution of functions over the hyperparameter space and sampling with the goal of minimizing the posterior variance of the loss surface (via Gaussian Processes).

Model Configuration

Since Bonsai is simply a wrapper for both XGBoost and CatBoost, the model_params dict is synonymous with the params argument for both catboost.fit() and xgboost.fit(). Additionally, you must encode your categorical features as usual depending on which library you are using (XGB: One-Hot, CB: Label).

Below is a simple example of binary classification using CatBoost:

# label encoded training data
X = train.drop(target, axis = 1)
y = train[target]

# same args as catboost.train(...)
model_params = dict(objective = 'Logloss', verbose = False)

# same args as catboost.cv(...)
cv_params = dict(nfold = 5)

The pbounds dict as seen below specifies the hyperparameter bounds over which the optimizer will search. Additionally, the opt_config dictionary is for configuring the optimizer itself. Refer to the BayesianOptimization documentation to learn more.

# defining parameter search ranges
pbounds = dict(
  eta = (0.15, 0.4), 
  n_estimators = (200,2000), 
  max_depth = (4, 8)
)

# 10 warm up samples + 10 optimizing steps
n_iter, init_points= 10, 10

# to learn more about customizing your search policy:
# BayesianOptimization/examples/exploitation_vs_exploration.ipynb
opt_config = dict(acq = 'ei', xi = 1e-2)

Tuning and Prediction

All that is left is to initialize and optimize.

from bonsai.tune import CB_Tuner

# note that 'cats' is a list of categorical feature names
tuner = CB_Tuner(X, y, cats, model_params, cv_params, pbounds)
tuner.optimize(n_iter, init_points, opt_config, bounds_transformer)

After the optimal parameters are found, the model is trained and stored internally giving full access to the CatBoost model.

test_pool = catboost.Pool(test, cat_features = cats)
preds = tuner.model.predict(test_pool, prediction_type = 'Probability')

Bonsai also comes with a parallel coordinates plotting functionality allowing users to further narrow down their parameter search ranges as needed.

from bonsai.utils import parallel_coordinates

# DataFrame with hyperparams and observed loss
results = tuner.opt_results
parallel_coordinates(results)

Owner
Landon Buechner
Nixtla is an open-source time series forecasting library.

Nixtla Nixtla is an open-source time series forecasting library. We are helping data scientists and developers to have access to open source state-of-

Nixtla 401 Jan 08, 2023
Nevergrad - A gradient-free optimization platform

Nevergrad - A gradient-free optimization platform nevergrad is a Python 3.6+ library. It can be installed with: pip install nevergrad More installati

Meta Research 3.4k Jan 08, 2023
A simple guide to MLOps through ZenML and its various integrations.

ZenBytes Join our Slack Community and become part of the ZenML family Give the main ZenML repo a GitHub star to show your love ZenBytes is a series of

ZenML 127 Dec 27, 2022
cuML - RAPIDS Machine Learning Library

cuML - GPU Machine Learning Algorithms cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions t

RAPIDS 3.1k Dec 28, 2022
Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Felix Daudi 1 Jan 06, 2022
This repository contains the code to predict house price using Linear Regression Method

House-Price-Prediction-Using-Linear-Regression The dataset I used for this personal project is from Kaggle uploaded by aariyan panchal. Link of Datase

0 Jan 28, 2022
A library to generate synthetic time series data by easy-to-use factors and generator

timeseries-generator This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_ge

Nike Inc. 87 Dec 20, 2022
Learn Machine Learning Algorithms by doing projects in Python and R Programming Language

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language. This repo covers all aspect of Machine Learning Algorithms.

Ravi Chaubey 6 Oct 20, 2022
Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations.

BO-GP Bayesian optimization based on Gaussian processes (BO-GP) for CFD simulations. The BO-GP codes are developed using GPy and GPyOpt. The optimizer

KTH Mechanics 8 Mar 31, 2022
A demo project to elaborate how Machine Learn Models are deployed on production using Flask API

This is a salary prediction website developed with the help of machine learning, this makes prediction of salary on basis of few parameters like interview score, experience test score.

1 Feb 10, 2022
The unified machine learning framework, enabling framework-agnostic functions, layers and libraries.

The unified machine learning framework, enabling framework-agnostic functions, layers and libraries. Contents Overview In a Nutshell Where Next? Overv

Ivy 8.2k Dec 31, 2022
Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
NumPy-based implementation of a multilayer perceptron (MLP)

My own NumPy-based implementation of a multilayer perceptron (MLP). Several of its components can be tuned and played with, such as layer depth and size, hidden and output layer activation functions,

1 Feb 10, 2022
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022
nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Microsoft 241 Dec 26, 2022
pure-predict: Machine learning prediction in pure Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks l

Ibotta 84 Dec 29, 2022
Dragonfly is an open source python library for scalable Bayesian optimisation.

Dragonfly is an open source python library for scalable Bayesian optimisation. Bayesian optimisation is used for optimising black-box functions whose

744 Jan 02, 2023
Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

Mohammad Dori 3 Jul 15, 2022
using Machine Learning Algorithm to classification AppleStore application

AppleStore-classification-with-Machine-learning-Algo- using Machine Learning Algorithm to classification AppleStore application. the first step : 1: p

Mohammed Hussien 2 May 02, 2022