SigOpt wrappers for scikit-learn methods

Overview

SigOpt + scikit-learn Interfacing


This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together.

Getting Started

Install the sigopt_sklearn Python module with:

pip install sigopt_sklearn

Sign up for an account at https://sigopt.com. To use the interfaces, you'll need your API token from the API tokens page.

SigOptSearchCV

The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using cross-validation. A short example that tunes the parameters of an SVM on a small dataset is provided below:

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'

iris = datasets.load_iris()

# define parameter domains
svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
    client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for best parameters and fit the estimator
# on all data using the best found configuration
clf.fit(iris.data, iris.target)

# clf.predict() now uses best found estimator
# clf.best_score_ contains CV score for best found estimator
# clf.best_params_ contains best found param configuration
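To make the search loop concrete, here is a minimal sketch of what a cross-validated hyperparameter search does, with plain random sampling standing in for SigOpt's suggestions. This is an illustrative assumption, not how SigOptSearchCV works internally: SigOpt chooses each configuration adaptively rather than at random. Each sampled configuration is scored with scikit-learn's cross_val_score, the best one is kept, and the estimator is refit on all data, mirroring the comments in the example above. The `sample` helper is hypothetical.

```python
import random

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}


def sample(domains, rng):
    """Draw one configuration at random (hypothetical stand-in for SigOpt).

    Simplification: every (low, high) tuple is sampled as a float range.
    """
    config = {}
    for name, domain in domains.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            config[name] = rng.uniform(*domain)
    return config


rng = random.Random(0)
best_score, best_params = float('-inf'), None
for _ in range(20):  # plays the role of n_iter
    params = sample(svc_parameters, rng)
    # mean of the 5 fold scores; SVC's default score is accuracy
    score = cross_val_score(svm.SVC(**params), iris.data, iris.target, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

# refit on all data with the best configuration found
best_estimator = svm.SVC(**best_params).fit(iris.data, iris.target)
print(best_params, round(best_score, 3))
```

The random sampler is the simplest baseline; the point of using SigOpt is to reach a good configuration in fewer of these expensive cross-validation rounds.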

By default, the objective being optimized is the estimator's default score. A custom objective can be used by passing the scoring option to the SigOptSearchCV constructor. The example below uses the f1_score already implemented in sklearn:

from sklearn.metrics import f1_score, make_scorer

# iris has three classes, so f1_score needs a multiclass average
f1_scorer = make_scorer(f1_score, average='macro')

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5, scoring=f1_scorer,
    client_token=client_token, n_jobs=5, n_iter=50)

# perform CV search for best parameters
clf.fit(iris.data, iris.target)
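In scikit-learn, a scorer is any callable with the signature `scorer(estimator, X, y)` that returns a float where higher is better; make_scorer is one way to build one. The sketch below writes a macro-averaged F1 scorer by hand and sanity-checks it with cross_val_score on a single fixed configuration before committing to a long search. Passing such a callable as `scoring=` to SigOptSearchCV assumes it follows scikit-learn's scoring convention, which this README does not state explicitly; `macro_f1_scorer` is a hypothetical name.

```python
from sklearn import datasets, svm
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score


def macro_f1_scorer(estimator, X, y):
    """scikit-learn scorer protocol: (estimator, X, y) -> float, higher is better."""
    return f1_score(y, estimator.predict(X), average='macro')


iris = datasets.load_iris()
# sanity-check the scorer on one fixed configuration before a long search
scores = cross_val_score(svm.SVC(kernel='linear', C=1.0), iris.data, iris.target,
                         cv=5, scoring=macro_f1_scorer)
print(scores.mean())
```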

XGBClassifier

SigOptSearchCV also works with XGBoost's scikit-learn wrapper, XGBClassifier, so a hyperparameter search over XGBClassifier models uses the same interface:

import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
iris = datasets.load_iris()

xgb_params = {
  'learning_rate': (0.01, 0.5),
  'n_estimators': (10, 50),
  'max_depth': (3, 10),
  'min_child_weight': (6, 12),
  'gamma': (0, 0.5),
  'subsample': (0.6, 1.0),
  'colsample_bytree': (0.6, 1.)
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
    client_token=client_token, n_jobs=5, n_iter=70, verbose=1)

clf.fit(iris.data, iris.target)
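The parameter domains in these examples follow a simple convention: a list is a categorical parameter, and a (low, high) tuple is a numeric range, which in the XGBoost example covers both integer parameters such as n_estimators and continuous ones such as learning_rate. The sketch below shows one way such a dict could be translated into parameter definitions shaped like SigOpt's (name, type, bounds). It assumes, without confirmation from this README, that a tuple whose bounds are both Python ints becomes an integer parameter and anything else a double, and that a dict (see Categoricals below) becomes a categorical over its labels. `to_parameter_specs` is a hypothetical helper, not part of sigopt_sklearn.

```python
def to_parameter_specs(param_domains):
    """Translate a domain dict into SigOpt-style parameter definitions.

    Hypothetical helper. Assumed conventions:
      list            -> categorical with those values
      dict            -> categorical whose values are the dict's string labels
      (int, int)      -> integer range
      any other pair  -> floating-point ("double") range
    """
    specs = []
    for name, domain in sorted(param_domains.items()):
        if isinstance(domain, list):
            specs.append({'name': name, 'type': 'categorical',
                          'categorical_values': list(domain)})
        elif isinstance(domain, dict):
            specs.append({'name': name, 'type': 'categorical',
                          'categorical_values': sorted(domain)})
        else:
            low, high = domain
            if low >= high:
                raise ValueError(f'{name}: expected (low, high), got {domain!r}')
            is_int = all(isinstance(b, int) and not isinstance(b, bool) for b in domain)
            specs.append({'name': name, 'type': 'int' if is_int else 'double',
                          'bounds': {'min': low, 'max': high}})
    return specs


xgb_params = {
    'learning_rate': (0.01, 0.5),
    'n_estimators': (10, 50),
    'max_depth': (3, 10),
    'gamma': (0, 0.5),
}
for spec in to_parameter_specs(xgb_params):
    print(spec['name'], spec['type'], spec['bounds'])
```

Note that under this assumed rule, 'gamma': (0, 0.5) is a double because only one bound is an int. Validating low < high up front catches reversed bounds before any fits are run.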

SigOptEnsembleClassifier

This class concurrently trains and tunes several classification models within sklearn to facilitate model selection efforts when investigating new datasets.

This part of the library also requires xgboost, so install sigopt_sklearn with the ensemble extra:

pip install sigopt_sklearn[ensemble]

A short example using an activity recognition dataset is provided below. We also have a video tutorial outlining how to run this example:

SigOpt scikit-learn Tutorial

# Human Activity Recognition Using Smartphones
# https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset

import numpy as np
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, 'r') as f:
    for l in f:
      X.append(np.array([float(v) for v in l.split()]))
  X = np.vstack(X)
  return X

X_train = load_datafile('train/X_train.txt')
y_train = load_datafile('train/y_train.txt').ravel()
X_test = load_datafile('test/X_test.txt')
y_test = load_datafile('test/y_test.txt').ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here : https://sigopt.com/tokens
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
    client_token='<YOUR_CLIENT_TOKEN>')

# compare model performance on the held-out test set
ensemble_train_scores = [est.score(X_train, y_train) for est in sigopt_clf.estimator_ensemble]
ensemble_test_scores = [est.score(X_test, y_test) for est in sigopt_clf.estimator_ensemble]
model_names = [est.__class__.__name__ for est in sigopt_clf.estimator_ensemble]
# sort by test accuracy, breaking ties on train accuracy
data = sorted(zip(model_names, ensemble_train_scores, ensemble_test_scores),
              reverse=True, key=lambda x: (x[2], x[1]))
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])
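The load_datafile helper above parses whitespace-separated floats line by line. NumPy's np.loadtxt reads the same layout directly, and for single-column label files it already returns a 1-D array, so the .ravel() call becomes unnecessary. The sketch below checks that the two approaches agree on two small files whose contents are made up for the check, not taken from the HAR dataset.

```python
import os
import tempfile

import numpy as np


def load_datafile(filename):
    # the helper from the example above, same behavior
    X = []
    with open(filename, 'r') as f:
        for l in f:
            X.append(np.array([float(v) for v in l.split()]))
    return np.vstack(X)


with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, 'X_demo.txt')  # made-up feature rows
    y_path = os.path.join(tmp, 'y_demo.txt')  # made-up single-column labels
    with open(x_path, 'w') as f:
        f.write('  1.0 -2.5e-001\n  3.25  4.0e-001\n')
    with open(y_path, 'w') as f:
        f.write('1\n5\n6\n')

    X_manual, X_np = load_datafile(x_path), np.loadtxt(x_path)
    y_manual, y_np = load_datafile(y_path).ravel(), np.loadtxt(y_path)

print(X_np.shape, y_np.shape)  # (2, 2) (3,)
```

With this, the loading lines could read `X_train = np.loadtxt('train/X_train.txt')` and `y_train = np.loadtxt('train/y_train.txt')`.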

CV Fold Timeouts

SigOptSearchCV evaluates CV folds in parallel using joblib. When joblib supports timeouts (at the time of writing, this required joblib's master branch), SigOpt can use the timeout information to learn to avoid hyperparameter configurations that are too slow.

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
dataset = datasets.fetch_20newsgroups_vectorized()
X = dataset.data
y = dataset.target

# define parameter domains
svc_parameters  = {
  'kernel': ['linear', 'rbf'],
  'C': (0.5, 100),
  'max_iter': (10, 200),
  'tol': (1e-6, 1e-2)
}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set timeout = 180 seconds
# for each fit.  SigOpt will then avoid configurations that are too slow
clf = SigOptSearchCV(svr, svc_parameters, cv=5, opt_timeout=180,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
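The timeout idea above can be illustrated in a few lines: run one evaluation, and if it does not finish in time, record it as a failed observation rather than a score, the kind of signal an optimizer can use to steer away from slow regions. This is a minimal sketch under stated assumptions: SigOptSearchCV delegates fold parallelism to joblib, while this sketch uses a plain thread pool to stay self-contained, and `evaluate_with_timeout`, `fast_fit`, `slow_fit` and the `{'value', 'failed'}` record shape are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def evaluate_with_timeout(fit_and_score, timeout_s):
    """Run one evaluation and return an observation-like dict.

    Hypothetical helper: a timed-out run is reported as failed instead of
    being given a score.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fit_and_score)
    try:
        return {'value': future.result(timeout=timeout_s), 'failed': False}
    except FutureTimeoutError:
        return {'value': None, 'failed': True}
    finally:
        # Threads cannot be killed: don't wait here, the slow call
        # keeps running in the background until it returns.
        pool.shutdown(wait=False)


def fast_fit():
    return 0.93


def slow_fit():
    time.sleep(0.5)  # pretend this configuration is expensive
    return 0.95


fast = evaluate_with_timeout(fast_fit, timeout_s=1.0)
slow = evaluate_with_timeout(slow_fit, timeout_s=0.1)
print(fast)  # {'value': 0.93, 'failed': False}
print(slow)  # {'value': None, 'failed': True}
```

A thread keeps consuming CPU after its deadline, which is one reason a process-based pool such as joblib's is a better fit for real estimator fits.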

Categoricals

SigOptSearchCV supports categorical parameters specified as a list of strings, as with the kernel parameter in the SVM example:

svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

SigOptSearchCV also supports categorical parameters with non-string values, given as a dict that maps a string label to each value. For example, the hidden_layer_sizes parameter in the MLPRegressor example below:

from sklearn.neural_network import MLPRegressor
from sigopt_sklearn.search import SigOptSearchCV

# X, y: your regression features and targets; client_token as above
parameters = {
  'activation': ['relu', 'tanh', 'logistic'],
  'solver': ['lbfgs', 'adam'],
  'alpha': (0.0001, 0.01),
  'learning_rate_init': (0.001, 0.1),
  'power_t': (0.001, 1.0),
  'beta_1': (0.8, 0.999),
  'momentum': (0.001, 1.0),
  'beta_2': (0.8, 0.999),
  'epsilon': (0.00000001, 0.0001),
  'hidden_layer_sizes': {
    'shallow': (100,),
    'medium': (10, 10),
    'deep': (10, 10, 10, 10)
  }
}
nn = MLPRegressor()
clf = SigOptSearchCV(nn, parameters, cv=5, cv_timeout=240,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
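With dict-valued domains, the optimizer only ever works with the string labels ('shallow', 'medium', 'deep'), so each suggestion has to be mapped back to the real values before the estimator is built. The sketch below shows that decoding step under that assumption; `decode_assignments` is a hypothetical helper, not sigopt_sklearn's code, and the suggestion values are made up.

```python
def decode_assignments(assignments, param_domains):
    """Map an optimizer's suggestion back to estimator keyword arguments.

    Hypothetical helper: for dict-valued domains the optimizer only sees the
    string labels, so each label is swapped for the value it stands for.
    """
    kwargs = {}
    for name, value in assignments.items():
        domain = param_domains.get(name)
        if isinstance(domain, dict):
            kwargs[name] = domain[value]  # e.g. 'medium' -> (10, 10)
        else:
            kwargs[name] = value  # strings and numbers pass through unchanged
    return kwargs


parameters = {
    'activation': ['relu', 'tanh', 'logistic'],
    'alpha': (0.0001, 0.01),
    'hidden_layer_sizes': {
        'shallow': (100,),
        'medium': (10, 10),
        'deep': (10, 10, 10, 10),
    },
}
suggestion = {'activation': 'tanh', 'alpha': 0.003, 'hidden_layer_sizes': 'medium'}
kwargs = decode_assignments(suggestion, parameters)
print(kwargs)  # {'activation': 'tanh', 'alpha': 0.003, 'hidden_layer_sizes': (10, 10)}
```

The decoded dict is in the form scikit-learn expects, so it could be passed as `MLPRegressor(**kwargs)` or to `set_params(**kwargs)`.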
Owner: SigOpt