A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Last update: Dec 29, 2022

Overview

Documentation | External Resources | Research Paper

Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble.

The library provides several methods to compute or approximate the Shapley value of players (models) in weighted voting games (ensemble games), a class of transferable utility cooperative games. It covers exact enumeration-based computation and several widely known approximation methods from economics and computer science research papers. It also includes functionality to quantify the heterogeneity of the player pool using the Shapley entropy. In addition, the framework comes with detailed documentation, an intuitive tutorial, 100% test coverage and illustrative toy examples.
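As a rough illustration of the entropy idea mentioned above, the sketch below computes the Shannon entropy of a vector of Shapley values. It assumes that the Shapley entropy is the entropy of the values after normalising them to sum to one. The function name `shapley_entropy` is hypothetical and is not taken from the library's API.

```python
import numpy as np


def shapley_entropy(phi):
    """Shannon entropy (in nats) of a vector of Shapley values.

    Assumption: the values are normalised to sum to one before the
    entropy is taken. This helper is illustrative, not the library's API.
    """
    phi = np.asarray(phi, dtype=float).ravel()
    phi = phi / phi.sum()
    nonzero = phi[phi > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(nonzero * np.log(nonzero)))


# Four equally valuable models versus a pool dominated by one model.
uniform = shapley_entropy([0.25, 0.25, 0.25, 0.25])
skewed = shapley_entropy([0.97, 0.01, 0.01, 0.01])
```

Under this reading, a pool of equally valuable models attains the maximum entropy log(n), while a pool dominated by a single model scores close to zero.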

Citing

If you find Shapley useful in your research, please consider adding the following citation:

@misc{rozemberczki2021shapley,
      title = {{The Shapley Value of Classifiers in Ensemble Games}}, 
      author = {Benedek Rozemberczki and Rik Sarkar},
      year = {2021},
      eprint = {2101.02153},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

A simple example

Shapley makes solving voting games quite easy; see the accompanying tutorial. For example, this is all it takes to solve a weighted voting game defined on the fly with permutation sampling:

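import numpy as np
from shapley import PermutationSampler

# One game with 7 players: random weights, normalised to sum to 1.
W = np.random.uniform(0, 1, (1, 7))
W = W / W.sum()
# Quota of the voting game.
q = 0.5

# Approximate the Shapley values with Monte Carlo permutation sampling.
solver = PermutationSampler()
solver.solve_game(W, q)
shapley_values = solver.get_solution()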
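To make the idea behind permutation sampling concrete, here is a minimal from-scratch sketch in plain NumPy. It does not reproduce the library's implementation. The function name `sample_shapley`, its parameters, and the convention that a coalition wins once its total weight reaches the quota (`>=`) are assumptions made for illustration. It also assumes non-negative weights.

```python
import numpy as np


def sample_shapley(weights, quota, n_permutations=1000, seed=0):
    """Estimate Shapley values of a weighted voting game by sampling.

    Hypothetical helper: draws random player orderings and counts how
    often each player is pivotal, i.e. the first whose arrival lifts
    the running weight to the quota or above.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float).ravel()
    n = weights.size
    pivot_counts = np.zeros(n)
    for _ in range(n_permutations):
        order = rng.permutation(n)
        running = np.cumsum(weights[order])
        reached = np.nonzero(running >= quota)[0]
        if reached.size:  # the quota may be unreachable in some games
            pivot_counts[order[reached[0]]] += 1
    return pivot_counts / n_permutations


# Player 0 can win alone and the other two cannot win without it.
estimates = sample_shapley([0.6, 0.2, 0.2], quota=0.5)
```

In this toy game player 0 is pivotal in every ordering, so it receives the whole value. In less extreme games the estimates fluctuate with the number of sampled permutations.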

Methods Included

In detail, the following methods can be used.

Expected Marginal Contribution Approximation from Fatima et al.: A Linear Approximation Method for the Shapley Value
Multilinear Extension from Owen: Multilinear Extensions of Games
Monte Carlo Permutation Sampling from Maleki et al.: Bounding the Estimation Error of Sampling-based Shapley Value Approximation
Exact Enumeration from Shapley: A Value for N-Person Games
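For small games the exact value can be obtained by brute force, which gives a useful baseline for the approximation methods listed above. The sketch below enumerates every ordering of the players and counts how often each one is pivotal. `exact_shapley` is a hypothetical helper, not the library's solver, and it assumes a coalition wins when its total weight is at least the quota.

```python
from fractions import Fraction
from itertools import permutations


def exact_shapley(weights, quota):
    """Exact Shapley values of a weighted voting game by enumeration.

    Hypothetical helper: visits all n! player orderings, so it is only
    practical for a handful of players.
    """
    n = len(weights)
    pivots = [0] * n
    total = 0
    for order in permutations(range(n)):
        total += 1
        running = 0
        for player in order:
            running += weights[player]
            if running >= quota:  # this player turns the coalition winning
                pivots[player] += 1
                break
    return [Fraction(count, total) for count in pivots]


values = exact_shapley([3, 2, 1], quota=4)
```

With weights 3, 2 and 1 and a quota of 4, the first player is pivotal in four of the six orderings and the other two in one each, giving values of 2/3, 1/6 and 1/6.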

Head over to our documentation to find out more about installation, creation of datasets and a full list of implemented methods and available datasets. For a quick start, check out the examples in the examples/ directory.

If you notice anything unexpected, please open an issue. If you are missing a specific method, feel free to open a feature request.

Installation

$ pip install shapley

Running tests

$ python setup.py test

Running examples

$ cd examples
$ python permutation_sampler_example.py

License

MIT License


Comments

Error in running MLE example

Thank you for sharing your great work. I truly enjoyed reading it. However, I ran into an error when I tried the multilinear extension example. The Monte Carlo example seems to run fine.

$ python multilinear_extension_example.py
RuntimeWarning: invalid value encountered in true_divide
  self._Phi = self._Phi / np.sum(self._Phi, axis=1).reshape(-1, 1)
Traceback (most recent call last):
  File "multilinear_extension_example.py", line 11, in <module>
    solver.solve_game(W, q)
  File "/lib/python3.6/site-packages/shapley/solvers/multilinear_extension.py", line 34, in solve_game
    self._run_sanity_check(W, self._Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 28, in _run_sanity_check
    self._verify_distribution(Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 22, in _verify_distribution
    assert np.sum(Phi) - Phi.shape[0] < 0.001
AssertionError

opened by xxlya 2
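One plausible reading of the traceback above, offered as an assumption rather than a diagnosis: if every raw score in a row is zero, the row normalisation divides zero by zero and produces NaN, and any comparison involving NaN is false, so the distribution check fails. The sketch below reproduces only that mechanism with a made-up all-zero `Phi`. It does not run the library's solver.

```python
import warnings

import numpy as np

# Hypothetical input: a single game in which every raw score is zero.
Phi = np.zeros((1, 3))

with warnings.catch_warnings():
    # Silence the "invalid value encountered" RuntimeWarning for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    # Same normalisation as the line quoted in the traceback: 0 / 0 -> NaN.
    Phi = Phi / np.sum(Phi, axis=1).reshape(-1, 1)

has_nan = bool(np.isnan(Phi).any())
# Same condition as the assertion quoted in the traceback.
check_passes = bool(np.sum(Phi) - Phi.shape[0] < 0.001)
```

Here `has_nan` is true and `check_passes` is false, which matches the warning followed by an `AssertionError`. Whether the reported run actually hit an all-zero row cannot be determined from the traceback alone.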

Releases (v_10003)

v_10003 (Apr 28, 2022)
Moves the Shapley library to an ABC-based design.
Adds a version attribute.

v_10002 (May 16, 2021)

v_10001 (Feb 1, 2021)
Fixed the expectations and variances.

v_10000 (Dec 31, 2020)
The official first release of Shapley.


Owner

Benedek Rozemberczki
