aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Overview

Bayesian Methods for Hackers

Using Python and PyMC

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight-days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

If Bayesian inference is the destination, then mathematical analysis is a particular path towards it. On the other hand, computing power is cheap enough that we can afford to take an alternate route via probabilistic programming. The latter path is much more useful, as it denies the necessity of mathematical intervention at each step, that is, we remove often-intractable mathematical analysis as a prerequisite to Bayesian inference. Simply put, this latter computational path proceeds via small intermediate jumps from beginning to end, where as the first path proceeds by enormous leaps, often landing far away from our target. Furthermore, without a strong mathematical background, the analysis required by the first path cannot even take place.

Bayesian Methods for Hackers is designed as an introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

The choice of PyMC as the probabilistic programming language is two-fold. As of this writing, there is currently no central resource for examples and explanations in the PyMC universe. The official documentation assumes prior knowledge of Bayesian inference and probabilistic programming. We hope this book encourages users at every level to look at PyMC. Secondly, with recent core developments and popularity of the scientific stack in Python, PyMC is likely to become a core component soon enough.

PyMC does have dependencies to run, namely NumPy and (optionally) SciPy. To not limit the user, the examples in this book will rely only on PyMC, NumPy, SciPy and Matplotlib.

Printed Version by Addison-Wesley

Bayesian Methods for Hackers is now available as a printed book! You can pick up a copy on Amazon. What are the differences between the online version and the printed version?

  • Additional Chapter on Bayesian A/B testing
  • Updated examples
  • Answers to the end of chapter questions
  • Additional explanation, and rewritten sections to aid the reader.

Contents

See the project homepage here for examples, too.

The below chapters are rendered via the nbviewer at nbviewer.jupyter.org/, and is read-only and rendered in real-time. Interactive notebooks + examples can be downloaded by cloning!

PyMC2

  • Prologue: Why we do it.

  • Chapter 1: Introduction to Bayesian Methods Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:

    • Inferring human behaviour changes from text message rates
  • Chapter 2: A little more on PyMC We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:

    • Detecting the frequency of cheating students, while avoiding liars
    • Calculating probabilities of the Challenger space-shuttle disaster
  • Chapter 3: Opening the Black Box of MCMC We discuss how MCMC operates and diagnostic tools. Examples include:

    • Bayesian clustering with mixture models
  • Chapter 4: The Greatest Theorem Never Told We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:

    • Exploring a Kaggle dataset and the pitfalls of naive analysis
    • How to sort Reddit comments from best to worst (not as easy as you think)
  • Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:

    • Solving the Price is Right's Showdown
    • Optimizing financial predictions
    • Winning solution to the Kaggle Dark World's competition
  • Chapter 6: Getting our prior-ities straight Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:

    • Multi-Armed Bandits and the Bayesian Bandit solution.
    • What is the relationship between data sample size and prior?
    • Estimating financial unknowns using expert priors

    We explore useful tips to be objective in analysis as well as common pitfalls of priors.

PyMC3

  • Prologue: Why we do it.

  • Chapter 1: Introduction to Bayesian Methods Introduction to the philosophy and practice of Bayesian methods and answering the question, "What is probabilistic programming?" Examples include:

    • Inferring human behaviour changes from text message rates
  • Chapter 2: A little more on PyMC We explore modeling Bayesian problems using Python's PyMC library through examples. How do we create Bayesian models? Examples include:

    • Detecting the frequency of cheating students, while avoiding liars
    • Calculating probabilities of the Challenger space-shuttle disaster
  • Chapter 3: Opening the Black Box of MCMC We discuss how MCMC operates and diagnostic tools. Examples include:

    • Bayesian clustering with mixture models
  • Chapter 4: The Greatest Theorem Never Told We explore an incredibly useful, and dangerous, theorem: The Law of Large Numbers. Examples include:

    • Exploring a Kaggle dataset and the pitfalls of naive analysis
    • How to sort Reddit comments from best to worst (not as easy as you think)
  • Chapter 5: Would you rather lose an arm or a leg? The introduction of loss functions and their (awesome) use in Bayesian methods. Examples include:

    • Solving the Price is Right's Showdown
    • Optimizing financial predictions
    • Winning solution to the Kaggle Dark World's competition
  • Chapter 6: Getting our prior-ities straight Probably the most important chapter. We draw on expert opinions to answer questions. Examples include:

    • Multi-Armed Bandits and the Bayesian Bandit solution.
    • What is the relationship between data sample size and prior?
    • Estimating financial unknowns using expert priors

    We explore useful tips to be objective in analysis as well as common pitfalls of priors.

More questions about PyMC? Please post your modeling, convergence, or any other PyMC question on cross-validated, the statistics stack-exchange.

Using the book

The book can be read in three different ways, starting from most recommended to least recommended:

  1. The most recommended option is to clone the repository to download the .ipynb files to your local machine. If you have Jupyter installed, you can view the chapters in your browser plus edit and run the code provided (and try some practice questions). This is the preferred option to read this book, though it comes with some dependencies.

    • Jupyter is a requirement to view the ipynb files. It can be downloaded here. Jupyter notebooks can be run by (your-virtualenv) ~/path/to/the/book/Chapter1_Introduction $ jupyter notebook
    • For Linux users, you should not have a problem installing NumPy, SciPy, Matplotlib and PyMC. For Windows users, check out pre-compiled versions if you have difficulty.
    • In the styles/ directory are a number of files (.matplotlirc) that used to make things pretty. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib.
  2. The second, preferred, option is to use the nbviewer.jupyter.org site, which display Jupyter notebooks in the browser (example). The contents are updated synchronously as commits are made to the book. You can use the Contents section above to link to the chapters.

  3. PDFs are the least-preferred method to read the book, as PDFs are static and non-interactive. If PDFs are desired, they can be created dynamically using the nbconvert utility.

Installation and configuration

If you would like to run the Jupyter notebooks locally, (option 1. above), you'll need to install the following:

  • Jupyter is a requirement to view the ipynb files. It can be downloaded here

  • Necessary packages are PyMC, NumPy, SciPy and Matplotlib.

  • New to Python or Jupyter, and help with the namespaces? Check out this answer.

  • In the styles/ directory are a number of files that are customized for the notebook. These are not only designed for the book, but they offer many improvements over the default settings of matplotlib and the Jupyter notebook. The in notebook style has not been finalized yet.

Development

This book has an unusual development design. The content is open-sourced, meaning anyone can be an author. Authors submit content or revisions using the GitHub interface.

How to contribute

What to contribute?

  • The current chapter list is not finalized. If you see something that is missing (MCMC, MAP, Bayesian networks, good prior choices, Potential classes etc.), feel free to start there.
  • Cleaning up Python code and making code more PyMC-esque
  • Giving better explanations
  • Spelling/grammar mistakes
  • Suggestions
  • Contributing to the Jupyter notebook styles

Commiting

  • All commits are welcome, even if they are minor ;)
  • If you are unfamiliar with Github, you can email me contributions to the email below.

Reviews

these are satirical, but real

"No, but it looks good" - John D. Cook

"I ... read this book ... I like it!" - Andrew Gelman

"This book is a godsend, and a direct refutation to that 'hmph! you don't know maths, piss off!' school of thought... The publishing model is so unusual. Not only is it open source but it relies on pull requests from anyone in order to progress the book. This is ingenious and heartening" - excited Reddit user

Contributions and Thanks

Thanks to all our contributing authors, including (in chronological order):

Authors
Cameron Davidson-Pilon Stef Gibson Vincent Ohprecio Lars Buitinck
Paul Magwene Matthias Bussonnier Jens Rantil y-p
Ethan Brown Jonathan Whitmore Mattia Rigotti Colby Lemon
Gustav W Delius Matthew Conlen Jim Radford Vannessa Sabino
Thomas Bratt Nisan Haramati Robert Grant Matthew Wampler-Doty
Yaroslav Halchenko Alex Garel Oleksandr Lysenko liori
ducky427 Pablo de Oliveira Castro sergeyfogelson Mattia Rigotti
Matt Bauman Andrew Duberstein Carsten Brandt Bob Jansen
ugurthemaster William Scott Min RK Bulwersator
elpres Augusto Hack Michael Feldmann Youki
Jens Rantil Kyle Meyer Eric Martin Inconditus
Kleptine Stuart Layton Antonino Ingargiola vsl9
Tom Christie bclow Simon Potter Garth Snyder
Daniel Beauchamp Philipp Singer gbenmartin Peadar Coyle

We would like to thank the Python community for building an amazing architecture. We would like to thank the statistics community for building an amazing architecture.

Similarly, the book is only possible because of the PyMC library. A big thanks to the core devs of PyMC: Chris Fonnesbeck, Anand Patil, David Huard and John Salvatier.

One final thanks. This book was generated by Jupyter Notebook, a wonderful tool for developing in Python. We thank the IPython/Jupyter community for developing the Notebook interface. All Jupyter notebook files are available for download on the GitHub repository.

Contact

Contact the main author, Cam Davidson-Pilon at [email protected] or @cmrndp

Imgur

Owner
Cameron Davidson-Pilon
Focusing on sustainable food. Former Director of Data Science @Shopify. Author of Bayesian Methods for Hackers and DataOrigami.
Cameron Davidson-Pilon
Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

1 Oct 11, 2021
Official repository for MixFaceNets: Extremely Efficient Face Recognition Networks

MixFaceNets This is the official repository of the paper: MixFaceNets: Extremely Efficient Face Recognition Networks. (Accepted in IJCB2021) https://i

Fadi Boutros 51 Dec 13, 2022
PaddleBoBo是基于PaddlePaddle和PaddleSpeech、PaddleGAN等开发套件的虚拟主播快速生成项目

PaddleBoBo - 元宇宙时代,你也可以动手做一个虚拟主播。 PaddleBoBo是基于飞桨PaddlePaddle深度学习框架和PaddleSpeech、PaddleGAN等开发套件的虚拟主播快速生成项目。PaddleBoBo致力于简单高效、可复用性强,只需要一张带人像的图片和一段文字,就能

502 Jan 08, 2023
The Pytorch code of "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification", CVPR 2022 (Oral).

DeepBDC for few-shot learning        Introduction In this repo, we provide the implementation of the following paper: "Joint Distribution Matters: Dee

FeiLong 116 Dec 19, 2022
text_recognition_toolbox: The reimplementation of a series of classical scene text recognition papers with Pytorch in a uniform way.

text recognition toolbox 1. 项目介绍 该项目是基于pytorch深度学习框架,以统一的改写方式实现了以下6篇经典的文字识别论文,论文的详情如下。该项目会持续进行更新,欢迎大家提出问题以及对代码进行贡献。 模型 论文标题 发表年份 模型方法划分 CRNN 《An End-t

168 Dec 24, 2022
Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Learning What To Do by Simulating the Past This repository contains code that implements the Deep Reward Learning by Simulating the Past (Deep RSLP) a

Center for Human-Compatible AI 24 Aug 07, 2021
PyTorch evaluation code for Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Out-of-distribution Generalization Investigation on Vision Transformers This repository contains PyTorch evaluation code for Delving Deep into the Gen

Chongzhi Zhang 72 Dec 13, 2022
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection Main requirements torch = 1.0 torchvision = 0.2.0 Python 3 Environm

15 Apr 04, 2022
PINN Burgers - 1D Burgers equation simulated by PINN

PINN(s): Physics-Informed Neural Network(s) for Burgers equation This is an impl

ShotaDEGUCHI 1 Feb 12, 2022
FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack

FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack Case study of the FCA. The code can be find in FCA. Cas

IDRL 21 Dec 15, 2022
[NeurIPS'20] Multiscale Deep Equilibrium Models

Multiscale Deep Equilibrium Models 💥 💥 💥 💥 This repo is deprecated and we will soon stop actively maintaining it, as a more up-to-date (and simple

CMU Locus Lab 221 Dec 26, 2022
Dictionary Learning with Uniform Sparse Representations for Anomaly Detection

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Implementation of the Uniform DL Representation for AD algorithm describ

Paul Irofti 1 Nov 23, 2022
Receptive Field Block Net for Accurate and Fast Object Detection, ECCV 2018

Receptive Field Block Net for Accurate and Fast Object Detection By Songtao Liu, Di Huang, Yunhong Wang Updatas (2021/07/23): YOLOX is here!, stronger

Liu Songtao 1.4k Dec 21, 2022
Custom implementation of Corrleation Module

Pytorch Correlation module this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC This tutorial was used as a basis for

Clément Pinard 361 Dec 12, 2022
Localizing Visual Sounds the Hard Way

Localizing-Visual-Sounds-the-Hard-Way Code and Dataset for "Localizing Visual Sounds the Hard Way". The repo contains code and our pre-trained model.

Honglie Chen 58 Dec 07, 2022
A Multi-modal Model Chinese Spell Checker Released on ACL2021.

ReaLiSe ReaLiSe is a multi-modal Chinese spell checking model. This the office code for the paper Read, Listen, and See: Leveraging Multimodal Informa

DaDa 106 Dec 29, 2022
Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

csuhan 64 Jan 07, 2023
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees This repository is being continuously updated, please stay tuned! Any code con

86 Dec 28, 2022
Research into Forex price prediction from price history using Deep Sequence Modeling with Stacked LSTMs.

Forex Data Prediction via Recurrent Neural Network Deep Sequence Modeling Research Paper Our research paper can be viewed here Installation Clone the

Alex Taradachuk 2 Aug 07, 2022
Unsupervised Learning of Video Representations using LSTMs

Unsupervised Learning of Video Representations using LSTMs Code for paper Unsupervised Learning of Video Representations using LSTMs by Nitish Srivast

Elman Mansimov 341 Dec 20, 2022