📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Last update: Jan 03, 2023

Overview

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Papermill lets you:

parameterize notebooks
execute notebooks

This opens up new opportunities for how notebooks can be used. For example:

Perhaps you have a financial report that you want to run with different values on the first or last day of a month, or at the beginning or end of the year. Using parameters makes this task easier.
Do you want to run a notebook and, depending on its results, choose which notebook to run next? You can now execute such a workflow programmatically, without manually copying and pasting from notebook to notebook.

Papermill takes an opinionated approach to notebook parameterization and execution based on our experiences using notebooks at scale in data pipelines.

Installation

From the command line:

pip install papermill

To install all optional I/O dependencies, use the all extra shown below; you can also pick individual bundles such as s3 or azure. To format parameters with Black, add the black extra.

pip install papermill[all]

Python Version Support

This library currently supports Python 3.6 and newer. As minor Python versions are officially sunset by the Python organization, papermill will drop support for them as well.

Usage

Parameterizing a Notebook

To parameterize your notebook, designate a cell with the tag "parameters".
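A tag is simply an entry in the cell's metadata in the .ipynb JSON. The sketch below builds a minimal nbformat v4 notebook by hand with only the standard library, so you can see where the "parameters" tag lives. The variable names alpha and ratio, the kernelspec, and the cell contents are illustrative assumptions; in practice you would usually add the tag through the Jupyter UI.

```python
import json

# Minimal nbformat v4 notebook. nbformat_minor 4 is used so cells need no "id".
notebook = {
    "cells": [
        {
            # Cell tagged "parameters": its assignments act as the defaults.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"tags": ["parameters"]},
            "outputs": [],
            "source": "alpha = 0.5\nratio = 0.2\n",
        },
        {
            # An ordinary cell that uses the parameters.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "print(alpha * ratio)\n",
        },
    ],
    # Assumed kernel; papermill needs a kernel name from here or from the caller.
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to .ipynb JSON; write this text to a file such as input.ipynb to use it.
ipynb_text = json.dumps(notebook, indent=1)
```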

Papermill looks for the cell tagged parameters and treats its values as defaults for the parameters passed in at execution time. Papermill adds a new cell, tagged injected-parameters, immediately after the parameters cell; it contains the input parameters and overrides the default values. If no cell is tagged parameters, the injected cell is inserted at the top of the notebook.

Additionally, if you rerun a notebook through papermill, it reuses the injected-parameters cell from the prior run. In this case, Papermill replaces the old injected-parameters cell with the new run's inputs.
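The injection rules described above can be modeled on a plain list of cell dictionaries. This is a simplified sketch, not papermill's implementation: the helper names inject_parameters and _tags are hypothetical, it assumes the injected cell goes right after the parameters cell, and it writes values with Python's repr, whereas papermill's generated cell text may differ.

```python
def _tags(cell):
    """Return the tag list stored in a cell's metadata (empty if none)."""
    return cell.get("metadata", {}).get("tags", [])


def inject_parameters(cells, parameters):
    """Simplified model of parameter injection on a list of nbformat cell dicts."""
    # A rerun replaces the injected-parameters cell left by the prior run.
    cells = [c for c in cells if "injected-parameters" not in _tags(c)]
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        # Plain assignments; they run after the defaults, so they override them.
        "source": "".join(f"{name} = {value!r}\n" for name, value in parameters.items()),
    }
    # Insert right after the "parameters" cell, or at the top if there is none.
    index = next((i + 1 for i, c in enumerate(cells) if "parameters" in _tags(c)), 0)
    return cells[:index] + [injected] + cells[index:]
```

Calling inject_parameters a second time on its own output leaves exactly one injected-parameters cell, which mirrors the rerun behavior described above.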

Executing a Notebook

There are two ways to execute a notebook with parameters: (1) through the Python API and (2) through the command line interface.

Execute via the Python API

import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
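The Python API makes the month-end report scenario from the overview straightforward: compute one parameter set per run and call pm.execute_notebook for each. The sketch below is one way to do that, assuming a notebook whose parameters cell defines a hypothetical report_date variable; the helper names month_end_jobs and run_jobs and the output paths are illustrative. papermill is imported only inside run_jobs, so the planning step works without it.

```python
import calendar
import datetime


def month_end_jobs(year, output_dir="path/to/reports"):
    """Build (output_path, parameters) pairs, one per month-end of `year`."""
    jobs = []
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        report_date = datetime.date(year, month, last_day).isoformat()
        output_path = f"{output_dir}/report_{report_date}.ipynb"
        jobs.append((output_path, {"report_date": report_date}))
    return jobs


def run_jobs(input_path, jobs):
    """Execute the notebook once per job; requires papermill to be installed."""
    import papermill as pm  # imported lazily so month_end_jobs works without it

    for output_path, parameters in jobs:
        pm.execute_notebook(input_path, output_path, parameters=parameters)
```

For example, run_jobs('path/to/input.ipynb', month_end_jobs(2023)) would produce twelve executed copies of the notebook, one per month.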

Execute via CLI

Here's an example of a local notebook being executed with its output written to an Amazon S3 bucket:

$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

NOTE: If you use multiple AWS accounts and have properly configured your AWS credentials, you can specify which account to use by setting the AWS_PROFILE environment variable on the command line. For example:

$ AWS_PROFILE=dev_account papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
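If you launch the CLI from a Python script instead of a shell, the same effect can be had by passing AWS_PROFILE in the child process's environment. This sketch uses only the standard library's subprocess module; the helper names are hypothetical, and it assumes the papermill executable is on your PATH when run_with_profile is actually called.

```python
import os
import subprocess


def build_papermill_command(input_path, output_path, parameters):
    """Build a papermill CLI argument list with one -p flag per parameter."""
    cmd = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        cmd += ["-p", name, str(value)]
    return cmd


def run_with_profile(cmd, profile):
    """Run `cmd` with AWS_PROFILE set for the child process only."""
    env = dict(os.environ, AWS_PROFILE=profile)
    return subprocess.run(cmd, env=env, check=True)
```

Usage mirroring the shell example: run_with_profile(build_papermill_command("local/input.ipynb", "s3://bkt/output.ipynb", {"alpha": 0.6, "l1_ratio": 0.1}), "dev_account"). Copying os.environ keeps the parent's variables intact while overriding only the profile.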

In the example above, two parameters, alpha and l1_ratio, are set using -p (--parameters also works). Parameter values that look like booleans or numbers are interpreted as such. Here are the different ways users may set parameters:

$ papermill local/input.ipynb s3://bkt/output.ipynb -r version 1.0

Using -r or --parameters_raw, users can set parameters one by one. However, unlike with -p, the value always remains a string, even if it looks like a number or boolean.
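The difference between -p and -r can be pictured with two small functions. This is a rough approximation written for illustration, not papermill's own parser: interpret_p_value and interpret_r_value are hypothetical names, and the sketch assumes only the exact strings True and False count as booleans, trying int before float; papermill's precise rules may differ.

```python
def interpret_p_value(value):
    """Rough model of -p: coerce booleans and numbers, otherwise keep the string."""
    if value == "True":
        return True
    if value == "False":
        return False
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


def interpret_r_value(value):
    """Model of -r: the value always stays a string."""
    return value
```

With these rules, -p version 1.0 would arrive in the notebook as the float 1.0, while -r version 1.0 would arrive as the string "1.0".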

$ papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml

Using -f or --parameters_file, users can provide a YAML file from which parameter values should be read.

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
alpha: 0.6
l1_ratio: 0.1"

Using -y or --parameters_yaml, users can directly provide a YAML string containing parameter values.

$ papermill local/input.ipynb s3://bkt/output.ipynb -b YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==

Using -b or --parameters_base64, users can provide a YAML string, base64-encoded, containing parameter values.
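The base64 value in the -b example is just the YAML text alpha: 0.6 and l1_ratio: 0.1 (each on its own line) run through a standard base64 encoder. The snippet below reproduces it with Python's base64 module, which is handy when the YAML would otherwise need awkward shell quoting.

```python
import base64

params_yaml = "alpha: 0.6\nl1_ratio: 0.1\n"

# Encode the YAML text for use with -b / --parameters_base64.
encoded = base64.b64encode(params_yaml.encode("utf-8")).decode("ascii")
print(encoded)  # YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==

# Decoding recovers the original YAML string.
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == params_yaml
```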

When passing parameters as YAML (through -y, -b, or -f), parameter values can be arrays or dictionaries:

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
x:
    - 0.0
    - 1.0
    - 2.0
    - 3.0
linear_function:
    slope: 3.0
    intercept: 1.0"
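Inside the notebook, those YAML values arrive as an ordinary Python list and dict, so the parameters cell can declare defaults with the same shapes. The sketch below assumes a notebook written for this example: the first part stands in for the cell tagged parameters, and the second for a later cell. The same values could also be passed through the Python API as parameters=dict(x=[...], linear_function={...}).

```python
# Stand-in for the cell tagged "parameters": defaults with the same shapes
# as the YAML above; the injected cell would override them at run time.
x = [0.0, 1.0, 2.0, 3.0]
linear_function = {"slope": 3.0, "intercept": 1.0}

# A later cell can use the list and dictionary like any other Python values.
y = [linear_function["slope"] * xi + linear_function["intercept"] for xi in x]
print(y)  # [1.0, 4.0, 7.0, 10.0]
```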

Supported Name Handlers

Papermill supports the following name handlers for input and output paths during execution:

Local file system: local paths
HTTP, HTTPS protocol: http://, https://
Amazon Web Services: AWS S3 (s3://)
Azure: Azure DataLake Store (adl://), Azure Blob Store (abs://)
Google Cloud: Google Cloud Storage (gs://)
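Choosing a handler comes down to looking at the path's prefix. The sketch below illustrates that idea with urllib.parse from the standard library; the HANDLERS table and handler_name function are hypothetical and only mirror the list above. Papermill keeps its own internal registry, which this does not reproduce.

```python
from urllib.parse import urlparse

# Hypothetical table mirroring the list above (scheme -> backend name).
HANDLERS = {
    "http": "HTTP",
    "https": "HTTPS",
    "s3": "AWS S3",
    "adl": "Azure DataLake Store",
    "abs": "Azure Blob Store",
    "gs": "Google Cloud Storage",
}


def handler_name(path):
    """Return the backend a path would be routed to; unknown schemes fall back to local."""
    scheme = urlparse(path).scheme
    return HANDLERS.get(scheme, "Local file system")
```

For instance, handler_name("s3://bkt/output.ipynb") returns "AWS S3", while a relative path such as local/input.ipynb has no scheme and falls through to the local file system.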

Development Guide

Read CONTRIBUTING.md for guidelines on how to set up a local development environment and contribute code changes back to Papermill.

For development guidelines, see the DEVELOPMENT_GUIDE.md file, which explains how to make particular additions to the code base.

Documentation

We host the Papermill documentation on ReadTheDocs.


Owner

nteract