Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Soorgeon

3-minute video tutorial.

Try the interactive demo:

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
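
The comments above describe a naming rule: the df prefix decides which variables go to csv or parquet, and --product-prefix decides the output directory. The sketch below only illustrates that rule. It is not Soorgeon's source code. The helpers storage_format and product_path are hypothetical names. The code also assumes two things the text does not state: that variables without the df prefix (or every variable, when --df-format is not given) are pickled, and that each product is written as <product-prefix>/<variable>.<ext>.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical helpers illustrating the CLI options above; NOT Soorgeon's code.

SUPPORTED_DF_FORMATS = ("csv", "parquet")
EXTENSIONS = {"csv": "csv", "parquet": "parquet", "pickle": "pkl"}


def storage_format(variable: str, df_format: Optional[str] = None) -> str:
    """Return the format a variable would be stored in.

    Assumption: when df_format is set, names starting with "df" use it;
    everything else (and every variable when df_format is unset) is pickled.
    """
    if df_format is not None and df_format not in SUPPORTED_DF_FORMATS:
        raise ValueError(f"unsupported df format: {df_format!r}")
    if df_format and variable.startswith("df"):
        return df_format
    return "pickle"


def product_path(variable: str, df_format: Optional[str] = None,
                 product_prefix: str = "output") -> str:
    """Assumed layout: <product_prefix>/<variable>.<ext>.

    product_prefix defaults to "output", matching the documented default.
    """
    ext = EXTENSIONS[storage_format(variable, df_format)]
    return str(PurePosixPath(product_prefix) / f"{variable}.{ext}")


if __name__ == "__main__":
    print(product_path("df_train", df_format="csv"))   # output/df_train.csv
    print(product_path("model", df_format="parquet"))  # output/model.pkl
    print(product_path("df", df_format="parquet",
                       product_prefix="some-directory"))  # some-directory/df.parquet
```

A plain startswith("df") check is the simplest reading of "the df prefix". The tool may match names more strictly, so check the guide before relying on this behavior.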

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon
cd soorgeon

Exploratory data analysis notebook:

cd examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

To learn more, check out our guide.

Owner: Ploomber