🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects

Related tags

Testingml-testing
Overview

Effective Testing for Machine Learning Projects

CI

Code for PyData Global 2021 Presentation by @edublancas. Slides available here.

The project is developed using Ploomber; check it out! :)

If you have questions, ping me on Slack.

Blog post series

  1. Part I
  2. Part II
  3. Part III

Follow @ploomber on Twitter, or subscribe to our newsletter for more amazing content!

Organization

The talk describes five stages of testing, from the most basic one to the most robust. The idea is to make progress and add more robust tests continuously. You can navigate through the branches of this repository to see how each time, it becomes more robust as we add more tests and modularize the code. Here are the links for each level:

  1. Smoke testing (1-smoke-testing)
  2. Integration and unit testing (2-integration-and-unit)
  3. Variable distributions and inference pipeline (3-distribution-and-inference)
  4. Training-serving skew (4-train-serve-skew)
  5. Model quality (5-model-quality)

Tests are run automatically on each push using GitHub Actions; you can see the configuration file at .github/workflows/ci.yml

Setup

# get the code
git clone https://github.com/edublancas/ml-testing

# move to one of the branches
git checkout branch-name

# example
git checkout 1-smoke-testing

# install dependencies
# conda
conda env create -f environment.yml
# pip
pip install -r requirements.txt

# build the pipeline
ploomber build

# run unit tests (added on level 2)
pytest

Resources

Owner
Eduardo Blancas
Bridging the gap between interactive data work and production.
Eduardo Blancas
A Proof of concept of a modern python CLI with click, pydantic, rich and anyio

httpcli This project is a proof of concept of a modern python networking cli which can be simple and easy to maintain using some of the best packages

Kevin Tewouda 17 Nov 15, 2022
buX Course Enrollment Automation

buX automation BRACU - buX course enrollment automation Features: Automatically enroll into multiple courses at a time. Find courses just entering cou

Mohammad Shakib 1 Oct 06, 2022
Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away

Multi-asset backtesting framework. An intuitive API lets analysts try out their strategies right away. Fast execution of profit-take/loss-cut orders is built-in. Seamless with Pandas.

Epymetheus 39 Jan 06, 2023
Travel through time in your tests.

time-machine Travel through time in your tests. A quick example: import datetime as dt

Adam Johnson 373 Dec 27, 2022
Statistical tests for the sequential locality of graphs

Statistical tests for the sequential locality of graphs You can assess the statistical significance of the sequential locality of an adjacency matrix

2 Nov 23, 2021
Spam the buzzer and upgrade automatically - Selenium

CookieClicker Usage: Let's check your chrome navigator version : Consequently, you have to : download the right chromedriver in the follow link : http

Iliam Amara 1 Nov 22, 2021
Using openpyxl in Python, performed following task

Python-Automation-with-openpyxl Using openpyxl in Python, performed following tasks on an Excel Sheet containing Product Suppliers along with their pr

1 Apr 06, 2022
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Leon 3.5k Dec 30, 2022
A grab-bag of nifty pytest plugins

A goody-bag of nifty plugins for pytest OS Build Coverage Plugin Description Supported OS pytest-server-fixtures Extensible server-running framework w

Man Group 492 Jan 03, 2023
A simple Python script I wrote that scrapes NASA's James Webb Space Telescope tracker website using Selenium and returns its current status and location.

A simple Python script I wrote that scrapes NASA's James Webb Space Telescope tracker website using Selenium and returns its current status and location.

9 Feb 10, 2022
Test django schema and data migrations, including migrations' order and best practices.

django-test-migrations Features Allows to test django schema and data migrations Allows to test both forward and rollback migrations Allows to test th

wemake.services 382 Dec 27, 2022
Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

Pynguin, the PYthoN General UnIt test geNerator, is a tool that allows developers to generate unit tests automatically.

Chair of Software Engineering II, Uni Passau 997 Jan 06, 2023
Parameterized testing with any Python test framework

Parameterized testing with any Python test framework Parameterized testing in Python sucks. parameterized fixes that. For everything. Parameterized te

David Wolever 714 Dec 21, 2022
a wrapper around pytest for executing tests to look for test flakiness and runtime regression

bubblewrap a wrapper around pytest for assessing flakiness and runtime regressions a cs implementations practice project How to Run: First, install de

Anna Nagy 1 Aug 05, 2021
This project is used to send a screenshot by email of your MyUMons schedule using Selenium python lib (headless mode)

MyUMonsSchedule Use MyUMonsSchedule python script to send a screenshot by email (Gmail) of your MyUMons schedule. If you use it on Windows, take care

Pierre-Louis D'Agostino 6 May 12, 2022
Network automation lab using nornir, scrapli, and containerlab with Arista EOS

nornir-scrapli-eos-lab Network automation lab using nornir, scrapli, and containerlab with Arista EOS. Objectives Deploy base configs to 4xArista devi

Vireak Ouk 13 Jul 07, 2022
PacketPy is an open-source solution for stress testing network devices using different testing methods

PacketPy About PacketPy is an open-source solution for stress testing network devices using different testing methods. Currently, there are only two c

4 Sep 22, 2022
A pytest plugin, that enables you to test your code that relies on a running PostgreSQL Database

This is a pytest plugin, that enables you to test your code that relies on a running PostgreSQL Database. It allows you to specify fixtures for PostgreSQL process and client.

Clearcode 252 Dec 21, 2022
Python scripts for a generic performance testing infrastructure using Locust.

TODOs Reference to published paper or online version of it loadtest_plotter.py: Cleanup and reading data from files ARS_simulation.py: Cleanup, docume

Juri Tomak 3 Dec 15, 2022
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023