🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects

Related tags

Testingml-testing
Overview

Effective Testing for Machine Learning Projects

CI

Code for PyData Global 2021 Presentation by @edublancas. Slides available here.

The project is developed using Ploomber; check it out! :)

If you have questions, ping me on Slack.

Blog post series

  1. Part I
  2. Part II
  3. Part III

Follow @ploomber on Twitter, or subscribe to our newsletter for more amazing content!

Organization

The talk describes five stages of testing, from the most basic one to the most robust. The idea is to make progress and add more robust tests continuously. You can navigate through the branches of this repository to see how each time, it becomes more robust as we add more tests and modularize the code. Here are the links for each level:

  1. Smoke testing (1-smoke-testing)
  2. Integration and unit testing (2-integration-and-unit)
  3. Variable distributions and inference pipeline (3-distribution-and-inference)
  4. Training-serving skew (4-train-serve-skew)
  5. Model quality (5-model-quality)

Tests are run automatically on each push using GitHub Actions; you can see the configuration file at .github/workflows/ci.yml

Setup

# get the code
git clone https://github.com/edublancas/ml-testing

# move to one of the branches
git checkout branch-name

# example
git checkout 1-smoke-testing

# install dependencies
# conda
conda env create -f environment.yml
# pip
pip install -r requirements.txt

# build the pipeline
ploomber build

# run unit tests (added on level 2)
pytest

Resources

Owner
Eduardo Blancas
Bridging the gap between interactive data work and production.
Eduardo Blancas
Automates hiketop+ crystal earning using python and appium

hikepy Works on poco x3 idk about your device deponds on resolution Prerquests Android sdk java adb Setup Go to https://appium.io/ Download and instal

4 Aug 26, 2022
It's a simple script to generate a mush on code forces, the script will accept the public problem urls only or polygon problems.

Codeforces-Sheet-Generator It's a simple script to generate a mushup on code forces, the script will accept the public problem urls only or polygon pr

Ahmed Hossam 10 Aug 02, 2022
This is a bot that can type without any assistance and have incredible speed.

BulldozerType This is a bot that can type without any assistance and have incredible speed. This bot currently only works on the site https://onlinety

1 Jan 03, 2022
PENBUD is penetration testing buddy which helps you in penetration testing by making various important tools interactive.

penbud - Penetration Tester Buddy PENBUD is penetration testing buddy which helps you in penetration testing by making various important tools interac

Himanshu Shukla 15 Feb 01, 2022
WEB PENETRATION TESTING TOOL 💥

N-WEB ADVANCE WEB PENETRATION TESTING TOOL Features 🎭 Admin Panel Finder Admin Scanner Dork Generator Advance Dork Finder Extract Links No Redirect H

56 Dec 23, 2022
splinter - python test framework for web applications

splinter - python tool for testing web applications splinter is an open source tool for testing web applications using Python. It lets you automate br

Cobra Team 2.6k Dec 27, 2022
A toolbar overlay for debugging Flask applications

Flask Debug-toolbar This is a port of the excellent django-debug-toolbar for Flask applications. Installation Installing is simple with pip: $ pip ins

863 Dec 29, 2022
Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source.

Mockoon Mockoon is the easiest and quickest way to run mock APIs locally. No remote deployment, no account required, open source. It has been built wi

mockoon 4.4k Dec 30, 2022
Photostudio是一款能进行自动化检测网页存活并实时给网页拍照的工具,通过调用Fofa/Zoomeye/360qua/shodan等 Api快速准确查询资产并进行网页截图,从而实施进一步的信息筛查。

Photostudio-红队快速爬取网页快照工具 一、简介: 正如其名:这是一款能进行自动化检测,实时给网页拍照的工具 信息收集要求所收集到的信息要真实可靠。 当然,这个原则是信息收集工作的最基本的要求。为达到这样的要求,信息收集者就必须对收集到的信息反复核实,不断检验,力求把误差减少到最低限度。我

s7ck Team 41 Dec 11, 2022
🏃💨 For when you need to fill out feedback in the last minute.

BMSCE Auto Feedback For when you need to fill out feedback in the last minute. 🏃 💨 Setup Clone the repository Run pip install selenium Set the RATIN

Shaan Subbaiah 10 May 23, 2022
A simple asynchronous TCP/IP Connect Port Scanner in Python 3

Python 3 Asynchronous TCP/IP Connect Port Scanner A simple pure-Python TCP Connect port scanner. This application leverages the use of Python's Standa

70 Jan 03, 2023
Python selenium script to bypass simaster.ugm.ac.id weak captcha.

Python selenium script to bypass simaster.ugm.ac.id weak "captcha".

Hafidh R K 1 Feb 01, 2022
Python Moonlight (Machine Learning) Practice

PyML Python Moonlight (Machine Learning) Practice Contents Design Documentation Prerequisites Checklist Dev Setup Testing Run Prerequisites Python 3 P

Dockerian Seattle 2 Dec 25, 2022
User-interest mock backend server implemnted using flask restful, and SQLAlchemy ORM confiugred with sqlite

Flask_Restful_SQLAlchemy_server User-interest mock backend server implemnted using flask restful, and SQLAlchemy ORM confiugred with sqlite. Backend b

Austin Weigel 1 Nov 17, 2022
Python scripts for a generic performance testing infrastructure using Locust.

TODOs Reference to published paper or online version of it loadtest_plotter.py: Cleanup and reading data from files ARS_simulation.py: Cleanup, docume

Juri Tomak 3 Dec 15, 2022
Load and performance benchmark tool

Yandex Tank Yandextank has been moved to Python 3. Latest stable release for Python 2 here. Yandex.Tank is an extensible open source load testing tool

Yandex 2.2k Jan 03, 2023
Doing dirty (but extremely useful) things with equals.

Doing dirty (but extremely useful) things with equals. Documentation: dirty-equals.helpmanual.io Source Code: github.com/samuelcolvin/dirty-equals dir

Samuel Colvin 602 Jan 05, 2023
BDD library for the py.test runner

BDD library for the py.test runner pytest-bdd implements a subset of the Gherkin language to enable automating project requirements testing and to fac

pytest-dev 1.1k Jan 09, 2023
Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics.

LoLalytics-scraper Um scraper feito em python que gera arquivos de excel baseados nas tier lists do site LoLalytics. Começando por um único script com

Kevin Souza 1 Feb 19, 2022
A modern API testing tool for web applications built with Open API and GraphQL specifications.

Schemathesis Schemathesis is a modern API testing tool for web applications built with Open API and GraphQL specifications. It reads the application s

Schemathesis.io 1.6k Jan 06, 2023