Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Last update: Oct 18, 2022

Related tags

Machine Learning mlops

Overview

Federal University of Rio Grande do Norte

Technology Center

Department of Computer Engineering and Automation

Machine Learning Based Systems Design

References

📚 Noah Gift, Alfredo Deza. Practical MLOps: Operationalizing Machine Learning Models. [Link]
📚 Chip Huyen. Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. [Link]
📚 Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines. [Link]
📚 Mariano Anaya. Clean Code in Python. [Link]
📚 Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. [Link]
🤜 Dataquest Academic Program [Link]
😃 CS329S - ML Systems Design [Link]
🎯 Machine Learning Operations [Link]

Lessons

Week 01: Course Outline

Git and Version Control
- You'll learn how to: a) organize your code using version control, b) resolve conflicts in version control, c) employ Git and GitHub to collaborate with others.
- 👊 U1T1: guided project + getting a git repository.

Week 02: CLI fundamentals

Elements of the Command Line
- You'll learn how to: a) employ the command line for Data Science, b) modify the behavior of commands with options, c) employ glob patterns and wildcards, d) define important command line concepts, e) navigate the filesystem, f) manage users and permissions.
Text Processing in the Command Line
- You'll learn how to: a) read and explore documentation, b) perform basic text processing, c) redirect and pipe output, d) inspect files, e) define different kinds of output, f) employ streams and file descriptors.
- 🔠 U1T2: working with the command line.

Week 03: Clean Code Principles for Data Science and Machine Learning

Outline
Coding Best Practices
Writing Clean Code
Refactoring Code
Efficient Code
Documentation
Python Code Quality Authority (PCQA) - pycodestyle
PCQA - pylint
PCQA - autopep8
PCQA - nbQA
▶️ Hands on
- 💾 Datasets [Link]
- Writing Clean Code
- Exercise 01
- Exercise 02
- Exercise 03
- Using pycodestyle
- Using pylint: script, refactored script
- Functions: Advanced - Best practices for writing functions
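
As a minimal sketch of the kind of code the Week 03 topics aim for (descriptive names, a docstring, a single responsibility), consider the function below. The function name and its behavior are hypothetical and are not taken from the course exercises.

```python
"""Sketch: a small function written with style checkers in mind."""


def mean_of_positive(values):
    """Return the mean of the strictly positive numbers in ``values``.

    Raises ValueError when ``values`` contains no positive numbers.
    """
    positives = [value for value in values if value > 0]
    if not positives:
        raise ValueError("values contains no positive numbers")
    return sum(positives) / len(positives)


if __name__ == "__main__":
    # Only 3 and 5 are positive, so the mean is 4.0.
    print(mean_of_positive([3, -1, 5]))
```

Assuming the code is saved as `example.py` (a hypothetical file name), it can be checked with `pycodestyle example.py` and `pylint example.py`.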

Week 04: Production Ready Code

Outline
Catching Errors
Testing and Data Science
A brief introduction to pytest
Logging
Case study: testing and logging
Model Drift
Hands on
- Production ready code
- Data Visualization Fundamentals
  - You will learn: a) how to use data visualization to explore data and b) how and when to use the most common plots.
- Storytelling Data Visualization and Information Design
  - You will learn how to: a) create graphs using information design principles, b) create narrative data visualizations using Matplotlib, c) create visual patterns using Gestalt principles, d) control attention using pre-attentive attributes and e) employ Matplotlib's built-in styles.
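
The catching errors, testing, and logging topics from the Week 04 outline can be combined in one small sketch. The function `safe_divide` and the two tests are hypothetical illustrations, not the course's case study. The tests use plain `assert` statements and `test_` prefixed names, so pytest would collect them if the file were named, for example, `test_example.py`.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(message)s")
logger = logging.getLogger(__name__)


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Log the failure instead of letting the exception propagate.
        logger.error("division by zero: numerator=%s", numerator)
        return None
    logger.info("division succeeded: result=%s", result)
    return result


def test_safe_divide_returns_quotient():
    assert safe_divide(6, 3) == 2.0


def test_safe_divide_handles_zero():
    assert safe_divide(1, 0) is None
```

Returning `None` on failure is one possible design choice. Re-raising the exception after logging it is an equally valid alternative when callers need to react to the error.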

Owner

Ivanovitch Silva
