Machine Learning Systems Design

Read this booklet here.

This booklet covers four main steps of designing a machine learning system:

Project setup
Data pipeline
Modeling: selecting, training, and debugging
Serving: testing, deploying, and maintaining

It comes with links to practical resources that explain each aspect in more detail. It also suggests case studies written by machine learning engineers at major tech companies who have deployed machine learning systems to solve real-world problems.

At the end, the booklet contains 27 open-ended machine learning systems design questions that might come up in machine learning interviews. The answers to these questions will be published in the book Machine Learning Interviews. You can read and contribute to community answers to these questions on GitHub here. You can read more about the book and sign up for the book's mailing list here.

Contribute

This is a work in progress, so any type of contribution is very much appreciated. Here are a few ways you can contribute:

Improve the text by fixing any lexical, grammatical, or technical errors
Add more relevant resources to each aspect of the machine learning project flow
Add/edit questions
Add/edit answers
Other

This booklet was created using the wonderful magicbook package. For detailed instructions on how to use the package, see its GitHub repo. The package requires Node.js. If you're on a Mac, you can install Node.js with Homebrew:

brew install node

Install magicbook with:

npm install magicbook

Clone this repository:

git clone https://github.com/chiphuyen/machine-learning-systems-design.git
cd machine-learning-systems-design

After you've made changes to the files in the content folder, build the booklet with:

magicbook build

You'll find the generated HTML and PDF files in the build folder.

Acknowledgment

I'd like to thank Ben Krause for being a great friend and helping me with this draft!

A booklet on machine learning systems design with exercises


Owner: Chip Huyen
