This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.

Overview

Deploying ML models with FastAPI, Docker, and Kubernetes

By: Sayak Paul and Chansung Park

This project shows how to serve an ONNX-optimized image classification model as a RESTful web service with FastAPI, Docker, and Kubernetes (k8s). The idea is to first Dockerize the API and then deploy it on a k8s cluster running on Google Kubernetes Engine (GKE). We do this integration using GitHub Actions.

👋 Note: Even though this project uses an image classification its structure and techniques can be used to serve other models as well.

Deploying the model as a service with k8s

  • We decouple the model optimization part from our API code. The optimization part is available within the notebooks/TF_to_ONNX.ipynb notebook.

  • Then we locally test the API. You can find the instructions within the api directory.

  • To deploy the API, we define our deployment.yaml workflow file inside .github/workflows. It does the following tasks:

    • Looks for any changes in the specified directory. If there are any changes:
    • Builds and pushes the latest Docker image to Google Container Register (GCR).
    • Deploys the Docker container on the k8s cluster running on GKE.

Configurations needed beforehand

  • Create a k8s cluster on GKE. Here's a relevant resource.

  • Create a service account key (JSON) file. It's a good practice to only grant it the roles required for the project. For example, for this project, we created a fresh service account and granted it permissions for the following: Storage Admin, GKE Developer, and GCR Developer.

  • Crete a secret named GCP_CREDENTIALS on your GitHub repository and copy paste the contents of the service account key file into the secret.

  • Configure bucket storage related permissions for the service account:

    $ export PROJECT_ID=<PROJECT_ID>
    $ export ACCOUNT=<ACCOUNT>
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.admin
    
    $ gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectAdmin
    
    gcloud -q projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \
        --role roles/storage.objectCreator
  • If you're on the main branch already then upon a new push, the worflow defined in .github/workflows/deployment.yaml should automatically run. Here's how the final outputs should look like so (run link):

Notes

  • Since we use CPU-based pods within the k8s cluster, we use ONNX optimizations since they are known to provide performance speed-ups for CPU-based environments. If you are using GPU-based pods then look into TensorRT.
  • We use Kustomize to manage the deployment on k8s.

Querying the API endpoint

From workflow outputs, you should see something like so:

NAME             TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
fastapi-server   LoadBalancer   xxxxxxxxxx   xxxxxxxxxx        80:30768/TCP   23m
kubernetes       ClusterIP      xxxxxxxxxx     <none>          443/TCP        160m

Note the EXTERNAL-IP corresponding to fastapi-server (iff you have named your service like so). Then cURL it:

curl -X POST -F [email protected] -F with_resize=True -F with_post_process=True http://{EXTERNAL-IP}:80/predict/image

You should get the following output (if you're using the cat.jpg image present in the api directory):

"{\"Label\": \"tabby\", \"Score\": \"0.538\"}"

The request assumes that you have a file called cat.jpg present in your working directory.

TODO (s)

  • Set up logging for the k8s pods.
  • Find a better way to report the latest API endpoint.

Acknowledgements

ML-GDE program for providing GCP credit support.

Comments
  • Feat/locust grpc

    Feat/locust grpc

    @deep-diver currently, the load test runs into:

    Screenshot 2022-04-02 at 10 54 26 AM

    I have ensured https://github.com/sayakpaul/ml-deployment-k8s-fastapi/blob/feat/locust-grpc/locust/grpc/locustfile.py#L49 returns the correct output. But after a few requests, I run into the above problem.

    Also, I should mention that the gRPC client currently does not take care of image resizing which makes it a bit less comparable to the REST client which handles preprocessing as well postprocessing.

    opened by sayakpaul 18
  • Setup TF Serving based deployment

    Setup TF Serving based deployment

    In this new feature, the following works are expected

    • Update the notebook Create a new notebook with the TF Serving prototype based on both gRPC(Ref) and RestAPI(Ref).

    • Update the notebook Update the newly created notebook to check the %%timeit on the TF Serving server locally.

    • Build/Commit docker image based on TF Serving base image using this method.

    • Deploy the built docker image on GKE cluster

    • Check the deployed model's performance with a various scenarios (maybe the same ones applied to ONNX+FastAPI scenarios)

    new feature 
    opened by deep-diver 11
  • Perform load testing with Locust

    Perform load testing with Locust

    Resources:

    • https://towardsdatascience.com/performance-testing-an-ml-serving-api-with-locust-ecd98ab9b7f7
    • https://microsoft.github.io/PartsUnlimitedMRP/pandp/200.1x-PandP-LocustTest.html
    • https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course4/week2-ungraded-labs/C4_W2_Lab_3_Latency_Test_Compose
    opened by sayakpaul 10
  • 4 dockerize

    4 dockerize

    fix

    • move api/utils/requirements.txt to /api
    • add missing dependency python-multipart to the requirements.txt

    add

    • Dockerfile

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/4

    opened by deep-diver 4
  • Deployment on GKE with GitHub Actions

    Deployment on GKE with GitHub Actions

    Closes https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/5, https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/7, and https://github.com/sayakpaul/ml-deployment-k8s-fastapi/issues/6.

    opened by sayakpaul 2
  • chore: refactored the colab notebook.

    chore: refactored the colab notebook.

    Just added a text cell explaining why it's better to include the preprocessing function in the final exported model. Also, added a cell to show if the TF and ONNX outputs match with np.testing.assert_allclose().

    opened by sayakpaul 2
Owner
Sayak Paul
ML Engineer at @carted | One PR at a time
Sayak Paul
Signalling for FastAPI.

fastapi-signals Signalling for FastAPI.

Henshal B 7 May 04, 2022
API Simples com python utilizando a biblioteca FastApi

api-fastapi-python API Simples com python utilizando a biblioteca FastApi Para rodar esse script são necessárias duas bibliotecas: Fastapi: Comando de

Leonardo Grava 0 Apr 29, 2022
Flask-vs-FastAPI - Understanding Flask vs FastAPI Web Framework. A comparison of two different RestAPI frameworks.

Flask-vs-FastAPI Understanding Flask vs FastAPI Web Framework. A comparison of two different RestAPI frameworks. IntroductionIn Flask is a popular mic

Mithlesh Navlakhe 1 Jan 01, 2022
Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automatically use request headers such as x-request-id or x-correlation-id.

starlette context Middleware for Starlette that allows you to store and access the context data of a request. Can be used with logging so logs automat

Tomasz Wójcik 300 Dec 26, 2022
This repository contains learning resources for Python Fast API Framework and Docker

This repository contains learning resources for Python Fast API Framework and Docker, Build High Performing Apps With Python BootCamp by Lux Academy and Data Science East Africa.

Harun Mbaabu Mwenda 23 Nov 20, 2022
A FastAPI WebSocket application that makes use of ncellapp package by @hemantapkh

ncellFastAPI author: @awebisam Used FastAPI to create WS application. Ncellapp module by @hemantapkh NOTE: Not following best practices and, needs ref

Aashish Bhandari 7 Oct 01, 2021
Boilerplate code for quick docker implementation of REST API with JWT Authentication using FastAPI, PostgreSQL and PgAdmin ⭐

FRDP Boilerplate code for quick docker implementation of REST API with JWT Authentication using FastAPI, PostgreSQL and PgAdmin ⛏ . Getting Started Fe

BnademOverflow 53 Dec 29, 2022
Light, Flexible and Extensible ASGI API framework

Starlite Starlite is a light and flexible ASGI API framework. Using Starlette and pydantic as foundations. Check out the Starlite documentation 📚 Cor

1.5k Jan 04, 2023
Code for my FastAPI tutorial

FastAPI tutorial Code for my video tutorial FastAPI tutorial What is FastAPI? FastAPI is a high-performant REST API framework for Python. It's built o

José Haro Peralta 9 Nov 15, 2022
A Sample App to Demonstrate React Native and FastAPI Integration

React Native - Service Integration with FastAPI Backend. A Sample App to Demonstrate React Native and FastAPI Integration UI Based on NativeBase toolk

YongKi Kim 4 Nov 17, 2022
A utility that allows you to use DI in fastapi without Depends()

fastapi-better-di What is this ? fastapi-better-di is a utility that allows you to use DI in fastapi without Depends() Installation pip install fastap

Maxim 9 May 24, 2022
fastapi-cache is a tool to cache fastapi response and function result, with backends support redis and memcached.

fastapi-cache Introduction fastapi-cache is a tool to cache fastapi response and function result, with backends support redis, memcache, and dynamodb.

long2ice 551 Jan 08, 2023
Dead simple CSRF security middleware for Starlette ⭐ and Fast API ⚡

csrf-starlette-fastapi Dead simple CSRF security middleware for Starlette ⭐ and Fast API ⚡ Will work with either a input type="hidden" field or ajax

Nathaniel Sabanski 9 Nov 20, 2022
Fastapi performans monitoring

Fastapi-performans-monitoring This project is a simple performance monitoring for FastAPI. License This project is licensed under the terms of the MIT

bilal alpaslan 11 Dec 31, 2022
Cache-house - Caching tool for python, working with Redis single instance and Redis cluster mode

Caching tool for python, working with Redis single instance and Redis cluster mo

Tural 14 Jan 06, 2022
FastAPI-Amis-Admin is a high-performance, efficient and easily extensible FastAPI admin framework. Inspired by django-admin, and has as many powerful functions as django-admin.

简体中文 | English 项目介绍 FastAPI-Amis-Admin fastapi-amis-admin是一个拥有高性能,高效率,易拓展的fastapi管理后台框架. 启发自Django-Admin,并且拥有不逊色于Django-Admin的强大功能. 源码 · 在线演示 · 文档 · 文

AmisAdmin 318 Dec 31, 2022
FastAPI framework plugins

Plugins for FastAPI framework, high performance, easy to learn, fast to code, ready for production fastapi-plugins FastAPI framework plugins Cache Mem

RES 239 Dec 28, 2022
Keycloack plugin for FastApi.

FastAPI Keycloack Keycloack plugin for FastApi. Your aplication receives the claims decoded from the access token. Usage Run keycloak on port 8080 and

Elber 4 Jun 24, 2022
fastapi-mqtt is extension for MQTT protocol

fastapi-mqtt MQTT is a lightweight publish/subscribe messaging protocol designed for M2M (machine to machine) telemetry in low bandwidth environments.

Sabuhi 144 Dec 28, 2022
Opentracing support for Starlette and FastApi

Starlette-OpenTracing OpenTracing support for Starlette and FastApi. Inspired by: Flask-OpenTracing OpenTracing implementations exist for major distri

Rene Dohmen 63 Dec 30, 2022