AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Last update: Jan 03, 2023


Overview

Data Science on AWS - O'Reilly Book

Get the book on Amazon.com

Book Outline

Quick Start Workshop (4 hours)

In this quick start hands-on workshop, you will build an end-to-end AI/ML pipeline for natural language processing with Amazon SageMaker. You will train and tune a text classifier to predict the star rating (1 is bad, 5 is good) of product reviews using the state-of-the-art BERT model for language representation. To build the BERT-based NLP text classifier, you will use a product reviews dataset in which each record contains review text and a star rating (1-5).

Quick Start Workshop Learning Objectives

Attendees will learn how to do the following:

Ingest data into S3 using Amazon Athena and the Parquet data format
Visualize data with pandas and Matplotlib in SageMaker notebooks
Detect statistical data bias with SageMaker Clarify
Perform feature engineering on a raw dataset using scikit-learn and SageMaker Processing Jobs
Store and share features using SageMaker Feature Store
Train and evaluate a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
Evaluate the model using SageMaker Processing Jobs
Track model artifacts using Amazon SageMaker ML Lineage Tracking
Run model bias and explainability analysis with SageMaker Clarify
Register and version models using SageMaker Model Registry
Deploy a model to a REST endpoint using SageMaker Hosting and SageMaker Endpoints
Automate ML workflow steps by building end-to-end model pipelines using SageMaker Pipelines

Extended Workshop (8 hours)

In the extended hands-on workshop, you will practice advanced model training and deployment techniques such as hyperparameter tuning, A/B testing, and autoscaling. You will also set up a real-time streaming analytics and data science pipeline to perform window-based aggregations and anomaly detection.

Extended Workshop Learning Objectives

Attendees will learn how to do the following:

Use automated machine learning (AutoML) to find the best model for your dataset with minimal code
Find the best hyperparameters for your custom model using SageMaker Hyperparameter Tuning Jobs
Deploy multiple model variants into a live, production A/B test to compare online performance, shift live prediction traffic, and autoscale the winning variant using SageMaker Hosting and SageMaker Endpoints
Set up a streaming analytics and continuous machine learning application using Amazon Kinesis and SageMaker

Workshop Instructions

Amazon SageMaker Studio Lab is a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

1. Request Amazon SageMaker Studio Lab Account

Go to Amazon SageMaker Studio Lab and request a free account by providing a valid email address.

Note that Amazon SageMaker Studio Lab is currently in public preview. The number of new account registrations will be limited to ensure a high quality of experience for all customers.

2. Create Studio Lab Account

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page.

You can now create your account with your approved email address, a username, and a password. This account is separate from an AWS account and doesn't require any billing information.

3. Sign in to your Studio Lab Account

You are now ready to sign in to your account.

4. Select your Compute instance, Start runtime, and Open project

CPU Option

Select CPU as the compute type and click Start runtime.

Once the Status shows Running, click Open project.

5. Launch a New Terminal within Studio Lab

6. Clone this GitHub Repo in the Terminal

Within the Terminal, run the following:

cd ~ && git clone https://github.com/data-science-on-aws/oreilly_book
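Running the clone command a second time fails with git's "destination path ... already exists" error once the repository is present. The following is a minimal sketch, not part of the original workshop instructions, that clones only when the target directory does not already contain a Git checkout. The helper name clone_if_missing is hypothetical; only the repository URL comes from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the workshop repo): clone the workshop
# repository only if the destination does not already contain a Git checkout.
REPO_URL="https://github.com/data-science-on-aws/oreilly_book"

clone_if_missing() {
  local dest="$1"
  if [ -d "$dest/.git" ]; then
    # Already cloned: skip instead of letting `git clone` fail.
    echo "skip: $dest already contains a clone"
  else
    git clone "$REPO_URL" "$dest"
  fi
}

# Example usage (uncomment to run in the Studio Lab terminal):
# clone_if_missing "$HOME/oreilly_book"
```

Checking for the .git directory, rather than just the folder, avoids skipping the clone when an empty oreilly_book directory happens to exist.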

7. Create `data_science_on_aws` Conda kernel

Within the Terminal, run the following:

cd ~/oreilly_book/ && conda env create -f environment.yml || conda env update -f environment.yml && conda activate data_science_on_aws

If you see an error like the following, you can safely ignore it. It appears when a Conda environment with this name already exists, in which case the command falls back to conda env update and updates the existing environment.

CondaValueError: prefix already exists: /home/studio-lab-user/.conda/envs/data_science_on_aws
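The one-line command above relies on conda env create failing before it falls back to an update. An alternative sketch checks for the environment up front and runs the matching command, so the CondaValueError never appears. It assumes the usual conda env list output, in which each named environment's name is the first column; the function names env_exists and create_or_update_env are hypothetical and not part of the workshop repo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the workshop repo).

# env_exists NAME: reads `conda env list` output on stdin and succeeds
# if NAME appears as the first column of a non-comment line.
env_exists() {
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# create_or_update_env NAME FILE: update NAME if it exists, otherwise create it.
create_or_update_env() {
  local name="$1" file="$2"
  if conda env list | env_exists "$name"; then
    conda env update -n "$name" -f "$file"
  else
    conda env create -n "$name" -f "$file"
  fi
}

# Example usage (uncomment to run in the Studio Lab terminal):
# cd ~/oreilly_book/ && create_or_update_env data_science_on_aws environment.yml
```

Parsing the listing on stdin keeps env_exists testable without Conda installed; you still need to run conda activate data_science_on_aws afterward in an interactive shell.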

8. Start the Workshop!

Navigate to oreilly_book/00_quickstart/ in SageMaker Studio Lab and start the workshop!

You may need to refresh your browser if you don't see the new oreilly_book/ directory.

When you open the notebooks, make sure to select the data_science_on_aws kernel.
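If data_science_on_aws does not appear in the kernel list, one option (an assumption, not a step from the workshop instructions) is to register the environment with Jupyter through ipykernel, which must be installed inside that environment. The sketch below first checks jupyter kernelspec list and only registers the kernel if it is missing. The names kernel_listed and register_kernel are hypothetical, and the parser assumes the kernel name is the first column of each listed line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the workshop repo).

# kernel_listed NAME: reads `jupyter kernelspec list` output on stdin and
# succeeds if NAME appears as the first column of a line.
kernel_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# register_kernel NAME: register the Conda env NAME as a user-level Jupyter
# kernel (assumes ipykernel is installed in that env) unless already listed.
register_kernel() {
  local name="$1"
  if jupyter kernelspec list | kernel_listed "$name"; then
    echo "kernel $name already registered"
  else
    conda run -n "$name" python -m ipykernel install --user --name "$name"
  fi
}

# Example usage (uncomment to run in the Studio Lab terminal):
# register_kernel data_science_on_aws
```

After registering, refresh the browser so the notebook kernel picker shows the new entry.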


Owner

Data Science on AWS

