Enterprise Scale NLP with Hugging Face & SageMaker Workshop series

Overview

Workshop: Enterprise-Scale NLP with Hugging Face & Amazon SageMaker

Earlier this year we announced a strategic collaboration with Amazon to make it easier for companies to use Hugging Face Transformers in Amazon SageMaker, and ship cutting-edge Machine Learning features faster. We introduced new Hugging Face Deep Learning Containers (DLCs) to train and deploy Hugging Face Transformers in Amazon SageMaker.

In addition to the Hugging Face Inference DLCs, we created a Hugging Face Inference Toolkit for SageMaker. This Inference Toolkit leverages the pipelines from the transformers library to allow zero-code deployments of models, without requiring any code for pre-or post-processing.

In October and November, we held a workshop series on “Enterprise-Scale NLP with Hugging Face & Amazon SageMaker”. This workshop series consisted out of 3 parts and covers:

  • Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it
  • Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker
  • MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

We recorded all of them so you are now able to do the whole workshop series on your own to enhance your Hugging Face Transformers skills with Amazon SageMaker or vice-versa.

Below you can find all the details of each workshop and how to get started.

🧑🏻‍💻 Github Repository: https://github.com/philschmid/huggingface-sagemaker-workshop-series

📺   Youtube Playlist: https://www.youtube.com/playlist?list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ

Note: The Repository contains instructions on how to access a temporary AWS, which was available during the workshops. To be able to do the workshop now you need to use your own or your company AWS Account.

In Addition to the workshop we created a fully dedicated Documentation for Hugging Face and Amazon SageMaker, which includes all the necessary information. If the workshop is not enough for you we also have 15 additional getting samples Notebook Github repository, which cover topics like distributed training or leveraging Spot Instances.

Workshop 1: Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it

In Workshop 1 you will learn how to use Amazon SageMaker to train a Hugging Face Transformer model and deploy it afterwards.

  • Prepare and upload a test dataset to S3
  • Prepare a fine-tuning script to be used with Amazon SageMaker Training jobs
  • Launch a training job and store the trained model into S3
  • Deploy the model after successful training

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_1_getting_started_with_amazon_sagemaker

📺  Youtube: https://www.youtube.com/watch?v=pYqjCzoyWyo&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=5s&ab_channel=HuggingFace

Workshop 2: Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker

In Workshop 2 learn how to use Amazon SageMaker to deploy, scale & monitor your Hugging Face Transformer models for production workloads.

  • Run Batch Prediction on JSON files using a Batch Transform
  • Deploy a model from hf.co/models to Amazon SageMaker and run predictions
  • Configure autoscaling for the deployed model
  • Monitor the model to see avg. request time and set up alarms

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_2_going_production

📺  Youtube: https://www.youtube.com/watch?v=whwlIEITXoY&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=61s

Workshop 3: MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

In Workshop 3 learn how to build an End-to-End MLOps Pipeline for Hugging Face Transformers from training to production using Amazon SageMaker.

We are going to create an automated SageMaker Pipeline which:

  • processes a dataset and uploads it to s3
  • fine-tunes a Hugging Face Transformer model with the processed dataset
  • evaluates the model against an evaluation set
  • deploys the model if it performed better than a certain threshold

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_3_mlops

📺  Youtube: https://www.youtube.com/watch?v=XGyt8gGwbY0&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=7

Access Workshop AWS Account

For this workshop you’ll get access to a temporary AWS Account already pre-configured with Amazon SageMaker Notebook Instances. Follow the steps in this section to login to your AWS Account and download the workshop material.

1. To get started navigate to - https://dashboard.eventengine.run/login

setup1

Click on Accept Terms & Login

2. Click on Email One-Time OTP (Allow for up to 2 mins to receive the passcode)

setup2

3. Provide your email address

setup3

4. Enter your OTP code

setup4

5. Click on AWS Console

setup5

6. Click on Open AWS Console

setup6

7. In the AWS Console click on Amazon SageMaker

setup7

8. Click on Notebook and then on Notebook instances

setup8

9. Create a new Notebook instance

setup9

10. Configure Notebook instances

  • Make sure to increase the Volume Size of the Notebook if you want to work with big models and datasets
  • Add your IAM_Role with permissions to run your SageMaker Training And Inference Jobs
  • Add the Workshop Github Repository to the Notebook to preload the notebooks: https://github.com/philschmid/huggingface-sagemaker-workshop-series.git

setup10

11. Open the Lab and select the right kernel you want to do and have fun!

Open the workshop you want to do (workshop_1_getting_started_with_amazon_sagemaker/) and select the pytorch kernel

setup11

Owner
Philipp Schmid
Machine Learning Engineer & Tech Lead at Hugging Face👨🏻‍💻 🤗 Cloud enthusiast ☁️ AWS ML HERO 🦸🏻‍♂️ Nuremberg 🇩🇪
Philipp Schmid
Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

Lynn Zheng 231 Dec 30, 2022
This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface

NLP Classifier Introduction This project trains a bert model on any NLP classifcation model. And uses the model in make predictions on new data using

Abdullah Tarek 3 Mar 11, 2022
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
An ActivityWatch watcher to pose questions to the user and record her answers.

aw-watcher-ask An ActivityWatch watcher to pose questions to the user and record her answers. This watcher uses Zenity to present dialog boxes to the

Bernardo Chrispim Baron 33 Dec 03, 2022
jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese.

jel: Japanese Entity Linker jel - Japanese Entity Linker - is Bi-encoder based entity linker for japanese. Usage Currently, link and question methods

izuna385 10 Jan 06, 2023
NLP: SLU tagging

NLP: SLU tagging

北海若 3 Jan 14, 2022
STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch.

st3 STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch. Currently it supports converting pbmm models to pt scripts with integra

Vlad Ki 8 Oct 18, 2021
Backend for the Autocomplete platform. An AI assisted coding platform.

Introduction A custom predictor allows you to deploy your own prediction implementation, useful when the existing serving implementations don't fit yo

Tatenda Christopher Chinyamakobvu 1 Jan 31, 2022
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Facebook Research 1.6k Dec 29, 2022
Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

Line as a Visual Sentence with LineTR This repository contains the inference code, pretrained model, and demo scripts of the following paper. It suppo

SungHo Yoon 158 Dec 27, 2022
Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine

Semantic search through Wikipedia with the Weaviate vector search engine Weaviate is an open source vector search engine with build-in vectorization a

SeMI Technologies 191 Dec 26, 2022
Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Weaviate demo with the text2vec-openai module

Weaviate demo with the text2vec-openai module This repository contains an example of how to use the Weaviate text2vec-openai module. When using this d

SeMI Technologies 11 Nov 11, 2022
Natural Language Processing Tasks and Examples.

Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee

Soohwan Kim 53 Dec 20, 2022
SentimentArcs: a large ensemble of dozens of sentiment analysis models to analyze emotion in text over time

SentimentArcs - Emotion in Text An end-to-end pipeline based on Jupyter notebooks to detect, extract, process and anlayze emotion over time in text. E

jon_chun 14 Dec 19, 2022
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

IPL-data-analysis This project consists of data analysis and data visualization of all IPL seasons from 2008 to 2019 and answering the most asked ques

Sivateja A T 2 Feb 08, 2022
Text Normalization(文本正则化)

Text Normalization(文本正则化) 任务描述:通过机器学习算法将英文文本的“手写”形式转换成“口语“形式,例如“6ft”转换成“six feet”等 实验结果 XGBoost + bag-of-words: 0.99159 XGBoost+Weights+rules:0.99002

Jason_Zhang 0 Feb 26, 2022
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

Machine Translation @ UPC 21 Dec 20, 2022
An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

GPT-NeoX An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hun

EleutherAI 3.1k Jan 08, 2023