A workshop with several modules to help learn Feast, an open-source feature store

Last update: Jan 05, 2023

Workshop: Learning Feast

This workshop aims to teach users about Feast, an open-source feature store.

We explain concepts and best practices by example, and show how to address common use cases.

What is Feast?

Feast is an operational system for managing and serving machine learning features to models in production. It can serve features from a low-latency online store (for real-time prediction) or from an offline store (for batch scoring).
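The two serving paths can be sketched with the Feast Python SDK. This is a minimal sketch, not the workshop's own code: it assumes Feast is installed and a feature repository has already been applied, and the feature references in FEATURES, the driver_id entity key, and the helper function names are hypothetical placeholders.

```python
from typing import Any, Dict, List

# Hypothetical feature references in "<feature_view>:<feature>" form.
FEATURES = ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"]


def load_store(repo_path: str = "."):
    """Open the feature repository at repo_path (requires `pip install feast`)."""
    # Imported lazily so this module can be loaded without Feast installed.
    from feast import FeatureStore

    return FeatureStore(repo_path=repo_path)


def online_features(store, entity_rows: List[Dict[str, Any]]) -> Dict[str, list]:
    """Look up the latest feature values in the online store (real-time prediction)."""
    response = store.get_online_features(features=FEATURES, entity_rows=entity_rows)
    return response.to_dict()


def offline_features(store, entity_df):
    """Join features from the offline store onto entity_df (batch scoring).

    entity_df is expected to be a pandas DataFrame holding the entity key
    column(s) and an `event_timestamp` column.
    """
    job = store.get_historical_features(entity_df=entity_df, features=FEATURES)
    return job.to_df()
```

With a repository in the current directory, a call might look like online_features(load_store(), [{"driver_id": 1001}]), where driver_id stands in for whatever entity key the repository defines.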

Why Feast?

Feast solves several common challenges teams face:

Lack of feature reuse across teams
Complex point-in-time-correct data joins for generating training data
Difficulty operationalizing features for online inference while minimizing training / serving skew
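To make the point-in-time join concrete, here is a small plain-Python sketch of the idea: each training label only sees the most recent feature value recorded at or before the label's timestamp, so no future data leaks into training. It is an illustration only and not Feast's implementation; the FeatureLog layout and the function names are assumptions, and details such as feature time-to-live are not modeled.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical feature log: entity id -> (timestamp, value) pairs sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[datetime, float]]]


def value_as_of(log: FeatureLog, entity: str, ts: datetime) -> Optional[float]:
    """Return the latest value recorded at or before ts, or None if there is none."""
    history = log.get(entity, [])
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of records with timestamp <= ts
    return history[idx - 1][1] if idx else None


def build_training_rows(
    log: FeatureLog, labels: List[Tuple[str, datetime, int]]
) -> List[Tuple[str, datetime, Optional[float], int]]:
    """Attach the point-in-time-correct feature value to every labeled event."""
    return [(entity, ts, value_as_of(log, entity, ts), label) for entity, ts, label in labels]


log = {"driver_1": [(datetime(2023, 1, 1), 0.2), (datetime(2023, 1, 3), 0.9)]}
labels = [("driver_1", datetime(2023, 1, 2), 1), ("driver_1", datetime(2023, 1, 4), 0)]
rows = build_training_rows(log, labels)
# The Jan 2 label gets 0.2 (the Jan 3 value is in its future); the Jan 4 label gets 0.9.
```

A naive join on entity id alone would pair the Jan 2 label with the Jan 3 value, which is exactly the leakage a point-in-time join avoids.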

Pre-requisites

This workshop assumes you have the following installed:

A local development environment that supports running Jupyter notebooks (e.g. VSCode with Jupyter plugin)
Python 3.7+
Java 11 (for Spark, e.g. brew install java11)
pip
Docker & Docker Compose (e.g. brew install docker docker-compose)
Terraform (docs)
AWS CLI
An AWS account set up with credentials via aws configure (e.g. see AWS credentials quickstart)

Since we'll be learning how to leverage Feast in CI/CD, you'll also need to fork this workshop repository.

Caveats

M1 MacBook development is untested with this flow. See also How to run / develop for Feast on M1 Macs.
Windows development has only been tested with WSL. You will need to follow this guide to have Docker play nicely.

Modules

These are meant mostly to be done in order, with examples building on previous concepts.

Time (min)	Description	Module
30-45	Setting up Feast projects & CI/CD + powering batch predictions	Module 0
15-20	Streaming ingestion & online feature retrieval with Kafka, Spark, Redis	Module 1
10-15	Real-time feature engineering with on demand transformations	Module 2
TBD	Feature server deployment (embed, as a service, AWS Lambda)	TBD
TBD	Versioning features / models in Feast	TBD
TBD	Data quality monitoring in Feast	TBD
TBD	Batch transformations	TBD
TBD	Stream transformations	TBD

Owner

Feast