whylogs Workshop

The code from the whylogs workshop at DataTalks.Club on 29 March 2022

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, you’ll learn how to set up a system that monitors your data pipelines, checks data quality, and detects changes in your data.

Without data monitoring, you can’t assure your stakeholders that the data they use for analytics and machine learning is trustworthy. A data observability system gives you visibility into the health of your data pipelines, which helps build your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs — the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
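
To make the idea of monitoring a batch pipeline concrete, here is a minimal sketch in plain Python. It does not use the whylogs API; it is a stand-in that only illustrates the general pattern of profiling each batch and comparing it with a reference profile. The function names (`profile_batch`, `find_alerts`), the thresholds, and the sample data are all hypothetical, and the sketch assumes every row in a batch has the same keys.

```python
from statistics import mean


def profile_batch(rows):
    """Summarize a batch (a list of dicts) into per-column statistics."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    profile = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Treat ints and floats as numeric, but not booleans
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        profile[name] = {
            "count": len(values),
            "null_fraction": (len(values) - len(present)) / len(values),
            "mean": mean(numeric) if numeric else None,
        }
    return profile


def find_alerts(reference, current, max_null_increase=0.1, max_mean_shift=0.25):
    """Compare two profiles and describe columns that changed noticeably."""
    alerts = []
    for name, ref in reference.items():
        cur = current.get(name)
        if cur is None:
            alerts.append(f"{name}: column missing from current batch")
            continue
        if cur["null_fraction"] - ref["null_fraction"] > max_null_increase:
            alerts.append(
                f"{name}: null fraction rose to {cur['null_fraction']:.2f}"
            )
        if ref["mean"] is not None and cur["mean"] is not None and ref["mean"] != 0:
            # Relative change of the mean against the reference batch
            shift = abs(cur["mean"] - ref["mean"]) / abs(ref["mean"])
            if shift > max_mean_shift:
                alerts.append(f"{name}: mean shifted by {shift:.0%}")
    return alerts


# Hypothetical sample batches, made up for illustration
yesterday = [
    {"amount": 10.0, "country": "DE"},
    {"amount": 12.0, "country": "FR"},
    {"amount": 11.0, "country": "DE"},
]
today = [
    {"amount": 30.0, "country": None},
    {"amount": 36.0, "country": "FR"},
    {"amount": 33.0, "country": None},
]

alerts = find_alerts(profile_batch(yesterday), profile_batch(today))
for alert in alerts:
    print(alert)
```

A dedicated data logging library replaces the hand-rolled statistics with richer, mergeable profiles, but the workflow of profiling each batch and comparing profiles over time follows the same shape.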

Code

This repository contains files that are needed for the workshop:

ccloud_lib.py - helper for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - the code for sending events to Kafka
requirements.txt - all the dependencies for the workshop
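
As a rough illustration of how a credentials file like this might be loaded, the sketch below parses `key=value` lines into a dictionary. This README does not show the layout of confluent_credentials.txt, so the properties-style format is an assumption. The function names `parse_properties` and `read_config` are hypothetical and are not taken from ccloud_lib.py. The keys in the example are standard Kafka client settings, and the values are placeholders.

```python
def parse_properties(text):
    """Parse 'key=value' lines into a dict, skipping blanks and '#' comments."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def read_config(path):
    """Read a properties-style file from disk (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return parse_properties(handle.read())


# Placeholder content only; real credentials stay out of version control
example = """
# Connection settings
bootstrap.servers=<your-bootstrap-server>
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<your-api-key>
sasl.password=<your-api-secret>
"""

config = parse_properties(example)
print(sorted(config))
```

One way to avoid committing real credentials is to keep your filled-in copy under a different file name and list that name in .gitignore.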

Confluent Cloud

For this workshop, you'll need:

A Deepnote account
A Confluent Cloud account (instructions)

