Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

Overview

time-series-kafka-demo

img

Mock stream producer for time series data using Kafka.

I walk through this tutorial and others here on GitHub and on my Medium blog. Here is a friend link for open access to the article on Towards Data Science: Make a mock “real-time” data stream with Python and Kafka. I'll always add friend links on my GitHub tutorials for free Medium access if you don't have a paid Medium membership (referral link).

If you find any of this useful, I always appreciate contributions to my Saturday morning fancy coffee fund!

This repo demos how to convert a csv file of timestamped data into a real-time stream useful for testing streaming analytics. An example input file with random time series data and a script for generating the file are included in the data directory.

The producer and consumer Python scripts use Confluent's Kafka client for Python, which is installed in the Docker image built with the accompanying Dockerfile, if you choose to use it.

Requires Docker and Docker Compose.

Usage

Clone repo and cd into directory.

git clone https://github.com/mtpatter/time-series-kafka-demo.git
cd time-series-kafka-demo

Start the Kafka broker

docker compose up --build

Build a Docker image (optionally, for the producer and consumer)

From the main root directory:

docker build -t "kafkacsv" .

If you want to use Docker for the python scripts, this should now work:

docker run -it --rm kafkacsv python bin/sendStream.py -h

Start a consumer

To start a consumer for printing all messages in real-time from the stream "my-stream":

python bin/processStream.py my-stream

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/processStream.py my-stream

Produce a time series stream

Send time series from data/data.csv to topic “my-stream”, and speed it up by a factor of 10.

python bin/sendStream.py data/data.csv my-stream --speed 10

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/sendStream.py data/data.csv my-stream --speed 10

Shut down and clean up

Stop the consumer with Return and Ctrl+C.

Shutdown Kafka broker system:

docker compose down
VSCode extension that generates docstrings for python files

VSCode Python Docstring Generator Visual Studio Code extension to quickly generate docstrings for python functions. Features Quickly generate a docstr

Nils Werner 506 Jan 03, 2023
Mkdocs obsidian publish - Publish your obsidian vault through a python script

Mkdocs Obsidian Mkdocs Obsidian is an association between a python script and a

Mara 49 Jan 09, 2023
Official Matplotlib cheat sheets

Official Matplotlib cheat sheets

Matplotlib Developers 6.7k Jan 09, 2023
Visualizacao-dados-dell - Repositório com as atividades desenvolvidas no curso de Visualização de Dados

📚 Descrição Neste curso da Dell trabalhamos com a visualização de dados. 🖥️ Aulas 1.1 - Explorando conjuntos de dados 1.2 - Fundamentos de visualiza

Claudia dos Anjos 1 Dec 28, 2021
A comprehensive and FREE Online Python Development tutorial going step-by-step into the world of Python.

FREE Reverse Engineering Self-Study Course HERE Fundamental Python The book and code repo for the FREE Fundamental Python book by Kevin Thomas. FREE B

Kevin Thomas 7 Mar 19, 2022
Members: Thomas Longuevergne Program: Network Security Course: 1DV501 Date of submission: 2021-11-02

Mini-project report Members: Thomas Longuevergne Program: Network Security Course: 1DV501 Date of submission: 2021-11-02 Introduction This project was

1 Nov 08, 2021
Data Inspector is an open-source python library that brings 15++ types of different functions to make EDA, data cleaning easier.

Data Inspector Data Inspector is an open-source python library that brings 15 types of different functions to make EDA, data cleaning easier. Author:

Kazi Amit Hasan 38 Nov 24, 2022
Data science on SDGs - Udemy Online Course Material: Data Science on Sustainable Development Goals

Data Science on Sustainable Development Goals (SDGs) Udemy Online Course Material: Data Science on Sustainable Development Goals https://bit.ly/data_s

Frank Kienle 1 Jan 04, 2022
A collection of simple python mini projects to enhance your python skills

A collection of simple python mini projects to enhance your python skills

PYTHON WORLD 12.1k Jan 05, 2023
Grokking the Object Oriented Design Interview

Grokking the Object Oriented Design Interview

Tusamma Sal Sabil 2.6k Jan 08, 2023
Feature Store for Machine Learning

Overview Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and

Feast 3.8k Dec 30, 2022
Poetry plugin to export the dependencies to various formats

Poetry export plugin This package is a plugin that allows the export of locked packages to various formats. Note: For now, only the requirements.txt f

Poetry 90 Jan 05, 2023
The blazing-fast Discord bot.

Wavy Wavy is an open-source multipurpose Discord bot built with pycord. Wavy is still in development, so use it at your own risk. Tools and services u

Wavy 7 Dec 27, 2022
NetBox plugin for BGP related objects documentation

Netbox BGP Plugin Netbox plugin for BGP related objects documentation. Compatibility This plugin in compatible with NetBox 2.10 and later. Installatio

Nikolay Yuzefovich 133 Dec 27, 2022
Reproducible Data Science at Scale!

Pachyderm: The Data Foundation for Machine Learning Pachyderm provides the data layer that allows machine learning teams to productionize and scale th

Pachyderm 5.7k Dec 29, 2022
An introduction to hikari, complete with different examples for different command handlers.

An intro to hikari This repo provides some simple examples to get you started with hikari. Contained in this repo are bots designed with both the hika

Ethan Henderson 18 Nov 29, 2022
A Python library that simplifies the extraction of datasets from XML content.

xmldataset: simple xml parsing 🗃️ XML Dataset: simple xml parsing Documentation: https://xmldataset.readthedocs.io A Python library that simplifies t

James Spurin 75 Dec 30, 2022
[Unofficial] Python PEP in EPUB format

PEPs in EPUB format This is a unofficial repository where I stock all valid PEPs in the EPUB format. Repository Cloning git clone --recursive Mickaël Schoentgen 9 Oct 12, 2022

A collection of online resources to help you on your Tech journey.

Everything Tech Resources & Projects About The Project Coming from an engineering background and looking to up skill yourself on a new field can be di

Mohamed A 396 Dec 31, 2022
Bring RGB to life in Neovim

Bring RGB to life in Neovim Change your RGB devices' color depending on Neovim's mode. Fast and asynchronous plugin to live your vim-life to the fulle

Antoine 40 Oct 27, 2022