A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my real-time stock data pipeline project. All the code is written in Python. In this project, I experiment with several data engineering frameworks to build a financial data processing and visualization platform on Apache Kafka, Apache Cassandra, and Bokeh. Kafka streams real-time stock prices and market news, Cassandra stores both historical and real-time stock data, and Bokeh renders the visualizations in the web browser. I also wrote a web crawler that scrapes companies' financial statements and basic information from Yahoo Finance, and I experimented with several economic data APIs.

Architecture

The web page currently has three tabs:

Stock: Streaming & Fundamental
- Candlestick plot for a single stock, with basic company and financial information
- Real-time S&P 500 price during trading hours (simulated data outside trading hours)
Stock: Comparison
- Prices of two user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geographic map of various economic data by state
- Four nationwide economic indicators for comparison
- The most recent market news
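To make the comparison tab's moving averages and correlation concrete, here is a minimal sketch using only the standard library. It is not the project's code: the function names, the sample prices, and the choice to correlate daily returns rather than raw prices are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def moving_average(prices, window):
    """Simple moving average; positions without a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            # Drop the price that just left the window.
            running -= prices[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def daily_returns(prices):
    """Simple day-over-day returns: p[t] / p[t-1] - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up adjusted close prices for two hypothetical tickers.
stock_a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
stock_b = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]

ma_a = moving_average(stock_a, 5)  # 5-day window; 10 and 30 work the same way
corr = pearson(daily_returns(stock_a), daily_returns(stock_b))
```

With real data, the same `moving_average` call would be repeated for the 10- and 30-day windows. A pandas-based version (`rolling(window).mean()` and `corr()`) would be shorter if pandas is already a dependency.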

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
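As a rough illustration of this flow, the sketch below publishes JSON-encoded ticks to a Kafka topic and has a consumer write each one to a Cassandra table. It assumes the `kafka-python` and DataStax `cassandra-driver` packages; the project's actual client libraries, topic name, keyspace, and table schema are not shown in this README, so every name below is hypothetical.

```python
import json

TOPIC = "stock-ticks"  # hypothetical topic name
KEYSPACE = "market"    # hypothetical keyspace
# Hypothetical table layout: ticks(symbol text, ts timestamp, price double, volume bigint)
INSERT_CQL = "INSERT INTO ticks (symbol, ts, price, volume) VALUES (%s, %s, %s, %s)"


def encode_tick(tick):
    """Serialize one tick dict to UTF-8 JSON bytes (the Kafka message value)."""
    return json.dumps(tick, sort_keys=True).encode("utf-8")


def decode_tick(raw):
    """Inverse of encode_tick: message bytes back to a dict."""
    return json.loads(raw.decode("utf-8"))


def tick_to_row(tick):
    """Order the fields to match the placeholders in INSERT_CQL."""
    return (tick["symbol"], tick["ts"], float(tick["price"]), int(tick["volume"]))


def publish(ticks, bootstrap_servers="localhost:9092"):
    """Producer side: send each tick to the topic. Needs a running broker."""
    from kafka import KafkaProducer  # imported lazily; assumes kafka-python

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=encode_tick,
    )
    for tick in ticks:
        producer.send(TOPIC, value=tick)
    producer.flush()  # block until buffered messages are delivered


def consume_into_cassandra(bootstrap_servers="localhost:9092", hosts=("127.0.0.1",)):
    """Consumer side: read ticks forever and insert each one into Cassandra."""
    from kafka import KafkaConsumer          # assumes kafka-python
    from cassandra.cluster import Cluster    # assumes cassandra-driver

    session = Cluster(list(hosts)).connect(KEYSPACE)
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=decode_tick,
    )
    for message in consumer:
        session.execute(INSERT_CQL, tick_to_row(message.value))


# Round trip with a made-up tick; no broker or database is needed for this part.
sample = {"symbol": "AAPL", "ts": 1662557400000, "price": 154.5, "volume": 1200}
row = tick_to_row(decode_tick(encode_tick(sample)))
```

Keeping serialization in small pure functions means the message format can be checked without a broker or a database. `publish` and `consume_into_cassandra` only run against live services.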

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
