Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.

OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality v

OpenShot Studios, LLC 3.1k Jan 01, 2023
Rembg Video Virtual Green Screen Edition

Rembg Virtual Greenscreen Edition is a tool to create a green screen matte for videos

Tim Scarfe 217 Jan 06, 2023
Splat a video into a mosaic by sampling a frame at regular intervals

Splat a video into a mosaic by sampling a frame at regular intervals. Useful for seeing the changes over time of an entire video or movie.

Ryan Fox 4 Oct 16, 2022
video streaming userbot (vsu) based on pytgcalls for streaming video trought the telegram video chat group.

VIDEO STREAM USERBOT ✨ an another telegram userbot for streaming video trought the telegram video chat. Environmental Variables 📌 API_ID : Get this v

levina 6 Oct 17, 2021
Rune - a video miniplayer made with Python.

Rune - a video miniplayer made with Python.

1 Dec 13, 2021
Video processing routines for SciPy

scikit-video Video Processing SciKit BETA Video processing algorithms, including I/O, quality metrics, temporal filtering, motion/object detection, mo

Alex Izvorski 119 Dec 27, 2022
Webcam Indicator is an application to recieve and send messages from your own Webcam Server.

Welcome to Webcam Indicator 👋 Webcam Indicator is an application to recieve and send messages from your own Webcam Server. 🏠 Homepage Prerequisites

Lorenzo Carbonell 2 Apr 04, 2022
PyAV is a Pythonic binding for the FFmpeg libraries.

PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gritty details as much as possible.

PyAV 1.8k Jan 01, 2023
Extracting frames from video and create video using frames

Extracting frames from video and create video using frames This program uses opencv library to extract the frames from video and create video from ext

1 Nov 19, 2021
An easy to use GUI based video to image sequence converter (and vice versa).

Vdo & Img Conversion Tools This is a quick conversion tool made with python that can save you a lot of time. With this tool you can extract image sequ

Akash Bora 3 Sep 18, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
Create a Video Membership app using FastAPI & NoSQL

Video Membership Create a Video Membership app using FastAPI & NoSQL. In this series, we're going to explore building a membership application using F

Coding For Entrepreneurs 69 Dec 25, 2022
Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2

VideoStreamingServer Completed: Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2 In progress: Change the transmission Pro

1 Dec 06, 2021
GStreamer Inspector GUI

gst-explorer GStreamer GUI Interface Tool GUI interface for inspecting GStreamer Plugins, Elements and Type Finders. Expects Python3 Qt, PyQt5 and GSt

Jetsonhacks 31 Nov 29, 2022
Techie Sneh 17 Nov 23, 2021
Simple VLC-based media player that can play multiple videos at the same time

Screenshots About Simple VLC-based media player that can play multiple videos at the same time. You can play as many videos as you like, the only limi

161 Jan 05, 2023
A web RTSP play platform based on websocket and tornado, websocket use blob binaryType read as ArrayBuffer

A web RTSP play platform based on websocket and tornado, websocket use blob binaryType read as ArrayBuffer

2 Feb 25, 2022
This application makes a webrtc video call with jitsi meet signaling

gstreamer-jitsi-meet This application makes a webrtc video call with jitsi meet signaling. Other end can be any jitsi meet app or web app. It doesn't

Linh 7 Apr 26, 2022
A telegram bot for compressing/encoding videos in h264 format.

Video-Encoder-Bot a telegram bot for compressing/encoding videos in h264 format. Configuration Add values in environment variables or add them in conf

Weeb >.< 61 Dec 29, 2022
Jio TV Server - Watch TV right from your laptop

Jio-PyServer Jio TV - Python Server Watch TV right from your laptop! Requirements: Python 3.X Internet Access A Jio Account Known Issues: Channel Stre

Elvis Tony 11 Apr 05, 2022