Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
A wrapper around ffmpeg to make it work in a concurrent and memory-buffered fashion.

Media Fixer Have you ever had a film or TV show that your TV wasn't able to play its audio? Well this program is for you. Media Fixer is a program whi

Halit Şimşek 3 May 04, 2022
Simple background blur for your webcam

backgroundblur Simple background blur for your webcam. This script will capture your webcams output, add a blur effect to the background and output th

Stefan Wagner 4 Dec 07, 2021
Search a video semantically with AI.

Which Frame? Search a video semantically with AI. For example, try a natural language search query like "a person with sunglasses". You can also searc

David Chuan-En Lin 1 Nov 06, 2021
Telegram Video Stream

Video Stream An Advanced VC Video Player created for playing video in the voice chats of Telegram Groups And Channel Configs TOKEN - Get bot token fro

mr_lokaman 46 Dec 25, 2022
This is a simple script to generate a .opml file from a list of youtube channels.

Youtube to rss Don't spend more time than you need to on youtube.com This is a simple script to generate a .opml file from a list of youtube channels.

Kostas 1 Oct 04, 2022
A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups

Yui Vidio Player A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups Demo Setting up Add this Bot t

Achu biju 32 Sep 16, 2021
Stream music with ffmpeg and python

youtube-stream Stream music with ffmpeg and python original Usage set the KEY in stream.sh run server.py run stream.sh (You can use Git bash or WSL in

Giyoung Ryu 14 Nov 17, 2021
Python bindings for FFmpeg - with complex filtering support

ffmpeg-python: Python bindings for FFmpeg Overview There are tons of Python FFmpeg wrappers out there but they seem to lack complex filter support. ff

Karl Kroening 7.7k Jan 03, 2023
Wonkey - an open source programming language for the creation of cross-platform video games

Wonkey Programming Language Wonkey is an open source programming language for the creation of cross-platform video games, highly inspired by the “Blit

Wonkey Coders 110 Nov 09, 2022
Become a virtual character with just your webcam!

Become a virtual character with just your webcam!

Rich 300 Jan 03, 2023
I have baked a custom integration to control Eufy Security Cameras and access RTSP and P2P stream if possible.

I have baked a custom integration to control Eufy Security Cameras and access RTSP (real time streaming protocol) and P2P (peer to peer) stream if pos

Fuat Akgün 422 Jan 01, 2023
Youtube-dislikes-adder - Add dislikes to the description of your YouTube videos.

Add number of dislikes to the description of your YouTube videos. Number of dislikes are updated if you let this function as a bot.

fluks 1 Aug 23, 2022
KonomiTV: Kind and Optimized Next brOadcast watching systeM Infrastructure for TV

備考・注意事項 現在 α 版で、まだ実験的なプロダクトです。通常利用には耐えないでしょうし、サポートもできません。 安定しているとは到底言いがたい品質ですが、それでも構わない方のみ導入してください。 使い方などの説明も用意できていないため、自力でトラブルに対処できるエンジニアの方以外には現状おすすめ

tsukumi 244 Dec 31, 2022
pyffstream - A CLI frontend for streaming over SRT and RTMP specializing in sending off files

pyffstream - A CLI frontend for streaming over SRT and RTMP specializing in sending off files

Gregory Beauregard 3 Mar 04, 2022
Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2

VideoStreamingServer Completed: Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2 In progress: Change the transmission Pro

1 Dec 06, 2021
Play Video & Music on Telegram Group Video Chat

🖤 DEMONGIRL 🖤 ʜᴇʟʟᴏ ❤️ 🇱🇰 Join us ᴠɪᴅᴇᴏ sᴛʀᴇᴀᴍ ɪs ᴀɴ ᴀᴅᴠᴀɴᴄᴇᴅ ᴛᴇʟᴇʀᴀᴍ ʙᴏᴛ ᴛʜᴀᴛ's ᴀʟʟᴏᴡ ʏᴏᴜ ᴛᴏ ᴘʟᴀʏ ᴠɪᴅᴇᴏ & ᴍᴜsɪᴄ ᴏɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴠɪᴅᴇᴏ ᴄʜᴀᴛ 🧪 ɢ

Jonathan 5 Dec 31, 2021
Automatic video generator for local news

Automatic video generator for local news

Gabriel Monteiro 2 Jan 11, 2022
Telegram Video Chat Video Streaming bot 🇱🇰

🧪 Get SESSION_NAME from below: Pyrogram 🎭 Preview ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Re

DOOZY YEZ 5 Jun 26, 2022
Uncompress DEFLATE streams in pure Python

stream-deflate Uncompress DEFLATE streams in pure Python. Work in progress. This README serves as a rough design spec. Installation pip install stream

Michal Charemza 7 Oct 13, 2022
iYTDL - Asynchronous Standalone Inline YouTube-DL Module

iYTDL Asynchronous Standalone Inline YouTube-DL Module ⬇️ Installing Install pip3 install iytdl Upgrade pip3 install -U iytdl ⭐️ Features Fully Asynch

iYTDL 46 Dec 24, 2022