Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Boltstream Live Video Streaming Website + Backend

Boltstream Self-hosted Live Video Streaming Website + Backend Reference

Ben Wilber 1.7k Dec 28, 2022
Automatically logs into VTOP and can perform certain tasks

VTOP_Login Automatically logs into VTOP and can perform certain tasks To run the

Jatin 1 Jan 30, 2022
Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3

Docker container to expose a local RTMP, RTSP, and HLS stream for all your Wyze cameras including v3. No Third-party or special firmware required.

1.2k Jan 07, 2023
camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

camKapture is an open source application that allows users to access their webcam device and take pictures or create videos.

manoj 1 Jun 21, 2022
Automatically segment in-video YouTube sponsorships.

SponsorBlock Auto Segment [Model Download] Automatically segment in-video YouTube sponsorships. Trained on a large dataset of YouTube sponsor transcri

Akmal 7 Aug 22, 2022
Скрипт который выводит видео в консоль. Ничего лишнего)

video-to-ascii Скрипт который выводит видео в консоль. Ничего лишнего) Требования Минимальное разрешение экрана: 1280x720 Видео в качестве 360p 10-45f

Daniil Pisarev 155 Nov 28, 2022
Stream-Cli application that allow you to play your favorite movies from the terminal

Stream-Cli application that allow you to play your favorite movies from the terminal

redouane 380 Jan 08, 2023
Extracting frames from video and create video using frames

Extracting frames from video and create video using frames This program uses opencv library to extract the frames from video and create video from ext

1 Nov 19, 2021
A Telegram bot to convert videos into x265/x264 format via ffmpeg.

Video Encoder Bot A Telegram bot to convert videos into x265/x264 format via ffmpeg. Configuration Add values in environment variables or add them in

Adnan Ahmad 82 Jan 03, 2023
All the code in these repos was created and explained by HashLips on the main YouTube channel.

Welcome to HashLips 👄 All the code in these repos was created and explained by HashLips on the main YouTube channel. To find out more please visit: ?

HashLips 6.7k Jan 06, 2023
A script to disable steam servers regionwise. [Works on Windows only]

Csgo-server-blocker A script to disable steam servers regionwise. [Works on Windows only] Dependencies python3.x Usage: pip install requirements.txt I

Aditya Bennur 2 Jun 10, 2022
Youtube as covert-channel - Control systems remotely and execute commands by uploading videos to Youtube

covert-tube A program to control systems remotely by uploading videos to Youtube using Python to create the videos and the listener, emulating some ma

Ricardo Ruiz 101 Nov 01, 2022
Python program - to extract slides from videos

Programa em Python - que fiz em algumas horas e que provavelmente tem bugs - para extrair slides de vídeos.

Natanael Antonioli 45 Nov 12, 2022
Python Simple Mass Video Clipper (PSMVC)

Python Simple Mass Video Clipper (PSMVC) PSMVC é um gerador de cortes via terminal construído em python. Uso Basta abrir o arquivo start.py Dependenci

Bruno 2 Oct 16, 2021
Stream anime from kaa.si with python

kaa.si-cli Stream anime using MPV player from kaa.si with python

Muhammad Rovino Sanjaya 52 Dec 24, 2022
Video Translation Into Text

2021/12/9 The project has been updated Added a home screen Just drag it onto the screen The final results \ 2021/12/9 项目已更新 添加了主界面 拖到即可 最后结果 \ Using t

10 Mar 12, 2022
KonomiTV: Kind and Optimized Next brOadcast watching systeM Infrastructure for TV

備考・注意事項 現在 α 版で、まだ実験的なプロダクトです。通常利用には耐えないでしょうし、サポートもできません。 安定しているとは到底言いがたい品質ですが、それでも構わない方のみ導入してください。 使い方などの説明も用意できていないため、自力でトラブルに対処できるエンジニアの方以外には現状おすすめ

tsukumi 244 Dec 31, 2022
A Simple Telegram Bot By @Tellybots to add Subtitle Files in Video

Video-subtitle-merger A Simple Telegram Bot By @Tellybots to add Subtitle Files in Video Features Force Sub Button Added Soon Support Media Type Such

6 Dec 31, 2021
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
Cross-platform command-line AV1 / VP9 / HEVC / H264 encoding framework with per scene quality encoding

Av1an A cross-platform framework to streamline encoding Easy, Fast, Efficient and Feature Rich An easy way to start using AV1 / HEVC / H264 / VP9 / VP

Zen 947 Jan 01, 2023