Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Maintained wavelink fork for pycord

Pycord.Wavelink Wavelink is robust and powerful Lavalink wrapper for Pycord! Wavelink features a fully asynchronous API that's intuitive and easy to u

Pycord Development 23 Dec 11, 2022
universal messaging & notifications api

Pronounced "boat-shahft" What is botschaft? Botschaft is unified messaging & notifications appliance. Want to text yourself when a long-running task c

Tyler M. Kontra 25 Aug 16, 2022
This repository contains free labs for setting up an entire workflow and DevOps environment from a real-world perspective in AWS

DevOps-The-Hard-Way-AWS This tutorial contains a full, real-world solution for setting up an environment that is using DevOps technologies and practic

Mike Levan 1.6k Jan 05, 2023
Infinity: a Twitter retweet bot that can be used by anyone

INSTAMATE Requires Firefox Instapy Python3 How To Use? Fork the repository Add your credentials in the bot.py file Save commits Clone your fork cd int

unofficialdxnny 3 Jun 23, 2022
This is a scalable system that reads messages from public Telegram channels using Telethon and stores the data in a PostgreSQL database.

This is a scalable system that reads messages from public Telegram channels using Telethon and stores the data in a PostgreSQL database. Its original intention is to monitor cryptocurrency related ch

Greg 3 Jun 07, 2022
Example app to be deployed to AWS as an API Gateway / Lambda Stack

Disclaimer I won't answer issues or emails regarding the project anymore. The project is old and not maintained anymore. I'm not sure if it still work

Ben 123 Jan 01, 2023
Unofficial GoPro API Library for Python - connect to GoPro via WiFi.

GoPro API for Python Unofficial GoPro API Library for Python - connect to GoPro cameras via WiFi. Compatibility: HERO3 HERO3+ HERO4 (including HERO Se

Konrad Iturbe 1.3k Jan 01, 2023
Analog clock that shows the weather instead of the actual numerical hour it points to.

Eli's weatherClock An digital analog clock but instead of showing the hours, the clock shows the weather at that hour of the day. So instead of showin

Kovin 154 Dec 01, 2022
A heraldry-related bot, designed for the Heraldry Community.

Heraldtron A heraldry-related bot, designed for the Heraldry Community. Requirements Python 3.9+ discord.py aiohttp (comes installed with discord.py)

1 Mar 31, 2022
MSE5050/7050 Materials Informatics course at the University of Utah

MaterialsInformatics MSE5050/7050 Materials Informatics course at the University of Utah This github repo contains coursework content such as class sl

41 Dec 30, 2022
📈 A Discord bot for displaying the download stats of a repository made with Python, the Hikari API and PostgreSQL.

📈 axyl-stats axyl-stats is a Discord bot made with Python (with the Hikari API wrapper) and PostgreSQL, used as a download counter for a GitHub repo.

Angelo-F 2 May 14, 2022
BroBot's files, code and tester.

README - BroBOT Made by Rohan Chaturvedi [email protected] DISCLAIMER: Th

1 Jan 09, 2022
Written in Python, freezed into stand-alone executable with PyInstaller. This app will make sure you stay in New World without getting kicked for inactivity.

New World - AFK Written in Python, freezed into stand-alone executable with PyInstaller. This app will make sure you stay in New World without getting

Rodney 5 Oct 31, 2021
Unofficial Discord Rich Presence for HackTheBox platform

HTBRichPresence Unofficial Discord Rich Presence for HackTheBox platform The project is under lazy development. How to run Install requirements: // I'

Antonio 4 Apr 19, 2022
Photogrammetry Web API

OpenScanCloud Photogrammetry Web API Overview / Outline: The OpenScan Cloud is intended to be a decentralized, open and free photogrammetry web API. T

Thomas 86 Jan 05, 2023
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

Market Sentiment 345 Dec 17, 2022
Block Telegram's new

Telegram Channel Blocker Bot Channel go away! This bot is used to delete and ban message sent by channel How this appears? The reason this appears ple

16 Feb 15, 2022
Quickly visualize docker networks with graphviz.

Docker Network Graph Visualize the relationship between Docker networks and containers as a neat graphviz graph. Example Usage usage: docker-net-graph

Leo Verto 43 Dec 12, 2022
An Amazon Price Tracker app helps you to buy which product you want within sale price by sending an E-Mail.

Amazon Price Tracker An Amazon Price Tracker app helps you to buy which product you want within sale price by sending an E-Mail. Installing Download t

Aytaç Kaşoğlu 2 Feb 10, 2022
Use Node JS Keywords In Python!!!

Use Node JS Keywords In Python!!!

Sancho Godinho 1 Oct 23, 2021