Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

GTPS Status Bot

Python GTPS Status Bot (BETA) Python GTPS Status Bot Require Python How To Use Download This Source Extract The Zip File Install the requirements (Mod

Lamp 4 Oct 11, 2021
Die wichtigsten APIs Deutschlands in einem Python Paket.

Deutschland A python package that gives you easy access to the most valuable datasets of Germany. Installation pip install deutschland Geographic data

Bundesstelle für Open Data 921 Jan 08, 2023
A simple tool which automate commands of discord economy bots

A simple tool which automate commands of discord economy bots. Fully configurable using an intuitive configuration made in YAML

SkydenFly 5 Sep 18, 2022
Telegram Bot for saving and sharing personal and group notes

EZ Notes Bot (ezNotesBot) Telegram Bot for saving and sharing personal and group notes. Usage Personal notes: reply to any message in PM to save it as

Dash Eclipse 8 Nov 07, 2022
This python cheat utilizes PyMeow, PyMem, and others to enhance your CS:GO experience ;).

CSGO-Python-Cheat This python cheat utilizes PyMeow, PyMem, and others to enhance your CS:GO experience ;). Features Esp Tracers Chams (More to come)

Addi 1 Nov 30, 2021
PyLyrics Is An [Open-Source] Bot That Can Help You Get Song Lyrics

PyLyrics-Bot Telegram Bot To Search Song Lyrics From Genuis. 🤖 Demo: 👨‍💻 Deploy: ❤ Deploy Your Own Bot : Star 🌟 Fork 🍴 & Deploy -Easy Way -Self-h

DAMIEN 12 Nov 12, 2022
A simple Telegram bot, written in Python, that you can use to shill (i.e. send messages) your token, or whatever, to channels.

Telegram Shill Bot Ever wanted a Shill Bot but wankers keep scamming for one OR wanted to charge you an arm and a leg? This is a simple bot written in

53 Nov 25, 2022
This is Telegram Files Store Bot by @AbirHasan2005

PyroFilesStoreBot This is Telegram Parmanent Files Store Bot by @AbirHasan2005. Language: Python3 Library: Pyrogram Features: In PM Just Forward or Se

Abir Hasan 168 Dec 19, 2022
修改自SharpNoPSExec的基于python的横移工具 A Lateral Movement Tool Learned From SharpNoPSExec -- Twitter: @juliourena

PyNoPSExec A Lateral Movement Tool Learned From SharpNoPSExec -- Twitter: @juliourena 根据@juliourena大神的SharpNOPsExec项目改写的横向移动工具 Platform(平台): Windows 1

<a href=[email protected]"> 23 Nov 09, 2022
Twitter bot code can be found in twitterBotAPI.py

NN Twitter Bot This github repository is BASED and is yanderedev levels of spaghetti Neural net code can be found in alexnet.py. Despite the name, it

167 Dec 19, 2022
The accompanying code for the paper "GMAT: Global Memory Augmentation for Transformers" (Ankit Gupta and Jonathan Berant).

GMAT: Global Memory Augmentation for Transformers This repository contains the accompanying code for the paper: "GMAT: Global Memory Augmentation for

Ankit Gupta 7 Oct 21, 2021
A (probably) working Kik name checker

KikNameChecker !THIS ONLY CHECKS WS2.KIK.COM ENDPOINT! \ Will add user inputted endpoints thing \ A (probably) working Kik name checker Started as a s

insert edgy and cool name 1 Dec 17, 2022
Represents a Lavalink client used to manage nodes and connections.

lavaplayer Represents a Lavalink client used to manage nodes and connections. setup pip install lavaplayer setup lavalink you need to java 11* LTS or

HazemMeqdad 37 Nov 21, 2022
Add Reactions to your Channel Posts!

• Shaaak - Post Reaction Bot Simple and Minimalistic telegram bot to add Reactions and Comments to your Channel Posts! - What's Unique About it?

Harsh Raj 4 Jan 31, 2022
a script to bulk check usernames on multiple site. includes proxy & threading support.

linked-bulk-checker bulk checks username availability on multiple sites info people have been selling these so i just made one to release dm my discor

krul 9 Sep 20, 2021
Minecraft name sniper written in python.

⚠️ IMPORTANT ⚠️ DO NOT USE MCSNIPERPY -- READ BELOW This sniper does not support Microsoft accounts or prename / gc sniping and is MUCH harder to use

MCsniperPY 201 Dec 30, 2022
ToqueIO Nuke tools - A collection of tools designed to assist in enhancing your workflows within nuke

ToqueIO Nuke tools - A collection of tools designed to assist in enhancing your workflows within nuke

4 Feb 19, 2022
AWS Lambda Fast API starter application

AWS Lambda Fast API Fast API starter application compatible with API Gateway and Lambda Function. How to deploy it? Terraform AWS Lambda API is a reus

OBytes 6 Apr 20, 2022
Creates Spotify playlists from Spinitron playlists.

spin2spot Creates Spotify playlists from Spinitron playlists. Quick Start You can use spin2spot as a command-line tool: Erik Didriksen 1 Aug 28, 2021

Auxiliator is telegram bot for basic web-application analysis

Auxiliator Auxiliator is telegram bot for basic web-application analysis What for? Sometimes there is no access to your main PC, where you can scan we

Revoltage 13 Dec 26, 2021