Free Data Engineering course!

Overview

Data Engineering Zoomcamp


Taking the course

Self-paced mode

All the course materials are freely available, so you can take the course at your own pace.

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check the FAQ if you run into problems
  • If you can't find a solution to your problem in the FAQ, ask for help in Slack

2022 Cohort

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Syllabus

Week 1: Introduction & Prerequisites

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

Week 2: Data ingestion

  • Data Lake
  • Workflow orchestration
  • Setting up Airflow locally
  • Ingesting data to GCP with Airflow
  • Ingesting data to local Postgres with Airflow
  • Moving data from AWS to GCP (Transfer service)
  • Homework

More details

Week 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning

More details

Week 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualising the data with Google Data Studio and Metabase

More details

Week 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

Week 6: Streaming

  • Introduction to Kafka
  • Schemas (Avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Week 7, 8 & 9: Project

Putting everything we learned into practice

  • Week 7 and 8: working on your own project
  • Week 9: reviewing your peers

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Airflow: Pipeline Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Prerequisites

To get the most out of this course, you should feel comfortable with coding and the command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up Python relatively quickly if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Tools

For this course you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

See Week 1 for more details about installing these tools.
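To confirm that the tools listed above are available before starting Week 1, a small check like the one below may help. It is a hypothetical helper, not part of the course materials, and it assumes the command-line names `docker`, `docker-compose`, `gcloud` (the Google Cloud SDK CLI) and `terraform`. Newer Docker releases may ship Compose as the `docker compose` plugin instead, in which case `docker-compose` can be reported as missing even though Compose works. Because the script itself needs Python 3 to run, a successful run also covers that item.

```python
import shutil

# Assumed executable names for each tool in the list above.
# Adjust them if your installation differs (e.g. `docker compose` plugin).
REQUIRED_TOOLS = {
    "Docker": "docker",
    "Docker-Compose": "docker-compose",
    "Google Cloud SDK": "gcloud",
    "Terraform": "terraform",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name, exe in tools.items() if which(exe) is None]


def check_environment():
    """Print a short report; return True if every tool was found."""
    missing = missing_tools()
    for name in missing:
        print(f"Missing: {name}")
    if not missing:
        print("All tools found.")
    return not missing


if __name__ == "__main__":
    check_environment()
```

Checking `PATH` only shows that each command exists; it does not verify versions or that Docker's daemon is running, so the Week 1 setup instructions remain the authoritative guide.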

FAQ

  • Q: I registered, but haven't received a confirmation email. Is that normal? A: Yes, it's normal. It's not automated, but you will receive an email eventually.
  • Q: At what time of day will it happen? A: Office hours are on Mondays at 17:00 CET, but everything is recorded, so you can watch it whenever it's convenient for you.
  • Q: Will there be a certificate? A: Yes, if you complete the project.
  • Q: I'm not 100% sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and can then take the course at your own pace.
  • Q: Do you plan to run an ML engineering course as well? A: Glad you asked. We do :)
  • Q: I'm stuck! I've got a technical question! A: Ask on Slack! And check out the student FAQ; many common issues have been answered already. If your issue is solved, please add how you solved it to the document. Thanks!

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!
