Prerequisites

To get the most out of this course, you should have:

  • Basic coding experience
  • Familiarity with SQL
  • Experience with Python (helpful but not required)

No prior data engineering experience is necessary.


Modules

#### [Module 1: Containerization and Infrastructure as Code](01-docker-terraform/)

  • Introduction to GCP
  • Docker and Docker Compose
  • Running PostgreSQL with Docker
  • Infrastructure setup with Terraform

#### [Module 2: Workflow Orchestration](02-workflow-orchestration/)

  • Data Lakes and Workflow Orchestration
  • Workflow orchestration with Kestra
  • Homework

#### [Workshop 1: Data Ingestion](cohorts/2025/workshops/dlt/README.md)

  • API reading and pipeline scalability
  • Data normalization and incremental loading
  • Homework

#### [Module 3: Data Warehousing](03-data-warehouse/)

  • Introduction to BigQuery
  • Partitioning, clustering, and best practices
  • Machine learning in BigQuery

#### [Module 4: Analytics Engineering](04-analytics-engineering/)

  • dbt (data build tool) with PostgreSQL & BigQuery
  • Testing, documentation, and deployment
  • Data visualization with Metabase

#### [Module 5: Batch Processing](05-batch/)

  • Introduction to Apache Spark
  • DataFrames and SQL
  • Internals of GroupBy and Joins

#### [Module 6: Streaming](06-streaming/)

  • Introduction to Kafka
  • Kafka Streams and KSQL
  • Schema management with Avro

#### [Final Project](projects/)

  • Apply all concepts learned in a real-world scenario
  • Peer review and feedback process

Data Engineering

This is an online course on data engineering, delivered over Zoom.

  • [Module 1: Introduction](#module-1-introduction)
  • [Module 2: Containerization and Infrastructure as Code](#module-2-containerization-and-infrastructure-as-code)
  • Module 3: Workflow Orchestration
  • Module 4: Data Warehouse