(Day 342) Apache Flink 101 + Data Quality

Ivan Ivanov · December 8, 2024

Hello :) Today is Day 342!

A quick summary of today:

  • Apache Flink 101 by Confluent
  • two more homework As
  • DQ basics

Today I found more free courses on streaming by Confluent here

The course I covered yesterday (Apache Kafka 101) was the 1st course in the series, and today I covered the 2nd: Apache Flink 101.

The overview:

  • what Apache Flink is, and why you might use it
  • what stream processing is, and how it differs from batch processing
  • Flink’s runtime architecture
  • how to use Flink and Kafka together
  • how to use Flink SQL: tables, windows, event time, watermarks, and more
  • stateful stream processing
  • how watermarks support event time operations
  • how Flink uses snapshots (checkpoints) for fault tolerance
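
To make the watermark bullet a bit more concrete, here is a toy sketch in plain Python. It is not Flink code and not taken from the course: the function name `tumbling_window_sums`, its parameters, and the sample events are all made up for illustration. It assumes a simplified rule where the watermark is the largest event time seen so far minus a fixed allowed lateness, and a tumbling window fires once the watermark reaches its end. Real Flink handles timestamps, parallelism, and late data with more nuance.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size, max_out_of_orderness):
    """Toy model of event-time tumbling windows driven by a watermark.

    events: iterable of (event_time, value) in ARRIVAL order (may be out of order).
    Returns (fired, dropped, still_open).
    """
    open_windows = defaultdict(int)  # window start -> running sum
    fired = []                       # (start, end, total), in firing order
    dropped = []                     # events that arrived after their window fired
    max_seen = float("-inf")
    watermark = float("-inf")

    for event_time, value in events:
        window_start = (event_time // window_size) * window_size

        # If the watermark already passed this window's end, the event is late.
        if window_start + window_size <= watermark:
            dropped.append((event_time, value))
            continue

        open_windows[window_start] += value

        # Simplified bounded-out-of-orderness watermark (an assumption of this sketch).
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness

        # Fire every open window whose end is at or before the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                fired.append((start, start + window_size, open_windows.pop(start)))

    return fired, dropped, dict(open_windows)


# Hypothetical sample: event times arrive slightly out of order.
sample_events = [(1, 1), (4, 1), (12, 1), (9, 1), (14, 1), (3, 1), (25, 1)]
fired, dropped, still_open = tumbling_window_sums(
    sample_events, window_size=10, max_out_of_orderness=3
)
print(fired)       # [(0, 10, 3), (10, 20, 2)]
print(dropped)     # [(3, 1)]
print(still_open)  # {20: 1}
```

The event at time 9 arrives after the event at time 12 but is still counted, because the watermark (12 - 3 = 9) has not yet reached the end of the first window. The event at time 3 arrives after the watermark has moved to 11, so its window has already fired and it is dropped.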

The course contains short videos with very well-made explanations, plus short exercises where I got to run Flink locally and play around with Flink SQL.

Submitted the next two homeworks for Zach Wilson’s YT bootcamp

  • 3rd - Spark queries (joins, aggregations)
  • 4th - converting 2 SQL queries to Spark SQL/DataFrame API jobs and writing a chispa test for them

I already had the homework finished, but I got a B at first because I had been developing in a notebook and forgot to add some parts of my code to the submission .py file. Anyway, I got an A for both 🥳

Some DQ basics from Zach Wilson’s latest lecture

[Images: Gold pipelines MIDAS P1, P2, P3]

After the stream, I wanted to finish a report I am writing for the company reviewer LLM I am fine-tuning for my professor. I think I will share the full one tomorrow :)

That is all for today!

See you tomorrow :)