Hello :) Today is Day 348!
A quick summary of today:
- doing setup for Zach Wilson’s week 4 - streaming with Kafka and Flink
- starting an amazing new book - Effective Machine Learning Teams
Apache Flink week setup for Zach Wilson’s camp
Tomorrow when I wake up the lectures + labs will be out, but today he released the setup. Thankfully I got through it without any hurdles.
I even decided to look into an extra aggregation job that shows the number of hits each page on Zach’s website gets every 5 minutes. It was weird because the job submission went fine - no errors in the Docker logs or in the Flink UI - but I had no aggregated data in my table. After some playing around, I believe the problem was in the function connecting to Kafka.
I narrowed it down to two changes in the aggregation_job.py file, in the create_processed_events_source_kafka function:
- change the pattern to pattern = "yyyy-MM-dd''T''HH:mm:ss.SSS''Z''"
- change the CREATE TABLE part of the sink_ddl to:
CREATE TABLE {table_name} (
url VARCHAR,
referrer VARCHAR,
user_agent VARCHAR,
host VARCHAR,
ip VARCHAR,
headers VARCHAR,
event_time VARCHAR,
window_timestamp AS TO_TIMESTAMP(event_time, '{pattern}'),
WATERMARK FOR window_timestamp AS window_timestamp - INTERVAL '15' SECOND
) WITH (...)
For change 2, it might be a problem with one of the columns, but I cannot see any errors anywhere. Nevertheless, this works.
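For reference, here’s a minimal sketch of how that source-table function could look with both changes applied. I’m assuming PyFlink’s Table API here, and the connector options (table name, topic, broker address, format) are placeholders rather than the exact values from the camp’s repo:

```python
# Minimal sketch - PyFlink Table API assumed; connector options are placeholders.
from pyflink.table import StreamTableEnvironment


def create_processed_events_source_kafka(t_env: StreamTableEnvironment) -> str:
    table_name = "processed_events_kafka"  # hypothetical table name
    # the doubled single quotes are SQL escapes, so Flink sees 'T' and 'Z'
    # as quoted literals in the timestamp format
    pattern = "yyyy-MM-dd''T''HH:mm:ss.SSS''Z''"
    source_ddl = f"""
        CREATE TABLE {table_name} (
            url VARCHAR,
            referrer VARCHAR,
            user_agent VARCHAR,
            host VARCHAR,
            ip VARCHAR,
            headers VARCHAR,
            event_time VARCHAR,
            window_timestamp AS TO_TIMESTAMP(event_time, '{pattern}'),
            WATERMARK FOR window_timestamp AS window_timestamp - INTERVAL '15' SECOND
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'processed_events',
            'properties.bootstrap.servers' = 'kafka:9092',
            'scan.startup.mode' = 'latest-offset',
            'format' = 'json'
        )
    """
    t_env.execute_sql(source_ddl)
    return table_name
```

Once the table is registered, the 5-minute aggregation can group over a TUMBLE window on window_timestamp, and the watermark tells Flink when a window is complete enough to emit.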
Here are the tables I get (I ran the job for a few minutes only):
Spark on Databricks
I casually watched this video on Spark on Databricks ~ it was nice because for every part the teacher showed how to use the UI - job descriptions, DAGs, info about jobs, partition sizes, etc.
It covered:
- Basics
- Main components like driver and worker
- Spark UI
- Deployment modes
- RDDs, DataFrames and DataSets
- Transformations and actions (small sketch after this list)
- Jobs, Stages, and Tasks
- Data Shuffling
- Introduction to optimization
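The transformations/actions part is easy to see in a couple of lines, so here’s a tiny PySpark sketch (mine, not from the video): transformations stay lazy, and only the action at the end triggers a job; the groupBy is a wide transformation, so it adds a shuffle and an extra stage you can inspect in the Spark UI.

```python
# Tiny sketch of lazy transformations vs. actions (not from the video).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformations-vs-actions").getOrCreate()

df = spark.range(1_000_000)                           # no job runs yet
doubled = df.withColumn("x", F.col("id") * 2)         # transformation: still lazy
grouped = doubled.groupBy(F.col("id") % 10).count()   # wide transformation -> shuffle

# Only this action triggers a job; the shuffle shows up as a stage boundary
# in the Spark UI, where you can also check partition sizes.
grouped.show()

spark.stop()
```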
Effective Machine Learning Teams - Chapter 1 - Challenges and Better Paths in Delivering ML Solutions
ML promises and disappointments
Businesses are applying diverse ML techniques, such as natural language understanding and computer vision, across sectors like healthcare and finance. Demand for ML skills has grown consistently over the past decade, even during the pandemic. However, addressing past challenges in ML is crucial as the field progresses.
Why ML projects fail
Failure can come in different forms:
- inability to ship an ML-enabled product to production
- shipping products that customers don’t use
- deploying defective products that customers don’t trust
- inability to evolve and improve models in production quickly enough
| Task | High-effectiveness environment | Low-effectiveness environment |
|---|---|---|
| Testing if code changes worked as expected | Automated testing (~ seconds to minutes) | Manual testing (~ minutes to hours) |
| Testing if the ML training pipeline works end to end | Training smoke test (~ 1 minute) | Full model training (~ minutes to hours, depending on the model architecture) |
| Getting feedback on code changes | Pair programming (~ seconds to minutes) | Pull request reviews (~ hours to days) |
| Understanding if the application is working as expected in production | Monitoring in production (~ seconds - as it happens) | Customer complaints (~ days, or longer if not directly reported) |
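To make the “training smoke test” row concrete, here’s a minimal sketch of what one could look like: train on a tiny synthetic dataset with a capped iteration count and only assert that the pipeline runs end to end and produces a usable model. The train_model entry point and its signature are hypothetical, not from the book.

```python
# Minimal training smoke test sketch (pytest-style); train_model is a
# hypothetical stand-in for the real training pipeline entry point.
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_model(features: np.ndarray, labels: np.ndarray, max_iter: int = 5):
    """Stand-in for the real training code."""
    model = LogisticRegression(max_iter=max_iter)
    model.fit(features, labels)
    return model


def test_training_smoke():
    # a tiny synthetic dataset keeps the test in the seconds range
    rng = np.random.default_rng(42)
    features = rng.normal(size=(64, 4))
    labels = (features[:, 0] > 0).astype(int)

    model = train_model(features, labels, max_iter=5)

    # assert the pipeline produced a usable model, not that the model is good
    assert model.predict(features).shape == (64,)
```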
Is There a Better Way? How Systems Thinking and Lean Can Help
Relying solely on MLOps practices and platforms is insufficient for effective ML delivery. Similar to how DevOps optimizes software infrastructure but cannot address issues like software design or user experience, MLOps cannot compensate for gaps in software engineering (e.g., testing, design) or product delivery practices (e.g., user stories, customer journeys). Success in ML requires a multidisciplinary approach involving product development, software engineering, data, ML, and delivery.
See the Whole: A Systems Thinking Lens for Effective ML Delivery
Systems thinking helps us shift our focus from individual parts of a system to relationships and interactions between all the components that constitute a system. Systems thinking gives us mental models and tools for understanding—and eventually changing—structures that are not serving us well, including our mental models and perceptions.
A system consists of 3 things:
- elements: such as data, ML experiments, software engineering, infrastructure and deployment, users and customers, and product design and user experience
- interconnections: such as cross-functional collaboration and production ML systems creating data for subsequent labeling and retraining
- a function or purpose of the ML product: such as helping users find the most suitable products
The components of ML product delivery are inherently interconnected
Systems thinking recognizes that a system’s components are interconnected and that changes in one part of the system can have ripple effects throughout the rest of the system. This means that to truly understand and improve a system, we need to consider the system as a whole and how all its parts work together.
The Five Disciplines Required for Effective ML Delivery
Lean helps ML teams minimize waste, maximize value, and deliver solutions more effectively by incorporating customer feedback and focusing on iterative improvements. Five essential disciplines—product, delivery, software engineering, data, and ML—are critical to effective ML delivery, with each discipline requiring specific principles and practices to enable rapid feedback and iteration.
Lean principles, originating from Toyota’s Production System, provide a structured approach to reduce inefficiencies and enhance outcomes.
The 5 principles of Lean:
- identify value
- map the value stream
- create flow
- establish pull
- pursue perfection
The first discipline: Product
- Discovery:
- a structured process to clarify problems, explore opportunities, and develop solutions
- relies on tools like the Lean Canvas, Value Proposition Canvas, and Data Product Canvas to align stakeholders and prioritize customer needs
- incorporates customer feedback through techniques like user journey mapping and interviews to evaluate the feasibility and value of ML solutions
- discovery is continuous, helping teams adapt as they build, measure, and learn
- Prototype testing:
- rapid, low-cost testing of ideas with users to validate assumptions before investing in engineering
- examples include hand-drawn interfaces, clickable mockups, or “Wizard of Oz” prototypes (manual simulations of automated functions)
- reduces feedback loops from weeks or months to days, enabling fast iteration and early validation of ML product designs
By focusing on customer-centric experimentation and fast feedback, the product discipline helps ML teams avoid building the wrong product and ensures resources are directed toward valuable, impactful solutions.
The second discipline: Delivery
- Vertically sliced work:
- avoid horizontal slicing (e.g., separating data pipelines, ML models, and UX), as it delays feedback and risks integration issues
- opt for vertical slices—independently shippable units containing end-to-end functionality, providing earlier feedback and demonstrable value at all stages (story, iteration, release)
- Cross-functional teams:
- combine diverse expertise (e.g., data science, engineering, UX) within a team to reduce handoffs, waiting times, and blind spots
- enables smaller batch sizes, faster feedback, and better decision-making
- challenges like inconsistent solutions across teams can be mitigated via shared practices, platform teams, or communities of practice
- Ways of working (WoW):
- effective processes and tools, including agile ceremonies (e.g., standups, retros), story workflows, and quality assurance
- focus on the intent behind practices to improve alignment, collaboration, and flow of value, avoiding mechanical adherence to methods
- Measuring delivery metrics:
- regularly track metrics like iteration velocity, cycle time, and defect rates to monitor delivery health
- use the ‘four key metrics’ (lead time, deployment frequency, MTTR, and change failure rate) to assess software delivery performance, which correlates with business outcomes
- metrics foster data-driven planning, continuous improvement, and alignment with delivery goals
Delivery ensures timely, reliable execution by prioritizing vertical work slices, fostering cross-functional collaboration, and adopting metrics-driven processes. These practices help ML teams minimize risks, improve flow, and achieve better outcomes for customers and organizations.
The third discipline: Engineering
- Automated testing
- Refactoring
- Code editor effectiveness
- Continuous delivery for ML
The fourth discipline: ML
- Framing ML problems
- ML systems design
- Responsible AI and ML governance
The fifth discipline: Data
- Closing the data collection loop: effective ML systems must integrate mechanisms to collect and curate predictions in production, enabling the creation of high-quality ground truth for evaluation and retraining. Scaling the labeling process through active learning, self-supervised learning, or weak supervision can address bottlenecks. Where natural labels exist, data pipelines should stream them alongside corresponding features while mitigating risks such as data poisoning and runaway feedback loops, which can entrench bias in subsequent models. Neglecting this process limits opportunities for data-centric model improvements, hampering long-term performance. (See the small sketch after this list.)
- Data security and privacy: safeguarding data requires a multilayered defense-in-depth approach, including encryption, access controls, and adherence to the principle of least privilege. Clear organizational policies on data governance ensure ethical usage and compliance with regulations, while access is restricted to authorized individuals and systems. These measures protect data integrity, privacy, and the ethical standards necessary for sustainable ML practices.
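As a small illustration of what closing the loop can boil down to mechanically, here’s a sketch (pandas, with made-up column names) of joining logged predictions with natural labels that arrive later, which yields a labeled set for evaluation and retraining:

```python
# Sketch of closing the data collection loop: join logged predictions with
# natural labels that arrive later (pandas; column names are made up).
import pandas as pd

# predictions logged at serving time, keyed by request_id
predictions = pd.DataFrame({
    "request_id": [1, 2, 3],
    "feature_a": [0.1, 0.9, 0.4],
    "predicted_label": [0, 1, 0],
})

# natural labels that show up later (e.g. the user clicked / the charge was disputed)
outcomes = pd.DataFrame({
    "request_id": [1, 3],
    "true_label": [0, 1],
})

# inner join keeps only predictions whose ground truth has arrived
labeled = predictions.merge(outcomes, on="request_id", how="inner")
print(labeled)  # rows ready for evaluation metrics or a retraining dataset
```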
(wow ~ this 1st chapter is so insightful and definitely something to look back at when thinking about projects)
Effective Machine Learning Teams - Chapter 2 - Product and Delivery Practices for ML Teams
Concepts covered in this chapter:
Based on the British Design Council’s Double Diamond design process.
ML product discovery
Effective ML product discovery involves continuous customer engagement and adopting structured approaches to minimize risks and maximize value. Common pitfalls such as underused ML solutions, over-engineered products, endless PoCs, or weak business cases stem from insufficient discovery efforts.
Discovering product opportunities
During this phase the customers are at the center. Some notable techniques for surfacing the voice of the customer include personas, customer journey mapping, contextual inquiry, and customer interviews. These are common tools in the field of experience design, and each offers unique insights about users and customers.
Canvases to define product opportunities
The value proposition canvas:
Data product canvas
For ML product discovery, it’s important to assess the value and feasibility of candidate solutions. One helpful tool in this regard is the data product canvas, which provides a framework for connecting the dots between data, ML, and value creation.
Hypothesis canvas
The Hypothesis Canvas helps us reduce uncertainty as we formulate testable hypotheses, identify objective metrics, and design lightweight experiments to rapidly validate or invalidate ideas.
Techniques for rapidly designing, delivering, and testing solutions
- MVPs (or Minimum Lovable Product, Earliest Testable/Usable/Lovable Product)
- prototypes (on paper, interactive digital, proof of concepts)
Riskiest assumption test
- list all large assumptions and identify the riskiest one
- define a test for the riskiest assumption
- conduct the test
Product and Design thinking
Design Thinking is a problem-solving methodology widely used in various industries to address complex challenges through a user-centered approach. It involves understanding the needs and experiences of users, redefining problems, and creating innovative solutions. This is an alternative framework to the Double Diamond. Design Thinking is iterative, meaning that it often involves going back and forth between these stages, refining and adjusting as more is learned about the user and the problem. It’s highly collaborative and often involves cross-functional teams to bring different perspectives and expertise to the problem-solving process. This approach is used not only in product design but also in service design, business strategy, and organizational problem-solving.
Product Thinking builds on Design Thinking by including product management practices to focus on the business or organizational viability of any product. This means developing product design through the product lifecycle from initial intensive investment to a sustainable offering characterized by incremental innovation, and eventually to retirement as market fit wanes. It also means thinking about product commercial models and coordinating people and resources within organizations to deliver the product. Product Thinking is also sometimes contrasted with a solution mindset, which might be said to deliver an ad hoc collection of point solutions, rather than a unified offering that coherently solves customer problems with a good user experience.
By the end of the Discovery phase, the team and business should align on a product idea (or ideas) that is desirable (meeting customer needs), viable (profitable), and feasible (achievable within available resources and constraints). The insights and evidence gathered during Discovery can support creating a strong business case and securing funding for the product opportunity, if needed.
At this point, teams transition to the next stage: Inception. This phase focuses on defining actionable plans for product scope, technology, delivery planning, and risk management to ensure the team stays on track during delivery. This will be our next area of focus.
Inception: Setting Teams Up for Success
Inception is the phase where the team and stakeholders align on the product’s vision, scope, objectives, and delivery plan. Its goal is to ensure a shared understanding, providing a foundation to handle changes during delivery. Inception involves activities that help shape, size, and sequence the work needed, typically lasting from a few days to four weeks. It immediately follows Discovery and precedes delivery, ensuring clarity and alignment.
While Discovery focuses on identifying problems and potential solutions, Inception elaborates on the solution, covering technical aspects and delivery plans. The context from Discovery should carry into Inception and Delivery to maintain alignment and ensure buy-in.
In cases where phases are separated for budget or funding reasons, Discovery findings should be documented carefully to support a go/no-go decision. If the decision is ‘go’, Inception and Delivery proceed together. Inception may also focus on achievable delivery scope based on available resources. In rare cases, if the problem is clear and simple, product delivery might start directly at Inception, though skipping Discovery can increase risks.
How to plan and run an inception
Designing an Inception agenda depends on how much existing knowledge and context the team has around the solution to be built.
A minimal inception agenda includes:
- alignment on priorities (e.g., vision setting, trade-off sliders)
- defining the MVP
- cross-functional requirements
- solution design and minimum viable architecture
- path to production
- risks, assumptions, issues, and dependencies (RAIDs)
- security threat modeling
- ethics and responsible technology
- release planning
- ways of working
User stories: building blocks of an MVP
A user story is the smallest unit of value that can be independently shipped to production. A user story is the quantum, or building block, of the functionality of the product. Under agile methodologies, it’s also the quantum or building block of the project scope, so that product and project progress are linked. User stories are the vehicle for translating ideas, intents, and context into clear, business-validated acceptance criteria that guide team members—be they developers, data scientists, ML engineers, quality analysts, etc — to know what to build, and how to know when the feature is done.
User stories should be:
- independent: each story should stand alone, independent of other stories. This allows them to be developed, prioritized, and implemented in any order and in parallel
- negotiable: rather than being a contract, a story is a conversation starter. Details can be discussed and adjusted based on collaboration between the delivery team and the product owner
- valuable: every story should deliver value to the end users or the business. If it doesn’t, it might not be worth including in the product backlog
- estimable: the team should be able to estimate the effort required for a story. If a story can’t be estimated because it’s too vague, it’s a hook for the team to clarify the details in the story or decompose the problem further
- small: stories should be small enough to be completed within one iteration. If they’re too large, they should be split into smaller, more manageable pieces
- testable: there should be clear criteria to determine when a story is done and working as expected. Without this, it’s hard to know when the work is truly complete
Product delivery
Delivery activities
- iteration planning
- daily standups
- story delivery
- regular showcases
- retrospectives
- ongoing risk management
- architectural decisions
- continuous discovery
- dual-track delivery
Diagram of the dual-track delivery model:
Measuring product and delivery
Delivery measures:
- track metrics like scope completed per iteration, cycle time, and defect rates to monitor delivery health and adherence to timelines
- tools like burn-up charts visualize total scope and work completed, enabling early detection of issues like misaligned velocity or scope management
- velocity (historical and forecasted) helps in iteration planning and stakeholder communication, but it is team-specific and unsuitable for cross-team comparisons
- focus on calibrating delivery plans using insights from metrics, aligning with the Agile principle of responding to change
Product measures:
- frameworks like pirate metrics (AARRR) and One Metric That Matters (OMTM) guide teams in tracking user engagement, financial performance, and product-specific outcomes
- metrics should evolve with the product’s maturity and focus on shorter feedback loops to refine priorities
Model measures:
- ML model performance often uses proxies for product goals, like accuracy or F-score for classifiers (a quick worked example follows below)
- balancing competing needs (e.g. fraud detection vs. user acquisition) is critical, requiring agreed-upon thresholds and continuous improvement
- model metrics reflect progress and guide adjustments, blending discovery (experiments) and delivery (implementation)
Discovery measures:
- discovery work is inherently uncertain and focuses on maintaining a portfolio of meaningful, time-bound experiments
- minimize cycle time for experimentation to accelerate progress and avoid unproductive R&D efforts
- regularly review experiment outcomes to integrate discoveries into delivery processes
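Since the F-score keeps coming up as the go-to proxy, here’s a quick worked example with made-up counts for a fraud-style classifier: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean.

```python
# Quick worked example of precision, recall, and F1 with made-up counts
# (not figures from the book).
true_positives = 80   # fraud correctly flagged
false_positives = 20  # legitimate transactions flagged as fraud
false_negatives = 40  # fraud that slipped through

precision = true_positives / (true_positives + false_positives)  # 0.800
recall = true_positives / (true_positives + false_negatives)     # ~0.667
f1 = 2 * precision * recall / (precision + recall)               # ~0.727

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```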
Conclusion
- instead of finding out in two weeks that a developer or ML practitioner’s solution did not align with business expectations, spend a few hours writing and validating some user stories
- instead of finding out in three months that your team is going to miss a critical release milestone, spend an hour each week measuring the team’s velocity
- instead of finding out in six months that an entire release didn’t improve customer experience, find out critical success factors in two weeks through discovery
That is all for today!
See you tomorrow :)