(Day 326) Finally started the Designing Data-Intensive Applications book

Ivan Ivanov · November 22, 2024

data-eng theory

Hello :) Today is Day 326!

A quick summary of today:

Designing Data-Intensive Applications’s audiobook
fact data modelling
some sql practice

Started listening to Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

I was not aware there were audiobooks on the O’Reilly platform.

Chapter 1: Reliable, Scalable, and Maintainable Applications

Thinking About Data Systems

The boundaries between traditional categories of data systems (databases, queues, caches) are becoming blurred as new tools optimised for diverse use cases have emerged. For instance, some datastores function as message queues (like redis), and some queues provide database-like durability (Kafka). Modern applications often require multiple tools stitched together via application code to meet complex data processing and storage needs, creating composite data systems.

Designing such systems raises challenges around ensuring correctness, consistency, performance, scalability, and a good API. Various factors, including team expertise, legacy dependencies, and organisational risks, influence design decisions.

Key concerns for data systems include:

reliability: ensuring correct functioning under adverse conditions
scalability: managing growth in data, traffic, or complexity
maintainability: enabling productive work on the system over time

Reliability

Reliability refers to a system’s ability to continue functioning correctly even when faults occur. Fault-tolerant or resilient systems anticipate faults—components deviating from specifications—and prevent them from causing failures, which occur when the system fails to deliver expected services. Key fault types include:

Hardware faults: common in large-scale systems, mitigated through redundancy (e.g., RAID, backup power, multi-machine setups). Modern demands often require software fault-tolerance to complement hardware redundancy
Software errors: often systematic, correlated faults such as bugs, resource exhaustion, or cascading failures. Mitigated by rigorous testing, process isolation, self-monitoring, and allowing systems to crash and restart
Human errors: a leading cause of outages. Mitigations include user-friendly design, sandbox environments, automated testing, detailed monitoring, and fast recovery mechanisms

While reliability is essential for user trust and business continuity, trade-offs may occur for prototypes or low-margin services. However, even in non-critical systems, reliability remains a core responsibility to users

Scalability

Scalability is a system’s ability to handle increased load, requiring tailored approaches rather than a one-size-fits-all solution. Load growth can stem from more users, increased data, or changing usage patterns. Key considerations include defining load params (e.g.requests per second or database read/write ratios) and choosing architectures that address specific scalability needs.

Performance metrics like response time and latency are critical. Response time varies and is best described using percentiles (e.g., p50, p95) to understand typical and extreme cases. High percentiles, or tail latencies, significantly affect user experience, especially for multi-step backend processes. Systems must balance optimising for common use cases while addressing critical outliers (e.g. celebrity tweets on Twitter).

To manage increasing load:

scaling up (vertical): moving to a more powerful machine
scaling out (horizontal): distributing load across multiple machines (shared-nothing architecture)

Stateful distributed systems are complex but increasingly necessary for high loads. Elastic systems adjust resources dynamically, while manual scaling offers simplicity. Effective scalability designs rely on understanding specific load patterns, balancing performance, and iterative adaptability.

Maintainability

Maintainability is crucial, as most software costs stem from ongoing maintenance rather than initial development. To avoid creating cumbersome legacy systems, software should prioritize operability, simplicity, and evolvability:

Operability: focuses on making it easy for operations teams to maintain system health, troubleshoot issues, apply updates, and anticipate problems. Systems should enable visibility, automation, good documentation, self-healing capabilities, and predictable behavior
Simplicity: reduces unnecessary complexity, which can hinder understanding, increase maintenance costs, and risk bugs. Simplification often involves removing accidental complexity through effective abstractions that hide implementation details, making systems easier to maintain and reuse
Evolvability: ensures systems can adapt to changing requirements, such as new features or regulatory demands. Evolvable systems leverage simplicity and clear abstractions to remain flexible and support larger-scale modifications, fostering long-term agility

These principles interconnect and aim to minimise maintenance challenges and support the system’s longevity.

Fact data modelling

Today I continued with the content on dataexpert.io - my preparation for the Jan ‘25 bootcamp. I covered lecture 1 only which is related to fact data modelling.

(Again, I cannot and should not share more info as this content is 100% behind a paywall and none of it is available publicly)

Doing some SQL practice

On the dataexpert.io platform there is a /question page which has some practice SQL questions and built-in query editor. Today I finished all the easy ones and some of the medium ones. It’s all free so why not take advantage :)

That is all for today!

See you tomorrow :)