(Day 265) Day 6 of the DeepLearning.AI Data Engineering Professional Certificate course

Ivan Ivanov · September 22, 2024

Hello :) Today is Day 265!

A quick summary of today:

Started Week 1 - Intro to data modelling for analytics

| Good Data Models | Poor Data Models |
| --- | --- |
| :heavy_check_mark: Reflect the business goals and logic while incorporating business rules | :x: Don’t reflect how the business operates |
| :heavy_check_mark: Ensure compliance with operational standards and legal requirements | :x: Create more problems than they solve |
| :heavy_check_mark: Outline the relationships between business processes | :x: Provide stakeholders with inaccurate information and create confusion |
| :heavy_check_mark: Serve as a powerful communication tool, creating a shared language | |

Data model types

  1. Conceptual
    • describes business entities, relationships, & attributes
    • includes business logic and rules
  2. Logical
    • adds detail on how the conceptual model will be implemented (e.g. column data types, primary and foreign keys), independent of any specific DBMS
  3. Physical
    • details about the implementation of the logical model in a specific DBMS

Normalisation

Normalization is achieved through several stages called normal forms, each building on the previous one:

  1. First Normal Form (1NF):
    • ensures all data in a table is atomic (no nested structures)
    • each column must have a unique name, and each table must have a primary key that uniquely identifies each row

    Example: Splitting a column with nested data (like order items) into individual columns like item SKU, price, and name, while ensuring rows are unique

  2. Second Normal Form (2NF):
    • meets 1NF requirements
    • eliminates partial dependencies, meaning no column should depend on only part of a composite primary key

    Example: Splitting a sales order table into two tables—one for orders and one for order items—removes partial dependencies

  3. Third Normal Form (3NF):
    • meets 2NF requirements.
    • removes transitive dependencies, where non-key columns depend on other non-key columns

    Example: Splitting the orders table into separate customer and item tables removes transitive dependencies (i.e. customer data depending on the order ID); a small sketch of all three steps follows below
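
To make the three normal forms a bit more concrete, here is a minimal sketch in plain Python (my own, not from the course lab); the table, column, and value names are invented for illustration, and a real implementation would of course live in a database.

```python
# Unnormalized: one row per order, with the items nested inside a single column
orders_raw = [
    {"order_id": 1, "customer": "Alice", "customer_city": "Sofia",
     "items": [{"sku": "A1", "name": "Pen", "price": 2.5},
               {"sku": "B2", "name": "Notebook", "price": 5.0}]},
]

# 1NF: atomic values only -- one row per order item,
# uniquely identified by the composite key (order_id, sku)
order_items_1nf = [
    {"order_id": o["order_id"], "customer": o["customer"],
     "customer_city": o["customer_city"],
     "sku": it["sku"], "item_name": it["name"], "price": it["price"]}
    for o in orders_raw for it in o["items"]
]

# 2NF: remove partial dependencies on the composite key.
# customer / customer_city depend only on order_id, so they move to an orders table.
orders_2nf = {o["order_id"]: {"customer": o["customer"],
                              "customer_city": o["customer_city"]}
              for o in orders_raw}
order_items_2nf = [{"order_id": r["order_id"], "sku": r["sku"],
                    "item_name": r["item_name"], "price": r["price"]}
                   for r in order_items_1nf]

# 3NF: remove transitive dependencies.
# customer_city depends on the customer, not on the order, so customers get their own table.
customers_3nf = {o["customer"]: {"city": o["customer_city"]} for o in orders_raw}
orders_3nf = {oid: {"customer": row["customer"]} for oid, row in orders_2nf.items()}

print(order_items_2nf)
print(orders_3nf, customers_3nf)
```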

Each normal form progressively reduces data redundancy and improves data integrity. However, depending on the use case, denormalization (combining tables) may sometimes be preferable to avoid complex joins and improve performance.

The lab for this part was about data normalisation only

Dimensional Modeling - Star Schema

**How exactly does the star schema help with analytical queries?**

We can start with a fact table by applying aggregate queries to find, for example, the sum, average, or maximum value of a particular fact measure in the fact table. Then we use the dimension tables to filter or group the facts.
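
As a rough illustration of that pattern, here is a tiny pandas sketch (my own, not from the course); the fact and dimension tables and their column names are made up.

```python
import pandas as pd

# Fact table: one row per sale, with foreign keys into the dimensions
fact_sales = pd.DataFrame({
    "date_key":    [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "amount":      [10.0, 25.0, 12.5],
})

# Dimension tables: descriptive attributes used for filtering and grouping
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "category":    ["Stationery", "Electronics"],
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "month":    ["2024-01", "2024-01"],
})

# Aggregate a fact measure, using the dimensions to group:
# total sales amount per product category and month
result = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .groupby(["category", "month"])["amount"]
          .sum())
print(result)
```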

  • the star schema organizes the data in a way that’s easier for business users to understand and navigate
  • it also results in simpler queries with fewer joins, which speeds up query performance

Today I had to travel around a bit, so I mainly studied on the bus rides and only covered part of Week 1 of Course 4.

Also, this is day 265, meaning there are fewer than 100 days left for this blog (wow).

That is all for today!

See you tomorrow :)