(Day 265) Day 6 of the DeepLearning.AI Data Engineering Professional Certificate course

Ivan Ivanov · September 22, 2024

Hello :) Today is Day 265!

A quick summary of today:

Started Week 1 - Intro to data modelling for analytics

| Good Data Models | Poor Data Models |
| --- | --- |
| :heavy_check_mark: Reflect the business goals and logic while incorporating business rules | :x: Don’t reflect how the business operates |
| :heavy_check_mark: Ensure compliance with operational standards and legal requirements | :x: Create more problems than they solve |
| :heavy_check_mark: Outline the relationships between business processes | :x: Provide stakeholders with inaccurate information and create confusion |
| :heavy_check_mark: Serve as a powerful communication tool, creating a shared language | |

Data model types

  1. Conceptual
    • describes business entities, relationships, & attributes
    • includes business logic and rules
  2. Logical
    • adds detail on how the conceptual model will be implemented (e.g. column data types, primary and foreign keys), independent of any specific DBMS
  3. Physical
    • details about the implementation of the logical model in a specific DBMS

Normalisation

Normalization is achieved through several stages called normal forms, each building on the previous one:

  1. First Normal Form (1NF):
    • ensures all data in a table is atomic (no nested structures)
    • each column must have a unique name, and each table must have a primary key that uniquely identifies each row

    Example: Splitting a column with nested data (like order items) into individual columns like item SKU, price, and name, while ensuring rows are unique

  2. Second Normal Form (2NF):
    • meets 1NF requirements
    • eliminates partial dependencies, meaning no column should depend on only part of a composite primary key

    Example: Splitting a sales order table into two tables—one for orders and one for order items—removes partial dependencies

  3. Third Normal Form (3NF):
    • meets 2NF requirements.
    • removes transitive dependencies, where non-key columns depend on other non-key columns

    Example: Splitting the orders table into separate customer and item tables removes transitive dependencies (i.e. customer data depending on the order ID); a small sketch of all three steps follows below
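
To make the three normal forms a bit more concrete, here is a minimal sketch in plain Python (my own, not from the course lab); the table, column, and value names are invented for illustration, and a real implementation would of course live in a database.

```python
# Unnormalized: one row per order, with the items nested inside a single column
orders_raw = [
    {"order_id": 1, "customer": "Alice", "customer_city": "Sofia",
     "items": [{"sku": "A1", "name": "Pen", "price": 2.5},
               {"sku": "B2", "name": "Notebook", "price": 5.0}]},
]

# 1NF: atomic values only -- one row per order item,
# uniquely identified by the composite key (order_id, sku)
order_items_1nf = [
    {"order_id": o["order_id"], "customer": o["customer"],
     "customer_city": o["customer_city"],
     "sku": it["sku"], "item_name": it["name"], "price": it["price"]}
    for o in orders_raw for it in o["items"]
]

# 2NF: remove partial dependencies on the composite key.
# customer / customer_city depend only on order_id, so they move to an orders table.
orders_2nf = {o["order_id"]: {"customer": o["customer"],
                              "customer_city": o["customer_city"]}
              for o in orders_raw}
order_items_2nf = [{"order_id": r["order_id"], "sku": r["sku"],
                    "item_name": r["item_name"], "price": r["price"]}
                   for r in order_items_1nf]

# 3NF: remove transitive dependencies.
# customer_city depends on the customer, not on the order, so customers get their own table.
customers_3nf = {o["customer"]: {"city": o["customer_city"]} for o in orders_raw}
orders_3nf = {oid: {"customer": row["customer"]} for oid, row in orders_2nf.items()}

print(order_items_2nf)
print(orders_3nf, customers_3nf)
```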

Each normal form progressively reduces data redundancy and improves data integrity. However, depending on the use case, denormalization (combining tables) may sometimes be preferable to avoid complex joins and improve performance.

The lab for this part was about data normalisation only

Dimensional Modeling - Star Schema

**How exactly does the star schema help with analytical queries?**

We can start with a fact table by applying aggregate queries to find, for example, the sum, average, or maximum value of a particular fact measure in the fact table. Then we use the dimension tables to filter or group the facts.
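
As a rough illustration of that pattern, here is a tiny pandas sketch (my own, not from the course); the fact and dimension tables and their column names are made up.

```python
import pandas as pd

# Fact table: one row per sale, with foreign keys into the dimensions
fact_sales = pd.DataFrame({
    "date_key":    [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "amount":      [10.0, 25.0, 12.5],
})

# Dimension tables: descriptive attributes used for filtering and grouping
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "category":    ["Stationery", "Electronics"],
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "month":    ["2024-01", "2024-01"],
})

# Aggregate a fact measure, using the dimensions to group:
# total sales amount per product category and month
result = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .groupby(["category", "month"])["amount"]
          .sum())
print(result)
```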

  • the star schema organizes the data in a way that’s easier for business users to understand and navigate
  • it also results in simpler queries with fewer joins, which speeds up query performance

Today I had to travel around a bit, so I mainly studied on the bus rides and only covered part of Week 1 of Course 4.

Also, this is day 265, meaning there are fewer than 100 days left for this blog (wow).

That is all for today!

See you tomorrow :)