(Day 320) Day 1 of Zach Wilson's DE bootcamp - Data Modelling

Ivan Ivanov · November 16, 2024

Hello :) Today is Day 320!

A quick summary of today:

  • watched the lecture + covered the lab

The videos are available on youtube as well, but I am going to do it on his dataexpert platform to get credits ^^ And also I can submit assignments on there as well

image

I decided to take notes while watching the lecture/lab:

Dimensional data modelling day 1 lecture P1 Dimensional data modelling day 1 lecture P2 Dimensional data modelling day 1 lecture P3 Dimensional data modelling day 1 lecture P4 Dimensional data modelling day 1 lecture P5

And here is my code from the lab: (I am using DataGrip)

select * from player_seasons;

-- create a struct
create type season_stats as (
    season integer,
    gp integer,
    pts real,
    reb real,
    ast real
);

create type scoring_class as enum ('star', 'good', 'average', 'bad');

create table players (
    player_name text,
    height text,
    college text,
    country text,
    draft_year text,
    draft_round text,
    draft_number text,
    season_stats season_stats[],
    scoring_class scoring_class,
    years_since_last_season integer,
    current_season integer,
    primary key(player_name, current_season)
);

select min(season) from player_seasons;
-- 1996 - this is the 1st year in the data
-- used for the cumulative table

-- this is like a 'pipeline' where we can change the seasons
-- as if it's yesterday and today, and insert new data into players
insert into players
with yesterday as (
    select * from players
             where current_season = 1994
),
    today as (
        select * from player_seasons
                 where season = 1995
    )

select
    coalesce(t.player_name, y.player_name) as player_name,
    coalesce(t.height, y.height) as height,
    coalesce(t.college, y.college) as college,
    coalesce(t.country, y.country) as country,
    coalesce(t.draft_year, y.draft_year) as draft_year,
    coalesce(t.draft_round, y.draft_round) as draft_round,
    coalesce(t.draft_number, y.draft_number) as draft_number,
    case
        when y.season_stats is null then array[row(
            t.season,
            t.gp,
            t.pts,
            t.reb,
            t.ast
            )::season_stats] --:: casting it
        when t.season is not null then y.season_stats || array[row(
            t.season,
            t.gp,
            t.pts,
            t.reb,
            t.ast
            )::season_stats] -- || concat
        else y.season_stats
        -- if a player has retired just keep his stats
        -- don't add empty values
    end as season_stats,
    case
        when t.season is not null then
            case when t.pts > 20 then 'star'
                when t.pts > 15 then 'good'
                when t.pts > 10 then 'average'
                else 'bad'
            end::scoring_class
        else y.scoring_class
    end as scoring_class,
    case
        when t.season is not null then 0
        else y.years_since_last_season + 1
    end as years_since_last_season,
    coalesce(t.season, y.current_season+1) as current_season
-- using the coalesce line is much nicer
--     case
--         when t.season is not null then t.season
--         else y.current_season + 1
--     end
from today t full outer join yesterday y
    on t.player_name = y.player_name;

-- turn the 1 line about MJ to 3 lines because
-- at that point in our data he has stats for 3 seasons
-- and season_stats was filled with 3 items
select player_name,
       unnest(season_stats) as season_stats
       from players
where current_season = 2001
and player_name = 'Michael Jordan';

-- show how to explode so we get the columns as well
-- going back to the old schema
-- unnest keeps the temporal pieces together
with unnested as (
    select player_name,
         unnest(season_stats) as season_stats
    from players
    where current_season = 2001
-- it is all sorted
--     and player_name = 'Michael Jordan'
)
select player_name,
       (season_stats::season_stats).*
from unnested;

select * from players
where current_season = 2001;

-- check which player has the biggest improvement from their debut to their most recent season
-- there is no group by yet it feels like we grouped by so it is fast
-- the order by is the slowest part
select

    player_name,
    (season_stats[cardinality(season_stats)]::season_stats).pts/
    case when (season_stats[1]::season_stats).pts = 0 then 1 else (season_stats[1]::season_stats).pts end
from players
where current_season = 2001
order by 2 desc;

After watching the lecture+lab once, I decided to rewatch it just because the 1st time I kept pausing to take notes/write code meself.

Definitely amazing, learned a lot, and am excited for the next lecture + lab coming out Monday Korean time.


Today I wanted to go to the lab but all building doors were locked 😆 so I could not even enter the building. Usually there is 1 open door but not today :/ I will try again tomorrow.

That is all for today!

See you tomorrow :)