(Day 179) Using Docker, Makefile, and starting Data modelling for my Lending club project

Ivan Ivanov · June 28, 2024

Hello :) Today is Day 179!

A quick summary of today:

  • continued working on my Lending club data engineering project

Today I added a few cool features and learned plenty.

  1. Introduced Docker to the project

Here is the Dockerfile I created

Today I learned more about how the filesystem inside a Docker image is laid out: where to copy what, and where things live. At first I was not sure where things go, which directory my env vars should point to, or where I should copy files. But then I found that in Docker Desktop I can browse the files of a running container, so that is how I figured out what goes where.

The bash script referenced is here:

And my docker-compose.yml (before I added the volumes, the code I was writing in Mage was not persisting, so now I know what happens without volumes):

  2. Added a Makefile

I also made a Makefile (using one for the first time). I saw in the Data Engineering Zoomcamp project advice that adding a Makefile is a good idea, and it also helps with reproducibility.

These are the options that can be executed

(this make ‘interface’ looks so nice)

  3. Began thinking about and designing my dimensional data modelling strategy

I created the diagram below using Lucidchart.

I do not have natural unique identifiers in my data, so I had to go with surrogate keys. Initially I had a loan dimension table as well, but I could not find a combination of columns unique enough to build a surrogate key from, so I dropped it.
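To make the surrogate key idea concrete, here is a minimal Python sketch of one common approach: hashing a chosen set of columns into a single key. This is only an illustration and not necessarily how the project builds its keys; the function name `surrogate_key` and the column names (`grade`, `term`, `purpose`) are hypothetical.

```python
import hashlib


def surrogate_key(row: dict, columns: list[str]) -> str:
    """Build a deterministic key by hashing the chosen column values.

    Hypothetical helper for illustration only.
    """
    # Missing values become empty strings so the key is always defined.
    parts = ["" if row.get(col) is None else str(row[col]) for col in columns]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Hypothetical rows and column names, not taken from the real dataset.
loan_a = {"grade": "B", "term": "36 months", "purpose": "car"}
loan_b = {"grade": "B", "term": "36 months", "purpose": "car"}
key_columns = ["grade", "term", "purpose"]

key_a = surrogate_key(loan_a, key_columns)
key_b = surrogate_key(loan_b, key_columns)

# Two different loans with identical values collapse into the same key.
print(key_a == key_b)  # True
```

The last line shows the problem I ran into with the loan dimension: if the chosen columns are not unique enough, two different loans end up with the same key.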

At the moment the data lineage looks like the above. I am adding data documentation:

I have descriptions for other columns too.

I will look to add some tests as well.
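I have not decided which tests to add yet, but two typical checks for a key column are "not null" and "unique". As a rough sketch of what such checks verify, assuming rows are plain dictionaries and using hypothetical names (`find_nulls`, `find_duplicates`, `customer_key`):

```python
from collections import Counter


def find_nulls(rows: list[dict], column: str) -> list[int]:
    """Return the indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def find_duplicates(rows: list[dict], column: str) -> list:
    """Return the non-null values that appear more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical sample data for illustration only.
sample = [
    {"customer_key": "k1"},
    {"customer_key": "k2"},
    {"customer_key": "k1"},
    {"customer_key": None},
]

print(find_nulls(sample, "customer_key"))       # [3]
print(find_duplicates(sample, "customer_key"))  # ['k1']
```

A test passes when both functions return an empty list.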

That is all for today!

See you tomorrow :)