Hello :) Today is Day 179!
A quick summary of today:
- Continued working on my Lending Club data engineering project
Today I added a few cool features, and I learned plenty
- Introduced Docker to the project
Here is the Dockerfile I created
Today I learned more about the folder structure inside a Docker container: where to copy files and where things live. At first I was not sure where things go, which directory my env vars should point to, or where to copy files. Then I found that Docker Desktop lets me browse the files of a running container, and that is how I figured out what goes where.
The bash script referenced is here:
And my docker-compose.yml (before I added the volumes, the code I was writing in Mage did not persist, so now I know what happens without volumes)
- Added a Makefile
I also made a Makefile (my first time using one). The Data Engineering Zoomcamp project advice recommends adding one, and it is good for reproducibility.
These are the options that can be executed
(this make ‘interface’ looks so nice)
- Began thinking about and designing my dimensional modelling strategy
I created the below using Lucidchart
I do not have natural unique identifiers in my data, so I had to go with surrogate keys. Initially I had a loan dimension table as well, but I felt I could not find a set of columns unique enough to build its surrogate key from.
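To make the surrogate key idea concrete, here is a minimal Python sketch of one common approach: hash the values of the columns that identify a dimension row. This is not necessarily how the project builds its keys. The function name `surrogate_key`, the separator, the null token, and the column names (`grade`, `sub_grade`, `term`) are all assumptions for illustration.

```python
import hashlib


def surrogate_key(row, columns, sep="|", null_token="_null_"):
    """Build a deterministic key by hashing the chosen column values.

    `row` is a dict; `columns` lists the (hypothetical) columns that
    together identify one dimension row.
    """
    parts = []
    for col in columns:
        value = row.get(col)
        # Replace missing values with a token so None and "" hash differently.
        parts.append(null_token if value is None else str(value))
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


# Hypothetical rows; the column names are illustrative only.
row_a = {"grade": "B", "sub_grade": "B3", "term": "36 months"}
row_b = {"grade": "B", "sub_grade": "B3", "term": "60 months"}
row_c = {"grade": "C", "sub_grade": "C1", "term": "36 months"}

key_a = surrogate_key(row_a, ["grade", "sub_grade"])
key_b = surrogate_key(row_b, ["grade", "sub_grade"])
key_c = surrogate_key(row_c, ["grade", "sub_grade"])
```

Rows that agree on the chosen columns get the same key, which is what you want for a dimension but also shows the problem described above: if the columns are not unique enough, distinct loans collapse into one key.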
At the moment the data lineage looks like the above. I am adding data documentation:
I have descriptions for other columns too.
I will look to add some tests as well.
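As a rough idea of what such tests could check, here is a small Python sketch of two typical data tests: uniqueness of a key column and no missing values in a column. The helper names and the sample rows are made up for illustration, and the real tests may well be written in a different tool.

```python
def check_unique(rows, column):
    """Return the sorted values that appear more than once in `column`."""
    seen = set()
    duplicates = set()
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def check_not_null(rows, column):
    """Return the positions of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


# Hypothetical sample of a dimension table.
sample_rows = [
    {"grade_key": "a1", "grade": "A"},
    {"grade_key": "b2", "grade": "B"},
    {"grade_key": "b2", "grade": None},
]

duplicate_keys = check_unique(sample_rows, "grade_key")
null_positions = check_not_null(sample_rows, "grade")
```

An empty result from either helper means the check passes. Here both return something, so this sample would fail both tests.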
That is all for today!
See you tomorrow :)