Hello :) Today is Day 214!
A quick summary of today:
- documenting the KB project data using arrow.app
- small repo changes
- ML zoomcamp announcement
I tried to find some graph documentation tool for the KB project. The most popular I could find is this arrows.app application where we can create sample nodes with edges between and add attributes with descriptions. I created the below pic using that app
Also I made some small changes to the repo:
the readme I added:
- Docker - used to containarise and self-host the below services
- Mlflow - used for easy model comparison during development. Can be improved by using cloud services (like GCP) for hosting, database and artifact store
- Mage - used for pipeline orchestration of the model training and real-time inference pipelines. Can be improved by hosting mage on the cloud
- Neo4j - used as a Graph database to store transaction data as nodes and edges. Can be improved by using Neo4j’s AuraDB (hosted on the cloud)
- Kafka - used to ensure real-time transaction data processing
- Grafana - used for real-time dashboard creating and monitoring. Can be improved by using Grafana’s cloud services
- Streamlit - used to host a model dictionary showing models, evaluation metrics, and feature importance graphs
Terraform can be used to provide Infrastructure as Code (IaC) for services that can be hosted on the cloud
I also removed .env from gitignore just so that reproducers can clearly see the env variables used. At the moment there is no actual secret variables so it is fine.
A task I added to my TODOs is to figure out how to setup the neo4j db connection on grafana startup. I saw I could use a grafana.ini file, but I will figure out this tomorrow.
On another note, today the evaluation of my MLOps zoomcamp project came back.
I got 32, which I believe is the max. The points are given for: describing the problem, usage of cloud services, mlflow and exp. tracking, orchestration, model deployment, monitoring, reproducability and best practices.
The certificate is not issued yet, but I will share it when it is sent.
Today I saw that Alexey Grigorev, the creator of the DataTalksClub, announced the next cohort for their ML zoomcamp. It will cover the below:
I might sign up to see if I can learn something new, and keep my skills sharp ^^
That is all for today!
See you tomorrow :)