(Day 214) The evaluation of my MLOps zoomcamp project arrived - max points

Ivan Ivanov · August 2, 2024

Hello :) Today is Day 214!

A quick summary of today:

  • documenting the KB project data using arrows.app
  • small repo changes
  • ML zoomcamp announcement

I looked for a graph documentation tool for the KB project. The most popular one I could find is arrows.app, where we can create sample nodes with edges between them and add attributes with descriptions. I created the picture below with it:

image

Also I made some small changes to the repo:

image

The README I added:

  • Docker - used to containerise and self-host the below services
  • MLflow - used for easy model comparison during development. Can be improved by using cloud services (like GCP) for hosting, database and artifact store
  • Mage - used for pipeline orchestration of the model training and real-time inference pipelines. Can be improved by hosting Mage on the cloud
  • Neo4j - used as a graph database to store transaction data as nodes and edges. Can be improved by using Neo4j’s AuraDB (hosted on the cloud)
  • Kafka - used to ensure real-time transaction data processing
  • Grafana - used for real-time dashboard creation and monitoring. Can be improved by using Grafana’s cloud services
  • Streamlit - used to host a model dictionary showing models, evaluation metrics, and feature importance graphs

Terraform can be used to provide Infrastructure as Code (IaC) for services that can be hosted on the cloud
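To make the Neo4j point above a bit more concrete, here is a minimal sketch of how one transaction could be turned into two nodes and one edge. The field names, the `Account` label and the `SENT` relationship are all hypothetical, not the project's actual schema. The function only builds a parameterised Cypher statement, so it runs without a database.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields; the real project schema is not shown in this post.
    tx_id: str
    sender_id: str
    receiver_id: str
    amount: float


def to_cypher(tx: Transaction) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterised Cypher statement for one transaction.

    MERGE creates each account node only if it does not exist yet;
    CREATE then adds the transaction as an edge between the two nodes.
    """
    query = (
        "MERGE (s:Account {id: $sender_id}) "
        "MERGE (r:Account {id: $receiver_id}) "
        "CREATE (s)-[:SENT {tx_id: $tx_id, amount: $amount}]->(r)"
    )
    params = {
        "sender_id": tx.sender_id,
        "receiver_id": tx.receiver_id,
        "tx_id": tx.tx_id,
        "amount": tx.amount,
    }
    return query, params


if __name__ == "__main__":
    query, params = to_cypher(Transaction("t1", "acc-1", "acc-2", 12.5))
    print(query)
    print(params)
```

With the official Neo4j Python driver, the pair could then be passed to something like `session.run(query, params)`. Keeping the values as parameters instead of formatting them into the string avoids quoting problems.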

I also removed .env from .gitignore so that anyone reproducing the project can clearly see the environment variables used. At the moment there are no actual secret variables, so it is fine.

A task I added to my TODOs is to figure out how to set up the Neo4j DB connection on Grafana startup. I saw I could use a grafana.ini file, but I will figure this out tomorrow.

On another note, today the evaluation of my MLOps zoomcamp project came back.

I got 32, which I believe is the max. The points are given for: describing the problem, usage of cloud services, MLflow and experiment tracking, orchestration, model deployment, monitoring, reproducibility, and best practices.

The certificate has not been issued yet, but I will share it when it is sent.

Today I saw that Alexey Grigorev, the creator of DataTalks.Club, announced the next cohort of their ML Zoomcamp. It will cover the below:

image image

I might sign up to see if I can learn something new, and keep my skills sharp ^^

That is all for today!

See you tomorrow :)