Hello :) Today is Day 183!
That actually puts us past the half-way point of this blog journey. 182 days to go!
A quick summary of today:
- setting up MLflow for experiment tracking on GCP
Today I started looking at tools I can use for my MLOps Zoomcamp project. I used Mage for the data engineering project, so this time I want to try a different orchestrator.
My first idea (which has been on my mind for a while) was Kubeflow. TL;DR: it is so hard to set up locally, and I could not find a guide that clearly explains how :(
I had had this 1-hour video in my ‘to watch’ list for a while; it goes over building an ML pipeline in Kubeflow, but it does not cover how to install Kubeflow from zero up to what is needed in the video. Then I started looking at Kubeflow’s documentation. I tried to set it up on GCP, but I could not, and I was actually hesitant to use it there because the docs explicitly say that the free GCP credits do not cover Kubeflow hosting costs. So ~ I started looking into local deployment. I found three different posts (one of them from IBM), all from 2023, each setting Kubeflow up differently from nothing, but … none of them work :( I was losing hope, since I was investing so much time and still could not even install and set up Kubeflow, let alone use it…
At some point I got to where I had a Kubeflow cluster running and could go to localhost:8080, but I could not run a pipeline due to some error.
I decided to step back a bit and set up something more basic: MLflow. I had already set it up locally before, but for this project I wanted to use GCP.
Here is what I did:
- created a Compute Engine VM
- created a Cloud Storage bucket for the MLflow artifacts (see the small sketch below)
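(In case it is useful: the bucket can also be created from Python with the google-cloud-storage client instead of clicking through the console. A tiny sketch, with a made-up bucket name and region:)

```python
from google.cloud import storage

# Assumes GCP credentials are available (e.g. via GOOGLE_APPLICATION_CREDENTIALS).
# The bucket name and region are hypothetical, just for illustration.
client = storage.Client()
bucket = client.create_bucket("my-mlflow-artifacts-bucket", location="europe-west1")
print(f"Created bucket {bucket.name}")
```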
Using SSH, I went into the VM, installed MLflow, and set up a SQLite database as the backend store (there is a rough sketch of the setup after this list). I decided to use SQLite because:
- I will not have that many experiments to keep
- keeping a managed SQL database (e.g. Cloud SQL) active on GCP seems expensive
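For reference, here is roughly how the pieces fit together. This is a sketch, not my exact commands: the bucket name, port, and VM address are placeholders.

```python
import mlflow

# On the VM, the tracking server is started (in the VM's shell) with SQLite
# as the backend store and the GCS bucket as the artifact root, roughly:
#
#   mlflow server \
#       --backend-store-uri sqlite:///mlflow.db \
#       --default-artifact-root gs://my-mlflow-artifacts-bucket \
#       --host 0.0.0.0 --port 5000
#
# Back on my laptop, the client only needs to point at the VM's external IP
# ("<vm-external-ip>" is a placeholder):
mlflow.set_tracking_uri("http://<vm-external-ip>:5000")
print(mlflow.get_tracking_uri())
```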
Then I tried running a simple experiment locally, but I was getting a weird error from MLflow: “Anonymous credentials cannot be refreshed”.
It seemed related to my GCP credentials, so I thought I needed to add my creds.json file to the VM so that it could access them. I did that, but that was not the issue.
It turns out that in the Jupyter notebook (i.e. on the client side, not the VM) I needed to explicitly point to the creds.json file:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'my-creds.json'
With that, my issue was fixed, and MLflow hosted on GCP worked.
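Putting it all together, the working run looks roughly like this. The experiment name, parameter, and metric are made up for illustration, and `<vm-external-ip>` is a placeholder:

```python
import os
import mlflow

# The fix: point the Google client libraries at the service-account key
# explicitly; without this, artifact uploads to GCS fail with
# "Anonymous credentials cannot be refreshed".
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my-creds.json"

mlflow.set_tracking_uri("http://<vm-external-ip>:5000")  # placeholder address
mlflow.set_experiment("gcp-smoke-test")                  # hypothetical name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.1)   # made-up parameter
    mlflow.log_metric("rmse", 6.3)   # made-up metric
```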
Though I failed to get anywhere with Kubeflow today, I will not give up. I used Mage for orchestration in the data engineering project, so now I will try another tool like Prefect (if not Kubeflow). But still, nothing is certain, so I will not be surprised if I write in tomorrow’s post that I am using Mage for this project too. Mage is pretty nice ^^
I uploaded the parts of my project to this GitHub repo.
That is all for today!
See you tomorrow :)