(Day 183) Failing to install Kubeflow, and setting up mlflow on GCP

Ivan Ivanov · July 2, 2024

Hello :) Today is Day 183!

Today is actually past the half-way point in this blog journey. 182 days to go!

A quick summary of today:

  • setting up mlflow for experiment tracking in GCP

Today I started looking at tools I can use for my MLOps zoomcamp project. I used Mage for the data engineering project so now for an orchestrator I want to try another tool.

My first idea (which I had on my mind for a while) is Kubeflow. Tldr - it is so hard to set up locally and I could not find a guide that clearly explains how :(

I had this 1hr video in my ‘to watch’ list for a bit. It goes over building an ML pipeline in kubeflow, but it does not cover how to install kubeflow from 0 to what is needed in the video. Then I started looking in kubeflow’s documentation. I tried to set it up on GCP but could not, and I was actually a bit reluctant to use it there because the documentation explicitly said that the free credits on GCP do not cover kubeflow hosting costs. So I started looking into local deployment. There were 3 different posts from 2023 (1 of them from IBM), and each set up kubeflow from scratch in a different way, but … none of them worked :( I was losing hope: I was investing so much time, and I could not even install and set up kubeflow, let alone use it…

At some point I got as far as having a kubeflow cluster running and could open localhost:8080, but I could not run a pipeline due to some error.

I decided to step back a bit, and set up something more basic - mlflow. I did it locally, but for this project I wanted to use GCP.

I created a Compute Engine VM

Created a storage bucket for mlflow artifacts

Using SSH I went into the VM and installed mlflow and set up a sqlite db. I decided to use sqlite because:

  1. I will not have that many experiments to keep
  2. creating an SQL db on GCP seems expensive to keep active
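For reference, a setup like this (sqlite for run metadata, a storage bucket for artifacts) usually comes down to one `mlflow server` command on the VM. Below is a minimal sketch that just builds and prints that command. The bucket name, db file name and port are placeholders I made up for illustration, not the values from this project.

```python
import shlex


def mlflow_server_command(bucket, db_file="mlflow.db", port=5000):
    """Build an `mlflow server` command: sqlite backend, GCS artifact store."""
    return [
        "mlflow", "server",
        # Run metadata (params, metrics, tags) goes into a local sqlite file.
        "--backend-store-uri", f"sqlite:///{db_file}",
        # Artifacts (models, plots, files) go into the storage bucket.
        "--default-artifact-root", f"gs://{bucket}",
        # Listen on all interfaces so the server is reachable from outside the VM.
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


# Hypothetical bucket name; print the command so it can be pasted into the SSH session.
print(shlex.join(mlflow_server_command("my-mlflow-artifacts")))
```

The VM's firewall also has to allow traffic on the chosen port, otherwise the tracking UI will not be reachable from a laptop.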

Then I tried running a simple experiment locally, but I was getting a weird error from mlflow: Anonymous credentials cannot be refreshed

It seemed related to my GCP credentials, so I thought I needed to copy my creds.json file to the VM so that the server could read it. I did that, but it was not the issue.

Turns out, in the Jupyter notebook I needed to explicitly point to the creds.json file with os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'my-creds.json'. With that, my issue was fixed and mlflow hosted on GCP worked.
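Here is a rough sketch of what the notebook side can look like with that fix in place. A likely reason the variable is needed on the client is that, with a gs:// artifact location, the mlflow client uploads artifacts to the bucket itself, so the notebook needs Google credentials and not only the VM. The function names, experiment name, tracking URI and logged values below are all made up for illustration.

```python
import os


def configure_client(creds_path, tracking_uri):
    """Set the environment variables the notebook needs before logging runs."""
    # Google's client libraries look up the service-account key through this variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = creds_path
    # mlflow reads the tracking server address from this variable.
    os.environ["MLFLOW_TRACKING_URI"] = tracking_uri


def log_demo_run():
    """Log one throwaway run; needs mlflow installed and the server reachable."""
    import mlflow  # imported here so the configuration above is set first

    mlflow.set_experiment("demo")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)    # placeholder value
        mlflow.log_metric("rmse", 1.23)   # placeholder value
        # Logging an artifact is the step that actually talks to the bucket.
        mlflow.log_text("hello from the notebook", "note.txt")
```

In a notebook this would be called as configure_client('my-creds.json', 'http://<VM_EXTERNAL_IP>:5000') followed by log_demo_run(), where the address is a placeholder for wherever the tracking server is listening.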

Though I failed to do anything with kubeflow today, I will not give up. I used Mage for orchestration in the data eng project, so now I will try to use another tool like Prefect (if not Kubeflow). But still, nothing is certain, so I will not be surprised if I write in tomorrow’s post that I am using Mage for this project too. Mage is pretty nice ^^

I uploaded the parts of my project to this GitHub repo.

That is all for today!

See you tomorrow :)