(Day 165) Starting to use mlflow for my research's model tracking + homework 4 of the MLOps zoomcamp

Ivan Ivanov · June 14, 2024

Hello :) Today is Day 165!

A quick summary of today:

  • started using mlflow to track experiments for my research in the lab
  • completed MLOps zoomcamp Module 4 and its homework

Firstly, about experiment tracking with mlflow

I have a feeling that from now on I will be running lots of models with different parameters, and hey, I have been learning and using this cool library called mlflow in my studies - this is a great opportunity to use it in a real scenario.

So I set it up and just started running models. Training is a bit slow because my GPU is not that strong, but I found ways to get free GPU hours. I also want to find a way to host a simple sqlite db to use as the experiment tracking backend/artifact store.
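For reference, the tracking setup is only a few lines. Below is a minimal sketch of what a run looks like; the experiment name, hyperparameters, and metric are made up for illustration:

```python
import mlflow

# point mlflow at a local sqlite backend store
# (run metadata goes into the db; artifacts land in ./mlruns by default)
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("lab-research")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("lr", 1e-3)         # made-up hyperparameters
    mlflow.log_param("batch_size", 32)
    # ... train the model here ...
    mlflow.log_metric("val_loss", 0.42)  # made-up metric
    mlflow.log_artifact("model.pt")      # attach a saved checkpoint to the run
```

Running mlflow ui --backend-store-uri sqlite:///mlflow.db then serves the dashboard on top of the same db.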

My initial thoughts on sharing the code I will use for my research - I will definitely do it. I do not like that many papers do not share theirs for one reason or another, but for now I will keep mine close to my chest while I am still replicating base models.

Secondly, about finishing module 4 and its homework

I think the main new thing I learned was batch vs streaming model deployment. The former processes data in large, pre-defined chunks at specific intervals: the model is applied to a batch of data all at once, and the results are produced collectively after the entire batch is processed. The latter processes data in real time as it arrives: the model is applied to each data point (or small batch) as soon as it is available, and the results are produced immediately. There was a nice 1-hour video on streaming deployment using AWS, but after yesterday’s fiasco with the incurred bill I just watched it for now instead of following along.
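To make the batch side concrete, a batch scoring job is essentially the shape below - a minimal sketch with made-up file paths, feature names, and a scikit-learn-style pickled model:

```python
import pickle

import pandas as pd

# load a pre-trained model (hypothetical pickle file)
with open("model.bin", "rb") as f_in:
    model = pickle.load(f_in)

# read one interval's worth of data, score it all at once,
# write the predictions back out, and exit - no server involved
df = pd.read_parquet("data/2024-03.parquet")  # made-up path
df["prediction"] = model.predict(df[["feature_a", "feature_b"]])  # made-up features
df.to_parquet("output/2024-03.parquet")
```

A streaming version would instead sit behind a queue or event stream (e.g. Kinesis in the AWS video) and run the predict step once per incoming event.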

The homework was about converting a jupyter notebook to a script and making it executable with argparse parameters: something like python starter.py --year 2024 --month 3, which runs the taxi duration prediction model on March 2024 NYC taxi data.
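The argparse wiring itself is short. Here is a sketch of how that part can look, with the actual download-and-predict logic stubbed out as a hypothetical run() helper:

```python
import argparse


def run(year: int, month: int) -> None:
    # download the NYC taxi data for the given month, apply the
    # duration model, and save the predictions (details omitted)
    ...


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Predict NYC taxi trip durations")
    parser.add_argument("--year", type=int, required=True)
    parser.add_argument("--month", type=int, required=True)
    args = parser.parse_args()
    run(args.year, args.month)
```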

Then I set up an environment with pipenv and wrote a Dockerfile to build a docker image.
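A Dockerfile for this kind of setup looks roughly like the sketch below; the base image and file names are illustrative, not copied from the actual homework:

```dockerfile
FROM python:3.10-slim

RUN pip install pipenv

WORKDIR /app

# install the locked dependencies into the image's system python
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --system --deploy

COPY starter.py .

# the script becomes the container's entrypoint,
# so docker run arguments go straight to it
ENTRYPOINT ["python", "starter.py"]
```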


That then allowed me to run the script through docker run.
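For completeness, the two commands look roughly like this (the image tag is made up):

```bash
docker build -t duration-prediction .
docker run --rm duration-prediction --year 2024 --month 3
```

Since the entrypoint is the script itself, everything after the image name is passed straight to starter.py.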

Tomorrow I am going up to Seoul for a ‘pseudocon’. The poster is in Korean, but the event tl;dr is … AI. I will share about the sessions/tutorials tomorrow.

That is all for today!

See you tomorrow :)