(Day 153) First steps into orchestration and ML pipelines (module 3 from MLOps zoomcamp)

Ivan Ivanov · June 2, 2024

applying-knowledge mlops

Hello :) Today is Day 153!

A quick summary of today:

started Module 3: Orchestration and ML pipelines from the MLOps zoomcamp

It is my 26th birthday today, so my study time was pretty limited; I studied during my 2-hour bus ride home from Seoul.

Below is the full outline of the module.

I only managed to complete 3.1 Data preparation, and below are some pics/notes I took. Before this, the MLOps Zoomcamp covered MLflow, experiment tracking and model management. I guess orchestration is the next step: it automates the whole data preparation, training and testing process (note: I am not sure yet what else it covers).

The 2024 cohort of the Zoomcamp uses Mage (mage.ai), a free-to-use platform, so today's work and the next steps are done on it.

First, I set it up using Docker:

git clone https://github.com/mage-ai/mlops.git

cd mlops

./scripts/start.sh

This started a local Mage web app.

In part 3.1 we created a data preparation pipeline, and here is the final version:

Mage is not hard to use and fairly easy to set up. Each block in a pipeline is a piece of code. For example, this is the ingest block:

It downloads the taxi data. When we create the next block after it, Mage seems to automatically pass the output of the previous block as the input of the current block. Below is the second 'prepare' block:

The final 'build' block is where the dataset is split and the data is vectorized (using util functions).
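To make the chaining idea concrete, here is a minimal plain-Python sketch. It assumes nothing about Mage's internals: `run_pipeline`, `ingest`, `prepare`, `build`, the sample rows and the "drop missing values" cleaning rule are all hypothetical, and the hand-rolled vectorizer only mimics the idea behind scikit-learn's `DictVectorizer` (string features one-hot encoded, numeric features passed through). It is not the course's actual util code.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def run_pipeline(blocks: List[Callable[..., Any]]) -> Any:
    """Hypothetical stand-in for block chaining (not Mage's code):
    each block's return value is passed as the input of the next block."""
    output = None
    for i, block in enumerate(blocks):
        output = block() if i == 0 else block(output)
    return output


def ingest() -> List[Row]:
    # Hypothetical in-memory rows standing in for the downloaded taxi data.
    # Location IDs are strings so they get one-hot encoded as categories.
    return [
        {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.0},
        {"PULocationID": "166", "DOLocationID": "239", "trip_distance": 2.5},
        {"PULocationID": "43", "DOLocationID": "239", "trip_distance": None},
        {"PULocationID": "74", "DOLocationID": "75", "trip_distance": 0.8},
    ]


def prepare(rows: List[Row]) -> List[Row]:
    # Hypothetical cleaning step: drop rows that contain a missing value.
    return [r for r in rows if all(v is not None for v in r.values())]


def feature_name(key: str, value: Any) -> str:
    # Strings become "key=value" one-hot columns; numbers keep a single column.
    return f"{key}={value}" if isinstance(value, str) else key


def fit_vocabulary(rows: List[Row]) -> Dict[str, int]:
    # Assign a column index to every feature seen in the given rows.
    names = {feature_name(k, v) for r in rows for k, v in r.items()}
    return {name: i for i, name in enumerate(sorted(names))}


def transform(rows: List[Row], vocab: Dict[str, int]) -> List[List[float]]:
    matrix = []
    for r in rows:
        vec = [0.0] * len(vocab)
        for k, v in r.items():
            name = feature_name(k, v)
            if name in vocab:  # categories not seen during fitting are ignored
                vec[vocab[name]] = 1.0 if isinstance(v, str) else float(v)
        matrix.append(vec)
    return matrix


def build(rows: List[Row], train_fraction: float = 0.75) -> Tuple[Any, Any, Dict[str, int]]:
    # Split first, then fit the vocabulary on the training part only.
    cut = int(len(rows) * train_fraction)
    train, val = rows[:cut], rows[cut:]
    vocab = fit_vocabulary(train)
    return transform(train, vocab), transform(val, vocab), vocab


X_train, X_val, vocab = run_pipeline([ingest, prepare, build])
```

The design choice worth noting is that the vocabulary is fitted on the training split only, so a validation category never seen in training (here `PULocationID=74`) simply maps to zeros instead of leaking information from the validation data.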

I will try to complete the rest of the module in the coming week ^^

That is all for today!

See you tomorrow :)