(Day 146)] MLOps zoomcamp module 2 homework + some more prep for Microsoft x NVIDIA's hackaton

Ivan Ivanov · May 26, 2024

Hello :) Today is Day 146!

A quick summary of today:

  • did 2nd homework from MLOps zoomcamp on experiment tracking
  • covered some material on devops on Azure

MLOps zoomcamp Module 2 homework on experiment tracking

The homework link is here. The homework followed the below outline:

  1. Install mlflow
  2. Preprocess NYC taxi data
  3. Train a model with autolog
  4. Launch a tracking server locally
  5. Tune model hyperparams
  6. Promote the best model

My code for it is on my github.

The used model was a random forest regressor using rmse as the main evaluation metric. The homework was more focused on running mlflow commands rather than the preprocessing/training (which were done in Module 1 of the course).

Below is a graphical comparison of some of the models I ran

image

I had to set up logging and mlflow db for a sample run

image

Run hyperparam optimization

image

And finally choosing and registering the best model

image

For a bit I faced an issue with the params passed to the model because they were missing (initially not logged in the hyperparam opt code), and then I kept registering the worst performing model because i had [0] at the end of that best run, but when I put [-1] it was all good. Below are the top 5 and then best among the 5 models is registerered.

image

As for the next module - module 3, I saw we will use Mage to learn Orchestration and ML Pipelines and saw that the youtube videos were uploaded today.

2nd prep course for Microsoft x NVIDIA hackaton in Microsoft’s office in Seoul

It was a bit long so I covered just the bits I thought were new, and below are some notes I took along the way.

Azure Boards: tools for agile planning, work item tracking, visualization, and reporting

Azure Pipelines: a CI/CD platform that is language, platform, and cloud-agnostic, supporting containers and Kubernetes

Azure Repos: cloud-hosted private Git repositories with two types of version control: distributed (Git) and centralized (Team Foundation Version Control, TFVC). Features include

  • Free private Git repositories
  • Support for any Git client
  • Web hooks and API integration
  • Semantic code search
  • Git code reviews with threaded discussion
  • Continuous integration for each change
  • Branch policies to ensure code quality
  • Integration with favorite tools and editors

Azure Artifacts: Integrated package management supporting Maven, npm, Python, and NuGet packages from public or private sources

Azure Test Plans: An integrated solution for planned and exploratory testing

Workflows: Define automation processes, detailing trigger events and jobs to run. Workflows are written in YAML and located within the GitHub repository at .github/workflows. Example:

image

GitHub runners: compute resources that execute GitHub Actions workflows, performing build, test, and deployment tasks directly within GitHub repositories

Continuous integration with actions example

image

  • On: Specifies what will occur when code is pushed
  • Jobs: There’s a single job called build
  • Strategy: It’s being used to specify the Node.js version
  • Steps: Are doing a checkout of the code and setting up dotnet
  • Run: Is building the code

Next, Azure Pipelines is a robust service for creating cross-platform CI/CD workflows, integrating with your preferred Git provider and deploying to major cloud services. Key features

  • End-to-end flow of value: Focus on delivering continuous value from concept to customer, avoiding siloed processes
  • Automation and reliability: Establish a repeatable, reliable process for software delivery
  • Quality verification: Each pipeline stage verifies feature quality to prevent user-facing errors
  • Team feedback and visibility: Ensure all team members have feedback and visibility into the delivery process
  • Frequent small changes: Promote frequent, smaller changes for better flow and optimization
  • Continuous improvement: Continuously monitor and resolve obstacles to improve pipeline efficiency

Pipeline stages:

  • Build automation and Continuous Integration: Automatic building and integrating of changes
  • Test automation: Automated testing for functionality and quality
  • Deployment automation: Automated deployment to various environments

Orchestration:

  • Release and pipeline orchestration: Manage and control the entire pipeline, providing top-level insights
  • Value stream mapping: Identify and improve inefficiencies
  • Infrastructure efficiency: Efficient infrastructure is crucial for an effective pipeline

image

Key terms in Azure pipelines

image

  • Agent: software that runs a build or deployment job
  • Artifact: collection of files or packages published by a build, used for distribution or deployment
  • Build: one execution of a pipeline, collecting logs and test results
  • Continuous Delivery (CD): process of building, testing, and deploying code to various stages, ensuring quality through multiple stages and constant monitoring
  • Continuous Integration (CI): practice of automating the testing and building of code to catch issues early. CI systems produce artifacts used in CD pipelines for automatic deployments
  • Deployment target: destination for application deployment, such as a virtual machine, container, or web app
  • Job: a set of steps run on an agent within a build, representing an execution boundary
  • Pipeline: script defining the CI/CD process, composed of tasks for testing, building, and deploying
  • Release: execution of a release pipeline, including deployments to multiple stage
  • Stage: primary divisions in a pipeline, such as ‘build,’ ‘test’, and ‘deploy’
  • Task: building block of a pipeline, performing specific jobs like building, testing, or deploying
  • Trigger: configuration that specifies when the pipeline should run, such as on code push or schedule

That is all for today!

See you tomorrow :)