(Day 157) GNN design choices and starting an MLOps book on manning.com

Ivan Ivanov · June 6, 2024

reading-research mlops

Hello :) Today is Day 157!

A quick summary of today:

read about design choices for GNNs
started reading a book about MLOps on manning.com
registered for Korea summer workshop on causal inference

Design choices for Graph Neural Networks [arxiv]

I came across this paper in the last short lecture of XCS224W: ML with Graphs. It looks into how different design choices affect a GNN model’s performance. It caught my eye because some of the findings about what is useful could be tested or applied in my work and research at the lab.

Below are some of the interesting findings:

For example, having batch norm is always better than not having it; no dropout seems to perform better too; PReLU activation seems the best; and using sum as the aggregation function outperforms mean and max.
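To make these choices concrete, here is a minimal pure-Python sketch of one message-passing step that combines them: sum aggregation, batch norm, PReLU, and no dropout. It is not the paper's code. The function names, the toy graph, and the fixed `alpha=0.25` are my own assumptions, and the learnable weight matrix that a real layer would have is left out to keep it short.

```python
import math

def aggregate(neighbor_feats, how="sum"):
    """Combine a list of feature vectors element-wise."""
    columns = list(zip(*neighbor_feats))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")

def prelu(x, alpha=0.25):
    """PReLU: identity for positive inputs, slope alpha for the rest.
    alpha is learned in a real model; here it is a fixed assumption."""
    return [v if v > 0 else alpha * v for v in x]

def batch_norm(rows, eps=1e-5):
    """Normalise each feature column to zero mean, unit variance across nodes."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        out_cols.append([(v - mean) / math.sqrt(var + eps) for v in col])
    return [list(r) for r in zip(*out_cols)]

def gnn_layer(features, adjacency, how="sum"):
    """One step: aggregate (self + neighbors) -> batch norm -> PReLU, no dropout."""
    aggregated = [
        aggregate([features[j] for j in adjacency[i]] + [features[i]], how)
        for i in range(len(features))
    ]
    return [prelu(row) for row in batch_norm(aggregated)]

# Toy graph (made up): node 0 is linked to 1 and 2; nodes 1 and 2 only to node 0.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
adjacency = {0: [1, 2], 1: [0], 2: [0]}
print(gnn_layer(features, adjacency))

# Sum keeps the neighbourhood size, mean does not:
print(aggregate([[1.0], [1.0]], "sum"), aggregate([[1.0]], "sum"))
print(aggregate([[1.0], [1.0]], "mean"), aggregate([[1.0]], "mean"))
```

The last two prints hint at one possible reason sum does well: two neighbourhoods of different sizes give different sums but the same mean.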

Design a Machine Learning System (From Scratch) [book]

The book will teach me how to:

Build an ML Platform
Build and Deploy ML Pipelines
Extend the ML Platform using various tools depending on use cases
Implement different kinds of ML services using the ML life cycle as a mental model
Deploy ML services that are reliable and scalable

A quick summary of chapter 1: Getting Started with MLOps and ML Engineering

An ML project begins with the experimental phase. This phase involves continuous iterations and adjustments across various steps, such as model training and evaluation, to refine models based on complex data and performance metrics. Building an orchestrated pipeline automates these steps, reducing errors and streamlining the process before full automation in later stages.
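As a rough illustration of what "orchestrated pipeline" means here, the sketch below chains a few steps so that they always run in the same order. This is not Kubeflow and not from the book. The step names and the "model" (just the mean of the data) are hypothetical placeholders.

```python
def clean(rows):
    """Drop missing values."""
    return [r for r in rows if r is not None]

def train(rows):
    """Hypothetical 'model': predict the mean of the training data."""
    return {"rows": rows, "prediction": sum(rows) / len(rows)}

def evaluate(state):
    """Mean absolute error of the constant prediction on the same rows."""
    errors = [abs(r - state["prediction"]) for r in state["rows"]]
    return {**state, "mae": sum(errors) / len(errors)}

def run_pipeline(steps, data):
    """Run each (name, function) step in order, feeding outputs forward."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log

steps = [("clean", clean), ("train", train), ("evaluate", evaluate)]
result, log = run_pipeline(steps, [1.0, None, 3.0])
print(log, result["prediction"], result["mae"])
```

A real orchestrator adds scheduling, retries, and caching on top, but the core idea of named steps executed in a fixed order is the same.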

Next, the Dev/Staging/Production phase marks the shift from model experimentation to deployment in real-world settings, emphasizing scalability, robustness, and real-time performance. This phase involves full automation of the pipeline, with continuous integration and performance monitoring to ensure smooth deployment and maintenance of ML models.

The skills needed for MLOps include software engineering, data science, DevOps, data management, model training, deployment, monitoring, debugging, and automation to ensure reproducibility and compliance. An ML platform enables practitioners to develop and deploy ML services by integrating various tools essential for the ML life cycle. Typically, it comprises a collection of loosely related software that evolves as team maturity and use case complexity increase. This book guides you through building an ML platform using Kubeflow, covering components like pipeline orchestration, feature stores, and model registries.

A quick summary of chapter 2: What is MLOps?

First step - Data Collection

Data relevance to problem domain
Size of the dataset with respect to problem complexity
Quality of the dataset. Prevention of harmful biases and unintentional leakage of samples
Distribution of data and features is representative of the deployment environment
Sufficient diversity in the data collection process that defines the problem domain well
Lineage and detailed tracking of raw data, intermediate versions, and annotated datasets

Second step - EDA (Exploratory Data Analysis)

What does my data look like? What is its schema? Can the schema change? How would I guard against invalid values? Does the data require cleaning?
How is my data distributed? Are all targets equally represented, or should there be additional class balancing?
Does my data have robust features for the task I have in mind? Are the features expensive to compute?
Does the input data vary cyclically? Does it exhibit correlations to external factors that are not modeled?
Are there any outliers in the dataset? What must be done to outlier values in production?
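A few of the EDA questions above can be turned into small checks. The sketch below is only an illustration with made-up data: `class_balance`, `iqr_outliers`, and `invalid_rows` are hypothetical helper names, and the 1.5 × IQR rule is one common outlier heuristic, not something the book prescribes.

```python
from collections import Counter
from statistics import quantiles

def class_balance(labels):
    """Fraction of samples per target class."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def invalid_rows(rows, schema):
    """Return (row index, field) pairs whose value is missing or has the wrong type."""
    bad = []
    for i, row in enumerate(rows):
        for field, expected_type in schema.items():
            if not isinstance(row.get(field), expected_type):
                bad.append((i, field))
    return bad

# Made-up data for illustration only.
rows = [
    {"age": 31, "label": "cat"},
    {"age": "n/a", "label": "dog"},   # wrong type
    {"label": "cat"},                 # missing field
]
print(class_balance(["cat", "cat", "cat", "dog"]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
print(invalid_rows(rows, {"age": int, "label": str}))
```

The same schema check could be reused in production to guard against invalid values before they reach the model.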

Third step - Modelling and Training

Model and data versioning
Experiment tracking
Model training pipelines
Hyperparameter search
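Here is a minimal sketch of how hyperparameter search and experiment tracking fit together: every run records its parameters, the data version, and its score, so results stay comparable later. `train_and_score` is a hypothetical stand-in with a made-up formula so the example runs without data, and `data_version` is just an illustrative label.

```python
from itertools import product

def train_and_score(lr, depth):
    """Hypothetical stand-in for a real training run.
    The formula is invented; it peaks at lr=0.1, depth=3."""
    return 1.0 - abs(lr - 0.1) - 0.05 * abs(depth - 3)

def grid_search(grid, data_version="v1"):
    """Try every parameter combination and keep a record of each run."""
    names = sorted(grid)
    runs = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs.append({
            "params": params,
            "data_version": data_version,
            "score": train_and_score(**params),
        })
    best = max(runs, key=lambda run: run["score"])
    return best, runs

grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 4]}
best, runs = grid_search(grid)
print(best["params"], best["score"], len(runs))
```

Dedicated tools store these run records in a database with artifacts attached, but a list of dictionaries is enough to show the idea.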

Fourth step - Model Evaluation

Choosing the appropriate metrics
Examine and analyse misclassified samples and error patterns
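Both points can be shown with a small made-up example: on imbalanced data, accuracy looks fine while recall reveals the problem, and listing the misclassified samples is the starting point for error analysis. The helper names and the sample IDs below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def misclassified(samples, y_true, y_pred):
    """(sample, true label, predicted label) for every wrong prediction."""
    return [(s, t, p) for s, t, p in zip(samples, y_true, y_pred) if t != p]

# Made-up imbalanced data: 8 negatives, 2 positives.
samples = [f"sample_{i}" for i in range(10)]
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))          # high, despite missing a positive
print(precision_recall(y_true, y_pred))  # recall exposes the miss
print(misclassified(samples, y_true, y_pred))
```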

Fifth step - Deployment

To a specific environment, or API endpoint
Staging vs production

Sixth step - Monitoring

Track anomalies in data and model performance
Data/performance/error monitoring
Need a reliable notification/alert system
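One simple way to track anomalies in incoming data is to compare a live window of a feature against a reference window from training time. The sketch below uses a mean-shift score with a threshold of 3.0. Both the statistic and the threshold are my own assumptions for illustration, and the numbers are made up.

```python
import math
from statistics import mean, stdev

def drift_score(reference, live):
    """Shift of the live mean from the reference mean, measured in
    standard errors (reference std / sqrt(live sample size))."""
    standard_error = stdev(reference) / math.sqrt(len(live))
    shift = abs(mean(live) - mean(reference))
    if standard_error == 0:
        return 0.0 if shift == 0 else math.inf
    return shift / standard_error

def check_drift(reference, live, threshold=3.0):
    """Return the score and whether it should trigger an alert."""
    score = drift_score(reference, live)
    return {"score": score, "alert": score > threshold}

# Made-up feature values: training-time reference vs. two live windows.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
live_ok = [10, 9, 11, 10, 10]
live_shifted = [14, 15, 13, 14, 16]

print(check_drift(reference, live_ok))
print(check_drift(reference, live_shifted))
```

In a real system the `alert` flag would feed the notification channel, and the same check would run per feature on a schedule.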

Seventh step - Maintenance, Updates, and Review

Implement bug fixes to address issues and shortcomings in the model in production
Collect data to train the model on specific edge cases and improve performance
Mitigate data drift by informing the data collection component and retraining a new model

Importance of Robust MLOps

Robust MLOps is crucial because it ensures the seamless integration and collaboration of various specialized disciplines, leading to consistent and reliable model performance that drives business value. It addresses complex challenges such as data management, scalability, and compliance, reducing technical debt and making ML projects easier to maintain and scale. A robust framework supports continuous improvement and mitigates risks associated with data privacy, biases, and system failures.

Korea Summer Workshop on Causal Inference 2024

Here is the post (in Korean):

It spans 4 days (June 13, 20, 27, and 28), and some of the covered topics are:

Causal inference in policy making and various practical applications
Best practices for online controlled experiments and experiments platform
Data-driven decision-making
Data science & causal ML/AI in the generative AI era

On the day of each event, I will post more detailed info about that day’s sessions, since there are 16 in total for now.

That is all for today!

See you tomorrow :)