Hello :) Today is Day 252!
A quick summary of today:
- covered the first 3 sections of Module 4
Tomorrow I have an exam, so my study time for this blog was a bit limited today. Nevertheless, I managed to cover half of Module 4 of EvidentlyAI’s course on AI monitoring.
4.1. Logging for ML monitoring
What makes a good ML monitoring system? It consists of three key components:
- Instrumentation to ensure collection and computation of useful metrics for analyzing model behavior and resolving issues.
- Alerting to define unexpected model behavior through metrics and thresholds, and to design an action policy.
- Debugging to provide engineers with context to understand model issues for faster resolution.
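To make the alerting part more concrete, here is a minimal sketch in Python. The metric names, thresholds, and the notify callback are my own illustrative assumptions, not something from the course:

```python
# Minimal sketch of threshold-based alerting (metric names and thresholds are illustrative).
from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    metric: str
    threshold: float
    higher_is_worse: bool = True  # e.g. error rate (higher is worse) vs. accuracy

def check_alerts(metrics: dict[str, float], rules: list[AlertRule],
                 notify: Callable[[str], None] = print) -> None:
    """Compare current metric values against thresholds and trigger the action policy."""
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue
        breached = value > rule.threshold if rule.higher_is_worse else value < rule.threshold
        if breached:
            notify(f"ALERT: {rule.metric}={value:.3f} breached threshold {rule.threshold}")

# Example usage with made-up numbers:
check_alerts(
    {"error_rate": 0.12, "share_of_missing_values": 0.02},
    [AlertRule("error_rate", 0.10), AlertRule("share_of_missing_values", 0.05)],
)
```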
Logging and instrumentation
Step 1
Step 2
Step 3
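As a rough illustration of the instrumentation side, here is a minimal sketch of structured prediction logging; the JSON-lines format and the field names are my own assumptions:

```python
# Sketch of structured prediction logging (field names are illustrative assumptions).
import json
import time
import uuid

def log_prediction(features: dict, prediction, model_version: str,
                   path: str = "predictions.jsonl") -> None:
    """Append one prediction event as a JSON line so metrics can be computed from the log later."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_prediction({"age": 34, "plan": "premium"}, prediction=0.87, model_version="v1.2.0")
```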
4.2. How to prioritize ML monitoring metrics
TL;DR for metric prioritization:
- Service health
- Model performance
- Data quality and data integrity
- Data and concept drift
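A minimal sketch of how the top-priority metrics could be computed on a batch of data, assuming a pandas DataFrame with a feature_x column, optional label/prediction columns, and recorded request latencies (all names are illustrative):

```python
# Sketch computing one metric per priority level (column names are illustrative).
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def monitoring_snapshot(current: pd.DataFrame, reference: pd.DataFrame,
                        latencies_ms: np.ndarray) -> dict:
    snapshot = {
        # Service health: latency of the prediction service.
        "latency_p95_ms": float(np.percentile(latencies_ms, 95)),
        # Data quality and integrity: share of missing values in the current batch.
        "share_missing": float(current.isna().mean().mean()),
        # Data drift: two-sample KS test on one numeric feature vs. the reference data.
        "drift_pvalue_feature_x": float(
            ks_2samp(reference["feature_x"].dropna(), current["feature_x"].dropna()).pvalue
        ),
    }
    # Model performance: only possible once ground truth labels are available.
    if "label" in current and "prediction" in current:
        snapshot["accuracy"] = float((current["label"] == current["prediction"]).mean())
    return snapshot
```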
Comprehensive monitoring
Depending on the problem statement and model usage scenario, we can introduce more comprehensive monitoring metrics:
- Performance by segment. This is especially useful if we deal with a diverse audience or complex object structures and want to monitor them separately.
- Model bias and fairness. These metrics are crucial for sensitive domains like healthcare.
- Outliers. Monitoring for outliers is vital when individual errors are costly.
- Explainability. Explainability is important when users need to understand model decisions/outputs.
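For example, performance by segment can be as simple as grouping accuracy by a segment column. A minimal sketch, assuming a region column and already-collected labels and predictions:

```python
# Sketch of performance by segment (the "region" column and the sample data are assumptions).
import pandas as pd

def accuracy_by_segment(df: pd.DataFrame, segment_col: str = "region") -> pd.Series:
    """Accuracy per segment, so a weak segment is not hidden by the overall average."""
    correct = (df["label"] == df["prediction"]).astype(float)
    return correct.groupby(df[segment_col]).mean()

batch = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "label": [1, 0, 1, 1, 0],
    "prediction": [1, 0, 0, 1, 1],
})
print(accuracy_by_segment(batch))  # EU: 1.00, US: ~0.33
```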
4.3. When to retrain machine learning models
Model retraining strategies
- Scheduled retraining
Pro-tip for scheduled retraining: use historical data to determine the rate of model decay and the volume of new data required for effective retraining. For example, we can take a training set from our historical data and train a model on it. Then we can start experimenting: apply this model to new batches of data with a certain time step – daily, weekly, monthly – to measure how the model performs on the new data and define when its quality starts to degrade (see the sketch after this list). Important note: we need labels to do this. If feedback/ground truth is not available yet, it makes sense to send data for labeling before we start experimenting with historical data.
- Trigger-based retraining
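A rough sketch of the historical backtest described in the pro-tip above, assuming an already-trained model and a labeled DataFrame with a timestamp column (the monthly step and the column names are my choices):

```python
# Sketch: measure how quality degrades on successive time slices of labeled historical data.
import pandas as pd
from sklearn.metrics import accuracy_score

def decay_backtest(model, history: pd.DataFrame, feature_cols: list[str],
                   label_col: str = "label", time_col: str = "timestamp",
                   freq: str = "M") -> pd.Series:
    """Score an already-trained model on each time period to see when quality starts to drop."""
    history = history.sort_values(time_col)
    scores = {}
    for period, batch in history.groupby(history[time_col].dt.to_period(freq)):
        preds = model.predict(batch[feature_cols])
        scores[str(period)] = accuracy_score(batch[label_col], preds)
    return pd.Series(scores)  # quality per period: look for where it starts to degrade
```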
Model retraining tradeoffs
Thinking through the retraining decision
Be pragmatic. Develop a strategy that considers available actions, service properties, resources, model criticality, and the cost of errors. Here is an example of decision-making logic you could follow:
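A minimal sketch of what such decision logic could look like in code; the inputs, thresholds, and suggested actions are my own illustrative assumptions:

```python
# Sketch of a retraining decision policy (thresholds and actions are illustrative).
def decide_action(performance_drop: float, drift_detected: bool,
                  labels_available: bool, enough_new_data: bool) -> str:
    if performance_drop < 0.05 and not drift_detected:
        return "do nothing: model still performs within tolerance"
    if not labels_available:
        return "investigate drift / collect labels before deciding to retrain"
    if not enough_new_data:
        return "fall back or adjust (e.g. rules, thresholds) until enough data accumulates"
    return "retrain on fresh data and re-validate before deployment"

print(decide_action(performance_drop=0.08, drift_detected=True,
                    labels_available=True, enough_new_data=True))
```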
That is all for today!
See you tomorrow :)