(Day 253) Designing effective ML monitoring with EvidentlyAI Part 2

Ivan Ivanov · September 10, 2024

Hello :) Today is Day 253!

A quick summary of today:

  • finished module 4 of EvidentlyAI’s course

4.4. How to choose a reference dataset in ML monitoring

Why use a reference dataset?

There are two main uses for a reference dataset.

  1. You can use it to derive test conditions automatically, saving time and effort in setting up tests manually.

For example, you can use a reference dataset to generate conditions for data quality checks (to track feature ranges, share of missing values, etc.) and model quality checks (to keep tabs on metrics like precision and accuracy) by passing a previous batch of data as a reference.

  2. You can use a reference dataset as a baseline to detect data and prediction drift in production by comparing new data distributions against reference distributions.

A reference dataset can also be used to detect training-serving skew, since it provides a baseline for spotting changes between training data and production data.


What makes a good reference dataset?

Characteristics of a good reference dataset:

  • Reflects realistic data patterns, including cycles and seasonality.

  • Contains a large enough sample to derive meaningful statistics.

  • Includes realistic scenarios (e.g., sensor outages) to validate against new data.

What a reference dataset is not:

  • It is not the same as a training dataset. You can sometimes choose training data to be your reference, but they are not synonymous.

  • It is not a “golden dataset,” which serves a different purpose.

Using training data as a reference

Summing up

  • There is no “universal” reference dataset. It should be tailored to the specific use case and expectations of similarity to current data.

  • Hold-out validation data is preferred over training data for creating reference datasets, especially for drift detection. Use training data only if there is nothing else.

  • It is crucial to account for seasonality and historical patterns when choosing the reference dataset to ensure that it accurately represents the expected variations in data.

  • Historical data is a valuable resource for informing reference dataset selection.
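To make the seasonality point concrete, here is a small pandas sketch (column names and dates are illustrative, not from the course) that picks a seasonally matched reference window from historical data rather than just the most recent batch:

```python
import numpy as np
import pandas as pd

# Hypothetical daily history of a demand-like feature for one year.
dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(1)
history = pd.DataFrame({"date": dates, "demand": rng.gamma(2.0, 50.0, len(dates))})

# To monitor this December's data, use last December as the reference
# window, so the comparison shares the same seasonal pattern instead of
# flagging ordinary holiday-season variation as drift.
reference = history[history["date"].dt.month == 12]
```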

4.5. Custom metrics in ML monitoring

Types of custom metrics

While there is no strict division between “standard” and “custom” metrics, there is some consensus on evaluating, for example, classification model quality using metrics like precision and recall. They are fairly “standard.”

However, you often need to implement “custom” metrics to reflect specific aspects of model performance. They typically refer to business objectives or domain requirements and help capture the impact of an ML model within its operational context.

Here are some examples.

Business and product KPIs (or proxies). These metrics are aligned with key performance indicators that reflect the business goals and product performance.

Examples include:

  • Manufacturing optimization: raw materials saved.

  • Chatbots: number of successful chat completions.

  • Fraud detection: number of detected fraud cases over $50,000.

  • Recommender systems: share of recommendation blocks without clicks.
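A custom metric like the last example is often just a small function over your logs. Here is a minimal sketch (the `block_id`/`clicked` schema is a hypothetical example, not from the course) computing the share of recommendation blocks without clicks:

```python
import pandas as pd

def share_blocks_without_clicks(df: pd.DataFrame) -> float:
    """Fraction of recommendation blocks that received zero clicks."""
    clicks_per_block = df.groupby("block_id")["clicked"].sum()
    return float((clicks_per_block == 0).mean())

# Toy interaction log: block 2 got no clicks, so 1 of 3 blocks is click-less.
logs = pd.DataFrame({
    "block_id": [1, 1, 2, 2, 3, 3],
    "clicked":  [0, 1, 0, 0, 1, 0],
})
share = share_blocks_without_clicks(logs)
```

A function like this can then be tracked over time alongside the standard data and model quality checks.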


4.7. How to choose the ML monitoring deployment architecture



I had an exam today, but I am glad I found time to complete this module afterwards.

That is all for today!

See you tomorrow :)