(Day 347) Some ML metrics + Finishing Airflow 101

Ivan Ivanov · December 13, 2024

Hello :) Today is Day 347!

A quick summary of today:

  • checking out The Little Book of ML Metrics
  • streaming for ~4.5 hrs and finishing the Airflow 101 course

The Little Book of ML Metrics

After watching Vincent’s latest video on the probabl channel about Precision, Recall and the F1-score, I was reminded that this book of ML metrics is actively being written, and I decided to finally check some of it out. At the moment about 90 pages are available, but for maybe half or so of the metrics there are no descriptions/graphs yet. For those that are written, there are 1-2 pages per metric with nice graphics and simple explanations. Here are some of them. I reviewed some more on stream, as for each metric the material is very condensed and already very well put 😆

Regression metrics

MAE

MAE is one of the most popular regression accuracy metrics. It is calculated as the sum of absolute errors divided by the sample size. It is a scale-dependent accuracy measure, which means that it uses the same scale as the data being measured.

When to use MAE?

Pros:

  • it provides an easy-to-understand value since it represents the average error in the same units as the data
  • it treats under-predictions and over-predictions equally (though this might not be desirable in all contexts)

Cons:

  • can be biased when the distribution of errors is skewed as it does not account for the direction of the error
  • the absolute value function in its formula is not differentiable at 0 which can pose challenges in optimisation and gradient-based learning algs


Did you know that: A forecast method that minimises MAE will lead to forecasts of the median.
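
To make the formula concrete, here is a minimal numpy sketch with made-up numbers:

```python
import numpy as np

# Toy data (made-up values), just to show the mechanics.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE: sum of absolute errors divided by the sample size.
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # (0.5 + 0.0 + 1.5 + 1.0) / 4 = 0.75, in the same units as the data
```

(sklearn’s mean_absolute_error gives the same result.)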

MSE

Pros:

  • it is differentiable, making it useful in optimisation algs that require gradient information

Cons:

  • due to squaring of each error term, it is particularly sensitive to outliers; large errors contribute more significantly to the MSE than smaller ones
  • can be hard to interpret as it squares the differences (the result is in squared units of the data)


Did you know that: MSE can be written as the sum of the variance and the squared bias of an estimator - meaning if we are dealing with an unbiased estimator, MSE will be equivalent to the variance.
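
That identity is easy to check numerically. A minimal sketch, assuming a deliberately biased estimator (sample mean + 0.5) of a normal mean, with all values made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Deliberately biased estimator of a normal mean: sample mean + 0.5.
true_mean = 2.0
estimates = np.array([
    rng.normal(true_mean, 1.0, size=30).mean() + 0.5 for _ in range(10_000)
])

mse = np.mean((estimates - true_mean) ** 2)
variance = estimates.var()                      # spread of the estimator
bias_sq = (estimates.mean() - true_mean) ** 2   # squared bias
print(mse, variance + bias_sq)  # equal (an algebraic identity on the sample)
```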

RMSE

Pros:

  • easy to interpret
  • the squaring of errors ensures that larger errors have a greater impact, which is useful if large errors are particularly undesirable

Cons:

  • unlike some other metrics like R^2 and MAPE, RMSE does not provide a normalised measure, making it difficult to compare across different datasets without standardisation
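
Reusing the toy arrays from the MAE sketch above, RMSE is just the square root of MSE, which brings the error back to the units of the data:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Square root of the mean squared error.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)  # sqrt(0.875) ≈ 0.935, back in the original units
```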

MAPE

MAPE is one of the most widely used measures of forecast accuracy due to its scale independence and interpretability. It is derived from the Mean Absolute Error (MAE), adjusted to provide a percentage measure, making it easier to interpret and compare across different datasets and contexts.

MAPE is useful when the value of the error on its own is not as important as its relative magnitude compared to the actual target values. It cannot be used when actual values can be 0. Additionally, MAPE is beneficial when it serves as a meaningful business metric.

Pros:

  • scale-independent and easy to interpret

Cons:

  • undefined when y_true = 0
  • penalises negative errors more than positive ones

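A minimal sketch of the calculation (made-up values, with no zeros in y_true so the metric stays defined):

```python
import numpy as np

y_true = np.array([100.0, 50.0, 200.0])
y_pred = np.array([110.0, 40.0, 180.0])

# Mean of the absolute percentage errors, expressed in %.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(mape)  # (10% + 20% + 10%) / 3 ≈ 13.33%
```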

Mean Absolute Scaled Error (MASE)

MASE is a scale-independent metric that compares the absolute errors of a forecast model to the absolute errors of a naive forecast. This makes MASE particularly useful for comparing the performance of models across different datasets or when the scale of the target variable changes.

MASE values less than 1 indicate that the forecast model outperforms the naive forecast, while values greater than 1 indicate the opposite. MASE is unbounded on the positive side, with 0 representing a perfect forecast.

Pros:

  • scale-independent, allowing for fair comparisons across datasets
  • provides a clear benchmark for model performance relative to a naive forecast
  • unlike %-based metrics like MAPE, MASE avoids division by 0 issues
  • penalises positive and negative forecast errors equally, and it treats errors in large and small forecasts with equal weight

Cons:

  • requires defining a suitable naive forecast method
  • may not be as intuitive to interpret as some other metrics like MAE/MAPE
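
A small sketch of the mechanics, assuming the usual one-step naive forecast (each value predicted by the previous one) computed on the training series as the scaling benchmark; all values are made up:

```python
import numpy as np

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0])  # history used for scaling
y_true = np.array([14.0, 13.0])                      # test-set actuals
y_pred = np.array([13.0, 13.5])                      # model forecasts

# MAE of the one-step naive forecast on the training data.
naive_mae = np.mean(np.abs(np.diff(y_train)))

mase = np.mean(np.abs(y_true - y_pred)) / naive_mae
print(mase)  # 0.75 / 1.5 = 0.5 -> the model beats the naive benchmark
```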

Mean Directional Accuracy (MDA)

MDA is a metric used to evaluate the performance of forecasting models in terms of their ability to correctly predict the direction of change in the target variable, rather than the magnitude of the change.

It ranges from 0 to 1, with 0 meaning the model always predicted the opposite direction and 1 indicating the model correctly predicted the direction of change for all data points.

Pros:

  • provides a simple, interpretable metric that is not affected by the scale of the target variable

Cons:

  • does not provide any info about the magnitude of errors, only the direction
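
A short sketch of one common formulation, comparing the signs of consecutive changes in the actuals vs the predictions (made-up values):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
y_pred = np.array([10.5, 11.0, 11.5, 12.0, 15.0])

# Direction of change between consecutive points, then fraction of matches.
true_dir = np.sign(np.diff(y_true))
pred_dir = np.sign(np.diff(y_pred))
mda = np.mean(true_dir == pred_dir)
print(mda)  # 0.75 -> the predicted direction matched on 3 of 4 steps
```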

Pinball loss

Pinball Loss is a metric specifically designed for evaluating quantile regression models. It measures the accuracy of predictions at a specified quantile, such as the median (q = 0.5), and generalizes the concept of absolute loss to asymmetric penalties. This makes it suitable for scenarios where the goal is to predict conditional quantiles of a target variable rather than the mean.


It ranges from 0 to +infinity, with 0 indicating a perfect quantile prediction. It asymmetrically penalizes over and under-predictions based on the specified quantile, making it unique among regression metrics.

Pros:

  • can target any quantile by adjusting q, enabling detailed analysis across the entire distribution
  • asymmetric loss penalties reflect practical priorities, such as minimising over-prediction costs

Cons:

  • for reliable results, the model must be well-calibrated for the chosen quantile


Did you know that: Pinball Loss is also called Quantile Loss and is an L1 Loss when q = 0.5.
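
A small sketch of the loss itself with made-up values; note that at q = 0.5 it comes out to exactly half of MAE, which is the L1 connection mentioned above:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    # q * error for under-predictions, (1 - q) * |error| for over-predictions.
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

print(pinball_loss(y_true, y_pred, q=0.5))  # 0.375 = MAE/2 for these values
print(pinball_loss(y_true, y_pred, q=0.9))  # under-predictions cost 9x more here
```

(sklearn also ships this as mean_pinball_loss in sklearn.metrics, with the quantile passed as alpha.)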

There are also some classification metrics, and I encourage you to check out the probabl video (link above) and also the book itself (also link above).

Stream

I streamed twice today.

Amazing courses - with both theory and practice. I am keen on learning more, and I am planning on using Airflow as my orchestrator when the time comes to build a project for Zach Wilson’s Jan boot camp ~

I saw there are also other Airflow and Astronomer courses which I might do on stream in the coming week ^^


That is all for today!

See you tomorrow :)