(Day 6) Time series course - Kaggle

Ivan Ivanov · January 7, 2024

Hello :) Today is Day 6!

A quick summary of today:

It was a bit hard, but interesting.

Linear regression with time series

  • Time-step features
    • It can be extracted directly from the time axis and the characteristics of year, month, day, and time measured at regular time intervals from start to end are typical. This is a useful characteristic when the observed values have periodic properties
  • Lag features
  • Occurs as a time difference in the observed values, and the current observed values are expressed as previous observed values. This is a useful characteristic when the observed values have autocorrelation or series correlation

Trends

  • It refers to a pattern that increases or decreases uniformly over a long section as a whole
  • once we’ve identified the shape of the trend, we can attempt to model it using a time-step feature

image

Sasonality

  • Seasonality is one of the representative factors that occur in time step characteristics, where seasonality refers to periodic fluctuations, and time series data indicate seasonality when fluctuations in observations are repeatedly observed on a daily, weekly, monthly, and yearly basis

    image

  • I got introduced to Fourier features here
    • Used to capture and model periodic patterns in time series data
    • They are features based on amplitudes and phases of sine and cosine functions with different frequencies
    • Commonly used for tasks like forecasting to enhance the model’s ability to handle cyclic variations.
    • They enable models to understand and predict periodic structures in the data, enhancing overall predictive performance in tasks with recurring patterns
  • Periodogram - to know which Fourier feature to use, we can inspect the periodogram image

In this case, we can see annual, quarterly and weekly cycles occur

Hybrid Forecasting

  • Linear regression can reveal trends, but complex interactions cannot be easily revealed. On the other hand, XGBoost can reveal complex interactions, but cannot reveal trends
  • “We could imagine learning the components of a time series as an iterative process: first learn the trend and subtract it out from the series, then learn the seasonality from the detrended residuals and subtract the seasons out, then learn the cycles and subtract the cycles out, and finally only the unpredictable error remains.” image

  • Using linear regression and XGBoost together, will help our model’s predictions to be better

I did only 1 lecture today, but the content was a bit difficult. Still it was good and I hope I can actually apply the above methods in practice soon.

That is all for today!

See you tomorrow :)

Original post in Korean