Chapter 4, Part 3: Baseline Models

Machine Learning

Author

F.San Segundo & N.Rodríguez

Published

March 2026


Setting the working directory

We begin by using the %cd magic to make sure we are in the right folder.

Hide the code
%cd /wd/4_3_Baseline
/wd/4_3_Baseline

Session Setup

Install additional libraries

We need to install some libraries from the Nixtlaverse

Hide the code
import subprocess
import sys

def pip_install(package):
    # Use the current interpreter's pip to avoid installing into the wrong environment
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        print(f"Error installing {package}: {result.stderr}")
    else:
        print(f"Successfully installed {package}")

pip_install("fpppy")
pip_install("tsfeatures")
pip_install("pyreadr")
Successfully installed fpppy
Successfully installed tsfeatures
Successfully installed pyreadr
Libraries

Let us also load the libraries we will use.

Hide the code
# 1. System and Environment Setup
import os
import sys
import warnings
import random
import datetime
from tqdm import tqdm

# Set environment variables and ignore specific warnings
os.environ["NIXTLA_ID_AS_COL"] = "true"
warnings.filterwarnings(
    "ignore", 
    category=UserWarning, 
    message=".*FigureCanvasAgg is non-interactive.*"
)

# 2. Core Data Science Stack
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose

from statsforecast.arima import ARIMASummary, ndiffs, nsdiffs


# 3. Time Series & Machine Learning
import pmdarima as pmd
import yfinance as yf
import rdatasets
import pyreadr
from sktime.datasets import load_airline
from sktime.utils.plotting import plot_series as plot_series_sktime

# 4. Custom Utilities
from fc_4_3_utils.utils import *
from fpppy.utils import plot_series as plot_series_fpp
from utilsforecast.plotting import plot_series as plot_series_utils

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import rmse, mape
from statsforecast import StatsForecast

from statsforecast.models import (
    HistoricAverage,
    Naive,
    RandomWalkWithDrift,
    SeasonalNaive,
)

from statsforecast.models import MSTL



# 5. Global Reproducibility & Display Settings
np.random.seed(1)
random.seed(1)
np.set_printoptions(suppress=True)

pd.set_option("display.max_colwidth", 100)
pd.set_option("display.precision", 3)

# 6. Plotting Configuration
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from cycler import cycler

# Jupyter Magic
%matplotlib inline
%config InlineBackend.figure_format = 'png' # use 'retina' for high-res plots

# Style and Aesthetics
plt.style.use("ggplot")
sns.set_style("whitegrid")

plt.rcParams.update({
    "figure.figsize": (8, 5),
    "figure.dpi": 100,
    "savefig.dpi": 300,
    "figure.constrained_layout.use": True,
    "axes.titlesize": 12,
    "axes.labelsize": 10,
    "xtick.labelsize": 9,
    "ytick.labelsize": 9,
    "legend.fontsize": 9,
    "legend.title_fontsize": 10,
})

# Custom color cycle (monochromatic black)
mpl.rcParams['axes.prop_cycle'] = cycler(color=["#000000", "#000000"])
Hide the code
import warnings
import os

# Ignore all warnings
warnings.filterwarnings("ignore")

# Specifically target Scikit-Learn/FutureWarnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Some environments also require setting an environment variable
os.environ['PYTHONWARNINGS'] = 'ignore'

Train / Test splits and validation in time series forecasting

Temporal train-test splits

In time series forecasting, we need to be careful when splitting the data into training and testing sets. We cannot use random splits, because we need to preserve the temporal order of the data and we must avoid using future unknown data to predict past data.

Therefore we need to use time-aware splits, where the training set contains the data up to a certain point in time, and the test set contains the data after that point. The split is often performed directly by slicing the data into two parts, but some libraries (like sktime) provide functions for it.
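Before the worked example with the Canadian gas data below, the idea can be sketched as a small helper (the function name temporal_split is hypothetical, not part of any library):

```python
import pandas as pd

def temporal_split(df, date_col="ds", train_frac=0.8):
    """Split a time-ordered DataFrame into past (train) and future (test).

    No shuffling is performed, so every test row is strictly later
    than every train row.
    """
    df = df.sort_values(date_col).reset_index(drop=True)
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

# Toy example: 10 monthly observations
toy = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=10, freq="MS"),
    "y": range(10),
})
train, test = temporal_split(toy)
print(len(train), len(test))                    # 8 2
print(train["ds"].max() < test["ds"].min())     # True: no leakage
```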

Highly recommended reading and video watching: R. J. Hyndman and Athanasopoulos (2021), Section 5.8

We will use the Canadian gas production dataset to illustrate these splitting methods.

Hide the code
cgas = pd.read_csv('../4_1_Introduction_to_Forecasting/tsdata/fpp3/Canadian_gas.csv')
cgas.head()
Month Volume
0 1960 Jan 1.431
1 1960 Feb 1.306
2 1960 Mar 1.402
3 1960 Apr 1.170
4 1960 May 1.116
Hide the code
cgas["Month"] = pd.to_datetime(cgas["Month"], format="%Y %b")
cgas.head()
Month Volume
0 1960-01-01 1.431
1 1960-02-01 1.306
2 1960-03-01 1.402
3 1960-04-01 1.170
4 1960-05-01 1.116
Hide the code
# Set 'Month' as index and set frequency as Month end, required by sktime
cgas["Month"] = cgas["Month"] + pd.offsets.MonthEnd(0)
cgas.set_index("Month", inplace=True)
cgas.index.freq = 'M'  # note: pandas >= 2.2 prefers 'ME' for month end

cgas.head()
Volume
Month
1960-01-31 1.431
1960-02-29 1.306
1960-03-31 1.402
1960-04-30 1.170
1960-05-31 1.116

Let us also create a Nixtla format time series for this example.

Hide the code
fpppy_path = '/wd/data/fpppy/'
cgas_nx = pd.read_csv(fpppy_path + '/canadian_gas.csv', parse_dates=['ds'])
cgas_nx.tail()
unique_id ds y
537 Canadian gas production 2004-10-01 17.827
538 Canadian gas production 2004-11-01 17.832
539 Canadian gas production 2004-12-01 19.453
540 Canadian gas production 2005-01-01 19.528
541 Canadian gas production 2005-02-01 16.944

Recall the time plot of the series:

Hide the code
plot_series_fpp(cgas_nx, title="Canadian Gas Production", xlabel="Year", ylabel="Gas Production (in billions of cubic metres)")

We can create a manual train/test split with the usual 80/20 proportions by first defining the temporal index where the split occurs:

Hide the code
train_end = int(np.floor(len(cgas_nx) * 0.8))
train_end
433

And now the manual split is simply done by slicing the data as follows:

Hide the code
cgas_train = cgas_nx.iloc[:train_end]
cgas_test = cgas_nx.iloc[train_end:]

It is always a good idea to check the result:

Hide the code
cgas_train.tail(1)
unique_id ds y
432 Canadian gas production 1996-01-01 17.147
Hide the code
cgas_test.head(1)
unique_id ds y
433 Canadian gas production 1996-02-01 15.693

We can extract the last position/date of the training set and the first position/date of the testing set as follows:

Hide the code
cgas_train.index[-1], cgas_train.ds.values[-1] 
(432, np.datetime64('1996-01-01T00:00:00.000000000'))
Hide the code
cgas_test.index[0], cgas_test.ds.values[0]
(433, np.datetime64('1996-02-01T00:00:00.000000000'))

And the shapes of the training and testing sets as follows:

Hide the code
cgas_train.shape, cgas_test.shape
((433, 3), (109, 3))

We can also extract the date of the split:

Hide the code
train_end_date = cgas_train['ds'].values[-1]
train_end_date.astype('datetime64[D]')
np.datetime64('1996-01-01')

So the split could also be done taking a date such as that one as a reference:

Hide the code
cgas_train = cgas_nx.query("ds <= @train_end_date")
cgas_test = cgas_nx.query("ds > @train_end_date")

Or equivalently:

Hide the code
train_end_date_str = "1996-01-01"
cgas_train = cgas_nx.query("ds <= @train_end_date_str")
cgas_test = cgas_nx.query("ds > @train_end_date_str")

Which results in the same train/test split:

Hide the code
cgas_train.tail(1)
unique_id ds y
432 Canadian gas production 1996-01-01 17.147

It is also useful to visualize the split in a time plot:

Hide the code
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(cgas_train['y'], label='train', color='blue')
ax.plot(cgas_test['y'], label='test', color='orange')
ax.legend(["train", "test"])
plt.show()
plt.close()

Train/test split and forecasting horizon

We got used to the 80/20 split proportion in the context of the basic regression problem of Machine Learning. But in forecasting things are different. For example, with a monthly time series we are often interested in making predictions for a given forecasting horizon, usually denoted h. If we need the prediction for the next month, h = 1; for the next three months (a quarter), h = 3; and for a whole year, h = 12. These are all typical scenarios, and they have clear consequences for the way we evaluate our models. The evaluation should reflect the way the forecasting model is going to be used in practice, and what your model update policy is going to be.

That is, imagine it is December and your model is tasked with predicting the next twelve months. You make your forecast with h = 12, and then the new January data comes in. In many cases you will want to retrain your model using that new data, to keep it as up to date as possible. If that is the case, then your model performance evaluation should reflect it. But using a single test set with length equal to the forecasting horizon is not enough in most cases, and it will not give you any idea about the variance of your model's prediction accuracy. In such cases we need to consider cross-validation strategies such as those described below.

In other cases, for example in short-term prediction (we need a forecast for tomorrow or for the next hour), retraining the model is not possible or becomes too costly. In any case, your model evaluation should always reflect how the model is going to be used and how often it will be updated.

Cross validation in forecasting

Highly recommended reading and video watching: R. J. Hyndman and Athanasopoulos (2021), Section 5.10. We will use some figures from that section below.

We know by now that we should use a validation set to assess the performance of our models and to fine-tune the hyperparameters of our models. In time series forecasting, we should use a validation set that is also time-aware. That is, any validation data should be in the future of the training data. And if we plan to use cross-validation, the same time-awareness extends to the folds of the cross-validation.

There are several different approaches to cross-validation in time series forecasting. The expanding window approach is the most common one. In this approach, the training set starts with an initial window and grows with each fold by adding some points at the end. The number of points added is called the step. The test set always consists of a fixed number of points following the training set. That number equals our forecasting horizon (usually represented as fh in many forecasting libraries), so that the validation reflects the way the model will be used. Expanding windows with fh = 1 are illustrated in the figure below, from Rob J. Hyndman et al. (2025), Section 5.10. The blue dots represent the training set, the orange dot represents the test set.

A case when fh=4 is illustrated in the figure below, also from (Rob J. Hyndman et al. 2025, sec. 5.10).

The expanding window is also often called rolling forecasting origin.
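The expanding-window scheme described above can be sketched as a small Python generator (a hypothetical helper, for illustration only; libraries such as StatsForecast implement this internally):

```python
def expanding_windows(n, initial, fh, step=1):
    """Yield (train_indices, test_indices) pairs for expanding-window CV.

    n       : total number of observations
    initial : size of the first training window
    fh      : forecasting horizon (size of each test window)
    step    : how many points the forecasting origin advances per fold
    """
    train_end = initial
    while train_end + fh <= n:
        yield list(range(train_end)), list(range(train_end, train_end + fh))
        train_end += step

# With 10 points, an initial window of 5, fh = 2 and step = 1 we get 4 folds:
for train_idx, test_idx in expanding_windows(10, 5, 2):
    print(len(train_idx), test_idx)
```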

In the Nixtlaverse StatsForecast library we need to specify a model before we can create a cross validation strategy. So let us introduce some very simple forecasting models and then we will resume this discussion.

Baseline Models

What is a baseline model?

In the following sessions we are going to introduce some fairly sophisticated forecasting methods: classical statistical models in the ARIMA and Exponential Smoothing families, Machine Learning and Deep Learning models, etc. But these models come at a computational and conceptual cost, and their interpretation is often not immediate, or even possible. On the other hand, we can consider some very simple models based on the characteristics of the time series. We will then use these models as a baseline against which we compare the fancier models. If those are not substantially better than the very basic baselines, then they are probably not worth the cost.

Note: we don’t have time to cover one of the classical families of models, the Exponential Smoothing models. We refer you to Rob J. Hyndman et al. (2025), Chapter 8, for that.

The naive model

Arguably the simplest forecasting strategy we can think of: when we need to forecast a future value, we simply repeat the last observed value as many times as the forecasting horizon requires. In particular, this means that the time plot of the forecast is a flat, horizontal line.
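As a sketch of the idea (not the library implementation), the naive forecast in plain NumPy is just the last value repeated h times:

```python
import numpy as np

def naive_forecast(y, h):
    """Repeat the last observed value h times (a flat forecast)."""
    return np.repeat(y[-1], h)

y = np.array([1.2, 1.5, 1.7])
print(naive_forecast(y, 3))  # [1.7 1.7 1.7]
```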

In StatsForecast we define one such model as follows (here ‘MS’ indicates month start):

Hide the code
naive_method = Naive()
cgas_naive = StatsForecast(models=[naive_method], freq="MS")

Recall that the last row of our training set is:

Hide the code
cgas_train.tail(1)
unique_id ds y
432 Canadian gas production 1996-01-01 17.147

So if we fit the naive model and make a forecast with horizon h = 3 we get:

Hide the code
cgas_naive.fit(cgas_train)
cgas_naive.predict(h=3)
unique_id ds Naive
0 Canadian gas production 1996-02-01 17.147
1 Canadian gas production 1996-03-01 17.147
2 Canadian gas production 1996-04-01 17.147

As we can see, the prediction is flat and equals the last value in the training dataset.

Hide the code
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(cgas_train.set_index('ds')['y'].tail(20), label='train', color='blue')
ax.plot(cgas_naive.predict(h=3).set_index('ds')[['Naive']], label='test', color='orange')
ax.legend(["train", "test"])
plt.show()
plt.close()

Cross validation for naive forecasting

Now that we have specified a model, we can select a forecasting horizon. In this example, let us assume that we want to forecast the next quarter (three months) of gas production, so we take:

Hide the code
fh = 3

Now we can use the cross_validation function to apply the rolling window validation method.

Hide the code
cgas_cv_df = cgas_naive.cross_validation(
    df = cgas_nx,
    h = fh,
    step_size = 1,
    n_windows = 4)

The code above uses n_windows = 4. This equals the number of performance measures that we will obtain, and here we are setting a particularly low value for teaching purposes. If you refer to the pictures above, it is as if we were only considering the bottom four rows of the picture. The object returned by cross_validation is this data frame:

Hide the code
cgas_cv_df
unique_id ds cutoff y Naive
0 Canadian gas production 2004-09-01 2004-08-01 16.907 17.641
1 Canadian gas production 2004-10-01 2004-08-01 17.827 17.641
2 Canadian gas production 2004-11-01 2004-08-01 17.832 17.641
3 Canadian gas production 2004-10-01 2004-09-01 17.827 16.907
4 Canadian gas production 2004-11-01 2004-09-01 17.832 16.907
5 Canadian gas production 2004-12-01 2004-09-01 19.453 16.907
6 Canadian gas production 2004-11-01 2004-10-01 17.832 17.827
7 Canadian gas production 2004-12-01 2004-10-01 19.453 17.827
8 Canadian gas production 2005-01-01 2004-10-01 19.528 17.827
9 Canadian gas production 2004-12-01 2004-11-01 19.453 17.832
10 Canadian gas production 2005-01-01 2004-11-01 19.528 17.832
11 Canadian gas production 2005-02-01 2004-11-01 16.944 17.832

How does this work? Note that the cutoff column takes four different values, corresponding to our choice of n_windows = 4

Hide the code
cgas_cv_df['cutoff'].value_counts()
cutoff
2004-08-01    3
2004-09-01    3
2004-10-01    3
2004-11-01    3
Name: count, dtype: int64
  • Each cutoff value is the date of the last element of the training data used for that fold. And since the forecasting horizon equals 3, there are three rows associated with each cutoff.
  • The ds values of those three rows are the data points used as validation in that fold.

We can make that explicit by adding a column that indicates the fold/sliding window:

Hide the code
cutoff_order = cgas_cv_df["cutoff"].drop_duplicates().sort_values()
cutoff_to_fold = {cutoff: f"fold_{i+1}" for i, cutoff in enumerate(cutoff_order)}

cgas_cv_df["fold"] = cgas_cv_df["cutoff"].map(cutoff_to_fold)

cgas_cv_df
unique_id ds cutoff y Naive fold
0 Canadian gas production 2004-09-01 2004-08-01 16.907 17.641 fold_1
1 Canadian gas production 2004-10-01 2004-08-01 17.827 17.641 fold_1
2 Canadian gas production 2004-11-01 2004-08-01 17.832 17.641 fold_1
3 Canadian gas production 2004-10-01 2004-09-01 17.827 16.907 fold_2
4 Canadian gas production 2004-11-01 2004-09-01 17.832 16.907 fold_2
5 Canadian gas production 2004-12-01 2004-09-01 19.453 16.907 fold_2
6 Canadian gas production 2004-11-01 2004-10-01 17.832 17.827 fold_3
7 Canadian gas production 2004-12-01 2004-10-01 19.453 17.827 fold_3
8 Canadian gas production 2005-01-01 2004-10-01 19.528 17.827 fold_3
9 Canadian gas production 2004-12-01 2004-11-01 19.453 17.832 fold_4
10 Canadian gas production 2005-01-01 2004-11-01 19.528 17.832 fold_4
11 Canadian gas production 2005-02-01 2004-11-01 16.944 17.832 fold_4

The last remaining piece to understand is the Naive column, which contains the predictions (forecasts) of the model for the points in that fold. In this example, using the Naive model, you can see that the forecast is constant (flat) within each fold, and its value equals the value of the time series at the last point of the training segment of that fold (the cutoff for that fold).

Exercise 001
  • Change the forecasting horizon parameter to fh = 12 and run the cv code again. What happens?
  • Also play with different values of the step parameter to see what happens.

Performance evaluation in forecasting

In order to evaluate forecasting performance, we need to calculate some error metrics. We recommend reading Section 5.8 of Rob J. Hyndman et al. (2025) for a detailed explanation of these metrics.

Metrics for time series forecasting. Validation and test metrics.

Since forecasting time series is essentially a regression problem, we can use the same metrics we used for regression problems. The most common metrics are:
  • Mean Absolute Error (MAE). Minimizing this means we will predict the median of the time series.
  • Root Mean Squared Error (RMSE). Minimizing this means we will predict the mean of the time series.
  • Mean Absolute Percentage Error (MAPE).
  • Symmetric Mean Absolute Percentage Error (sMAPE). Hyndman recommends not using it, because it can be numerically unstable.
  • Mean Absolute Scaled Error (MASE).

The first two measures, MAE and RMSE, are, as we know, scale dependent. But when we use them to compare different models on the same time series, the scale dependency is not a problem.
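A quick toy illustration of that scale dependency (the helper names rmse_ and mape_ are hypothetical, not the utilsforecast functions used below):

```python
import numpy as np

y = np.array([10.0, 12.0, 11.0])
yhat = np.array([9.0, 13.0, 12.0])

def rmse_(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape_(y, yhat):
    return np.mean(np.abs((y - yhat) / y))

# Rescaling the series (e.g. changing units) rescales RMSE...
print(rmse_(y, yhat), rmse_(1000 * y, 1000 * yhat))   # 1.0 vs 1000.0
# ...but leaves MAPE unchanged
print(mape_(y, yhat), mape_(1000 * y, 1000 * yhat))   # identical values
```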

To compute the validation metrics, we need to calculate the error metrics for each fold of the cross-validation. We can then average the error metrics to get a single value for each metric.
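The fold-then-average logic can be sketched by hand on a toy cross-validation frame (the column names mimic the cross_validation output shown above; the values are made up):

```python
import numpy as np
import pandas as pd

# Toy cross-validation output: two folds (cutoffs), two points each
cv = pd.DataFrame({
    "cutoff": ["c1", "c1", "c2", "c2"],
    "y":      [10.0, 12.0, 11.0, 13.0],
    "Naive":  [ 9.0, 13.0, 11.0, 12.0],
})

# RMSE computed independently within each fold...
fold_rmse = cv.groupby("cutoff")[["y", "Naive"]].apply(
    lambda g: np.sqrt(np.mean((g["y"] - g["Naive"]) ** 2))
)
print(fold_rmse)        # c1: 1.0, c2: about 0.707
# ...and then averaged into a single validation score
print(fold_rmse.mean())
```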

Statsforecast provides an evaluate function that we can use as follows:

Hide the code
cgas_naive_eval = evaluate(cgas_cv_df, metrics=[rmse, mape], models=["Naive"])
cgas_naive_eval
unique_id cutoff metric Naive
0 Canadian gas production 2004-08-01 rmse 0.451
1 Canadian gas production 2004-09-01 rmse 1.652
2 Canadian gas production 2004-10-01 rmse 1.359
3 Canadian gas production 2004-11-01 rmse 1.448
4 Canadian gas production 2004-08-01 mape 0.022
5 Canadian gas production 2004-09-01 mape 0.078
6 Canadian gas production 2004-10-01 mape 0.057
7 Canadian gas production 2004-11-01 mape 0.074

As you see we get an evaluation value for each metric and fold combination. We can summarize the performance by computing metric averages over all the folds:

Hide the code
cgas_naive_eval.groupby("metric")['Naive'].mean()
metric
mape    0.058
rmse    1.227
Name: Naive, dtype: float64

Other baseline models. Model comparison.

Baseline models assumptions

See Rob J. Hyndman et al. (2025), Section 5.2

The so-called baseline or benchmark models are all based on the idea that future behavior is likely to be similar to the basic patterns observed in the past. The differences between these models lie in how they define the past and how that past information is used to make predictions.

  • For example, we can simply decide to use the last observed value as the prediction for the future. This is what we did with the Naive model (also called the last observed value method).
  • We could instead use the mean of the (past values of the) time series as prediction. This is the mean method, implemented in the Nixtlaverse as HistoricAverage. Note that this also returns a flat prediction, as in the naive method.
  • Or, for seasonal time series, we could use the last observed value from the same season of the preceding seasonal period as the prediction for the future. This is the seasonal naïve method, provided as SeasonalNaive in Nixtla.
  • The drift model makes the simplest possible estimation of trend, in that it connects the first and last points in the past data with a line, and extends that line into the future to make predictions. Nixtla implements this as RandomWalkWithDrift.

Check the Nixtla documentation for more details and other options. Also take the opportunity to browse the model collection offered by Statsforecast (and that’s only one part of the Nixtlaverse).
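As a rough sketch of what each of the four baselines returns (not the StatsForecast implementations), on a toy series:

```python
import numpy as np

def mean_forecast(y, h):
    """HistoricAverage: repeat the training mean."""
    return np.full(h, y.mean())

def naive_forecast(y, h):
    """Naive: repeat the last observed value."""
    return np.full(h, y[-1])

def seasonal_naive_forecast(y, h, m):
    """SeasonalNaive: repeat the last full seasonal cycle of length m."""
    last_cycle = y[-m:]
    return np.array([last_cycle[i % m] for i in range(h)])

def drift_forecast(y, h):
    """RandomWalkWithDrift: extend the line through the first and last points."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(mean_forecast(y, 2))                # [3.5 3.5]
print(naive_forecast(y, 2))               # [6. 6.]
print(seasonal_naive_forecast(y, 2, 3))   # [4. 5.]
print(drift_forecast(y, 2))               # [7. 8.]
```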

Seasonal models

Keep in mind that when using seasonal naive models, you need to specify the seasonal period of the time series. Be careful: choosing the wrong seasonal period can lead to very poor predictions! Always think about the sampling frequency of the data and the compatible seasonal periods.

This is the first time we warn you, but it will not be the last!
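A toy illustration of why this matters, on synthetic data with an exact period-12 pattern (the helper name seasonal_naive is hypothetical):

```python
import numpy as np

pattern = np.arange(12, dtype=float)   # one seasonal cycle
y = np.tile(pattern, 4)                # 4 "years" of perfectly seasonal data
future = pattern[:3]                   # the true next 3 values

def seasonal_naive(y, h, m):
    """Forecast by repeating the last cycle of (assumed) length m."""
    last_cycle = y[-m:]
    return np.array([last_cycle[i % m] for i in range(h)])

good = seasonal_naive(y, 3, m=12)   # correct seasonal period
bad = seasonal_naive(y, 3, m=7)     # wrong seasonal period
print(np.abs(good - future).mean())  # 0.0: perfect on this toy series
print(np.abs(bad - future).mean())   # 5.0: a large error
```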

Let us apply these baseline models to the Canadian gas production dataset. We will repeat the Naive model in the list for ease of model comparison below.

We begin by creating instances of the model classes and putting them in a model list:

Hide the code
season_length = 12
cgas_mean = HistoricAverage()
cgas_naive = Naive()
cgas_seasonal = SeasonalNaive(season_length=season_length)
cgas_drift = RandomWalkWithDrift()

cgas_models = [cgas_mean, cgas_naive, cgas_seasonal, cgas_drift]

Now we can pass that list to Statsforecast:

Hide the code
cgas_baselines = StatsForecast(models=cgas_models, freq="MS")

Before moving on, we need to select a forecasting horizon. Again, we assume that we want to forecast the next quarter (three months) of gas production:

Hide the code
fh = 3

Now we can use the same cross_validation function with this cgas_baselines object. But this time, in line with the idea of 10-fold cross-validation, we use ten windows:

Hide the code
cgas_cv_df = cgas_baselines.cross_validation(
    df = cgas_nx,
    h = fh,
    step_size = 1,
    n_windows = 10)

cgas_cv_df.shape
(30, 8)
Hide the code
cgas_cv_df.head()
unique_id ds cutoff y HistoricAverage Naive SeasonalNaive RWD
0 Canadian gas production 2004-03-01 2004-02-01 18.333 9.591 17.782 18.413 17.813
1 Canadian gas production 2004-04-01 2004-02-01 17.848 9.591 17.782 17.447 17.844
2 Canadian gas production 2004-05-01 2004-02-01 18.093 9.591 17.782 17.560 17.875
3 Canadian gas production 2004-04-01 2004-03-01 17.848 9.608 18.333 17.447 18.365
4 Canadian gas production 2004-05-01 2004-03-01 18.093 9.608 18.333 17.560 18.397

Be careful here! Even though we passed a list of model instances (with names such as cgas_seasonal), the cross_validation function returns columns named after the model classes (note, for example, RWD as the short name for RandomWalkWithDrift). That is why we are going to extract those column names to pass them to the evaluation function.

Hide the code
model_columns = cgas_cv_df.columns[-len(cgas_models):].tolist()
model_columns
['HistoricAverage', 'Naive', 'SeasonalNaive', 'RWD']

And we evaluate the performance of each model as before:

Hide the code
cgas_baselines_eval = evaluate(cgas_cv_df, metrics=[rmse, mape], models=model_columns)
cgas_baselines_eval
unique_id cutoff metric HistoricAverage Naive SeasonalNaive RWD
0 Canadian gas production 2004-02-01 rmse 8.502 0.367 0.388 0.326
1 Canadian gas production 2004-03-01 rmse 8.104 0.743 0.427 0.807
2 Canadian gas production 2004-04-01 rmse 8.099 0.419 0.458 0.448
3 Canadian gas production 2004-05-01 rmse 7.928 0.609 0.340 0.657
4 Canadian gas production 2004-06-01 rmse 7.833 0.517 0.287 0.504
5 Canadian gas production 2004-07-01 rmse 7.800 0.578 0.037 0.622
6 Canadian gas production 2004-08-01 rmse 7.850 0.451 0.068 0.451
7 Canadian gas production 2004-09-01 rmse 8.707 1.652 0.191 1.591
8 Canadian gas production 2004-10-01 rmse 9.259 1.359 0.251 1.296
9 Canadian gas production 2004-11-01 rmse 8.995 1.448 0.541 1.433
10 Canadian gas production 2004-02-01 mape 0.470 0.017 0.019 0.014
11 Canadian gas production 2004-03-01 mape 0.457 0.036 0.023 0.040
12 Canadian gas production 2004-04-01 mape 0.456 0.018 0.025 0.020
13 Canadian gas production 2004-05-01 mape 0.451 0.031 0.016 0.034
14 Canadian gas production 2004-06-01 mape 0.447 0.027 0.010 0.027
15 Canadian gas production 2004-07-01 mape 0.446 0.025 0.002 0.028
16 Canadian gas production 2004-08-01 mape 0.447 0.022 0.003 0.019
17 Canadian gas production 2004-09-01 mape 0.471 0.078 0.008 0.075
18 Canadian gas production 2004-10-01 mape 0.486 0.057 0.012 0.055
19 Canadian gas production 2004-11-01 mape 0.476 0.074 0.027 0.074

Looks good. We used 10 folds and two metrics, so we get 20 rows. Now we can get average performance measures over the folds:

Hide the code
cgas_baselines_eval.groupby("metric")[model_columns].mean()
HistoricAverage Naive SeasonalNaive RWD
metric
mape 0.461 0.038 0.015 0.039
rmse 8.308 0.814 0.299 0.814

As could be expected, since the Canadian gas time series is strongly seasonal, we get the best results using the only baseline method that exploits that seasonality.

Predictions and visualization

We can also obtain and visualize the model future forecasts (predictions), based on the forecasting horizon we are using.

Hide the code
cgas_baselines_preds = cgas_baselines.forecast(df = cgas_nx, h=fh)
cgas_baselines_preds
unique_id ds HistoricAverage Naive SeasonalNaive RWD
0 Canadian gas production 2005-03-01 9.777 16.944 18.333 16.973
1 Canadian gas production 2005-04-01 9.777 16.944 17.848 17.001
2 Canadian gas production 2005-05-01 9.777 16.944 18.093 17.030

Note that these forecasts extend into the future for which we have no observed data available for comparison (the last observed date in the time series is 2005-02-01).

We can also use predict, but this requires calling fit first. The result is exactly the same in this case.

Hide the code
cgas_baselines.fit(cgas_nx)
cgas_baselines.predict(h=fh)
unique_id ds HistoricAverage Naive SeasonalNaive RWD
0 Canadian gas production 2005-03-01 9.777 16.944 18.333 16.973
1 Canadian gas production 2005-04-01 9.777 16.944 17.848 17.001
2 Canadian gas production 2005-05-01 9.777 16.944 18.093 17.030
Hide the code
plot_df = cgas_baselines_preds.copy()
plot_df['ds'] = pd.to_datetime(plot_df['ds'])
plot_df.set_index('ds', inplace=True)
plot_df.head()
unique_id HistoricAverage Naive SeasonalNaive RWD
ds
2005-03-01 Canadian gas production 9.777 16.944 18.333 16.973
2005-04-01 Canadian gas production 9.777 16.944 17.848 17.001
2005-05-01 Canadian gas production 9.777 16.944 18.093 17.030
Hide the code
plot_df['HistoricAverage']
ds
2005-03-01    9.777
2005-04-01    9.777
2005-05-01    9.777
Name: HistoricAverage, dtype: float64
Hide the code

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(cgas_nx.set_index('ds', inplace=False)['y'][-48:], label='train', color='blue')
for model in model_columns:
    ax.plot(plot_df[model], label=model, color=plt.cm.tab10(model_columns.index(model))) 
ax.legend(["Train data"] + model_columns)
plt.show()
plt.close()

Exercise 003
  • Again, change the forecasting horizon parameter to fh = 12 and run the last sections of code (including all the baseline models). How do they compare in this case?
  • How many windows/folds of cross validation can you use?

Decomposition based modeling

Time series dynamics and decomposition

In many cases, when the dynamics of a time series are dominated by a single strong component, either trend or seasonality, the baseline models that focus on that component are the best ones. But when both components are important, naive models struggle to capture the complexity of the time series, as we have seen in the Canadian Gas example.

In a preceding session we introduced the MSTL method, and we are going to use it here as an easily accessible and interpretable method to showcase the typical situation in which we will apply a more advanced method and compare it with the baselines.

We have already seen how to implement this model in a previous session, so we repeat the code:

Hide the code
model_mstl = [MSTL(season_length=[12])]
cgas_mstl = StatsForecast(models=model_mstl, freq='MS')

We are going to use the same folds as we used for the baseline models.

Hide the code
cgas_cv_df_mstl = cgas_mstl.cross_validation(
    df = cgas_nx,
    h = fh,
    step_size = 1,
    n_windows = 10)

cgas_cv_df_mstl.shape
(30, 5)

And this is how the output looks:

Hide the code
cgas_cv_df_mstl.head()
unique_id ds cutoff y MSTL
0 Canadian gas production 2004-03-01 2004-02-01 18.333 18.885
1 Canadian gas production 2004-04-01 2004-02-01 17.848 18.167
2 Canadian gas production 2004-05-01 2004-02-01 18.093 18.439
3 Canadian gas production 2004-04-01 2004-03-01 17.848 17.857
4 Canadian gas production 2004-05-01 2004-03-01 18.093 18.123

Recall that we had this for the baseline models:

Hide the code
cgas_cv_df.head()
unique_id ds cutoff y HistoricAverage Naive SeasonalNaive RWD
0 Canadian gas production 2004-03-01 2004-02-01 18.333 9.591 17.782 18.413 17.813
1 Canadian gas production 2004-04-01 2004-02-01 17.848 9.591 17.782 17.447 17.844
2 Canadian gas production 2004-05-01 2004-02-01 18.093 9.591 17.782 17.560 17.875
3 Canadian gas production 2004-04-01 2004-03-01 17.848 9.608 18.333 17.447 18.365
4 Canadian gas production 2004-05-01 2004-03-01 18.093 9.608 18.333 17.560 18.397

But since the folds are the same, we can easily incorporate the MSTL information as an extra column:

Hide the code
cgas_cv_df['MSTL'] = cgas_cv_df_mstl['MSTL']
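Note that this positional column assignment only works because both data frames share exactly the same row order. When that is not guaranteed, a key-based merge is safer. A toy sketch (frames mimicking the CV output columns, with made-up values):

```python
import pandas as pd

# Two CV result frames that share the same (unique_id, ds, cutoff) keys
base = pd.DataFrame({
    "unique_id": ["a", "a"], "ds": ["2004-03-01", "2004-04-01"],
    "cutoff": ["2004-02-01"] * 2, "y": [18.3, 17.8], "Naive": [17.8, 17.8],
})
extra = pd.DataFrame({
    "unique_id": ["a", "a"], "ds": ["2004-04-01", "2004-03-01"],  # reordered!
    "cutoff": ["2004-02-01"] * 2, "y": [17.8, 18.3], "MSTL": [18.2, 18.9],
})

# Merging on the keys aligns the rows correctly even if the order differs
merged = base.merge(extra[["unique_id", "ds", "cutoff", "MSTL"]],
                    on=["unique_id", "ds", "cutoff"], how="left")
print(merged[["ds", "Naive", "MSTL"]])
```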

And also update the models columns list:

Hide the code
model_columns.append('MSTL')
model_columns
['HistoricAverage', 'Naive', 'SeasonalNaive', 'RWD', 'MSTL']

Now we can simply call evaluate to get an updated table including MSTL:

Hide the code
cgas_models_eval = evaluate(cgas_cv_df, metrics=[rmse, mape], models=model_columns)
cgas_models_eval
unique_id cutoff metric HistoricAverage Naive SeasonalNaive RWD MSTL
0 Canadian gas production 2004-02-01 rmse 8.502 0.367 0.388 0.326 0.418
1 Canadian gas production 2004-03-01 rmse 8.104 0.743 0.427 0.807 0.034
2 Canadian gas production 2004-04-01 rmse 8.099 0.419 0.458 0.448 0.090
3 Canadian gas production 2004-05-01 rmse 7.928 0.609 0.340 0.657 0.203
4 Canadian gas production 2004-06-01 rmse 7.833 0.517 0.287 0.504 0.301
5 Canadian gas production 2004-07-01 rmse 7.800 0.578 0.037 0.622 0.330
6 Canadian gas production 2004-08-01 rmse 7.850 0.451 0.068 0.451 0.191
7 Canadian gas production 2004-09-01 rmse 8.707 1.652 0.191 1.591 0.406
8 Canadian gas production 2004-10-01 rmse 9.259 1.359 0.251 1.296 0.549
9 Canadian gas production 2004-11-01 rmse 8.995 1.448 0.541 1.433 0.519
10 Canadian gas production 2004-02-01 mape 0.470 0.017 0.019 0.014 0.022
11 Canadian gas production 2004-03-01 mape 0.457 0.036 0.023 0.040 0.002
12 Canadian gas production 2004-04-01 mape 0.456 0.018 0.025 0.020 0.005
13 Canadian gas production 2004-05-01 mape 0.451 0.031 0.016 0.034 0.010
14 Canadian gas production 2004-06-01 mape 0.447 0.027 0.010 0.027 0.016
15 Canadian gas production 2004-07-01 mape 0.446 0.025 0.002 0.028 0.019
16 Canadian gas production 2004-08-01 mape 0.447 0.022 0.003 0.019 0.010
17 Canadian gas production 2004-09-01 mape 0.471 0.078 0.008 0.075 0.015
18 Canadian gas production 2004-10-01 mape 0.486 0.057 0.012 0.055 0.025
19 Canadian gas production 2004-11-01 mape 0.476 0.074 0.027 0.074 0.026

And finally extract the average performance metrics for all models across the folds:

Hide the code
cgas_models_eval.groupby("metric")[model_columns].mean()
HistoricAverage Naive SeasonalNaive RWD MSTL
metric
mape 0.461 0.038 0.015 0.039 0.015
rmse 8.308 0.814 0.299 0.814 0.304

Visualization

To add the MSTL predictions to the visualization we need to call forecast like this:

Hide the code
cgas_mstl_preds = cgas_mstl.forecast(df = cgas_nx, h=fh)
cgas_mstl_preds
unique_id ds MSTL
0 Canadian gas production 2005-03-01 18.462
1 Canadian gas production 2005-04-01 17.739
2 Canadian gas production 2005-05-01 18.014

Now we add them to the predictions data frame:

Hide the code
cgas_models_preds = cgas_baselines_preds.copy()
cgas_models_preds['MSTL'] = cgas_mstl_preds['MSTL']
cgas_models_preds
unique_id ds HistoricAverage Naive SeasonalNaive RWD MSTL
0 Canadian gas production 2005-03-01 9.777 16.944 18.333 16.973 18.462
1 Canadian gas production 2005-04-01 9.777 16.944 17.848 17.001 17.739
2 Canadian gas production 2005-05-01 9.777 16.944 18.093 17.030 18.014

And we only need to update the plot_df data frame to get the desired graphic, combining MSTL and the baselines.

Hide the code
plot_df = cgas_models_preds.copy()
plot_df['ds'] = pd.to_datetime(plot_df['ds'])
plot_df.set_index('ds', inplace=True)
plot_df.head()



fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(cgas_nx.set_index('ds', inplace=False)['y'][-48:], label='train', color='blue')
for model in model_columns:
    ax.plot(plot_df[model], label=model, color=plt.cm.tab10(model_columns.index(model))) 
ax.legend(["Train data"] + model_columns)
plt.show()
plt.close()

Exercise 004

Repeat the fh = 12 exercise once more, now including MSTL. What do you think? Now zoom out in the previous plot (you only need to change a number) and think again.


In the Next Sessions

We will meet the most important family of classical models, the SARIMA models. Then we will add additional time series as exogenous predictors (SARIMAX models). We will also discuss autoarima strategies.

References

Hyndman, R. J., and G. Athanasopoulos. 2021. Forecasting: Principles and Practice, 3rd Edition. OTexts. https://otexts.com/fpp3/.
Hyndman, Rob J., George Athanasopoulos, Azul Garza, Cristian Challu, Max Mergenthaler Canseco, and Kin G. Olivares. 2025. Forecasting: Principles and Practice, the Pythonic Way. Melbourne, Australia: OTexts. https://otexts.com/fpppy/.