A super-fast forecasting technique for time series data
Holt-Winters Exponential Smoothing is used for forecasting time series data that exhibits both a trend and a seasonal variation. The Holt-Winters technique is made up of the following four forecasting techniques stacked one over the other:
Weighted Averages: A weighted average is simply an average of n numbers in which each number is given a certain weight, with the denominator being the sum of those n weights. The weights are often assigned according to some weighting function. Common weighting functions are logarithmic, linear, quadratic, cubic and exponential. Averaging as a time series forecasting technique has the property of smoothing out the variation in the historical values while calculating the forecast. By choosing a suitable weighting function, the forecaster determines which historical values should be emphasized when calculating future values of the time series.
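As a quick illustration, here is a weighted average forecast in a few lines of Python; the data values and the linear weighting function are made up for this example:

```python
# Hypothetical illustration of a weighted average forecast.
def weighted_average(values, weights):
    # The denominator is the sum of the weights, as described above.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

history = [10.0, 12.0, 11.0, 13.0]
# Linear weighting: later (more recent) values get proportionally larger weights.
linear_weights = [1, 2, 3, 4]
forecast = weighted_average(history, linear_weights)  # (10*1 + 12*2 + 11*3 + 13*4) / 10
```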
Exponential Smoothing: The Exponential Smoothing (ES) technique forecasts the next value using a weighted average of all previous values, where the weights decay exponentially from the most recent to the oldest historical value. When you use ES, you are making the crucial assumption that recent values of the time series are much more important to you than older values. The ES technique has two big shortcomings: it cannot be used when your data exhibits a trend, and it cannot be used when your data exhibits seasonal variations.
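A minimal sketch of simple ES follows; the series and the smoothing factor alpha are hypothetical. The recursive one-line update is mathematically equivalent to the exponentially weighted average described above:

```python
# Simple Exponential Smoothing sketch (hypothetical data and alpha).
def simple_es_forecast(series, alpha):
    level = series[0]  # initialize with the oldest observation
    for y in series[1:]:
        # Recursive form: new level = alpha * observation + (1 - alpha) * old level
        level = alpha * y + (1 - alpha) * level
    return level  # the forecast for the next time step

forecast = simple_es_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5)
```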
Holt Exponential Smoothing: The Holt ES technique fixes one of the two shortcomings of the simple ES technique. Holt ES can be used to forecast time series data that has a trend. But Holt ES fails in the presence of seasonal variations in the time series.
Holt-Winters Exponential Smoothing: The Holt-Winters ES modifies the Holt ES technique so that it can be used in the presence of both trend and seasonality.
To understand how Holt-Winters Exponential Smoothing works, one must understand the following four aspects of a time series:
The concept of level is best understood with an example. The following time series shows the closing stock price of Merck & Co. on NYSE. The horizontal red lines indicate some of the levels in the time series in its up and down journey:
A time series whose level changes in some sort of a pattern is said to have a trend. A time series whose level changes randomly around some mean value can be said to exhibit a random trend. When the trend is random, the concept of trend is not of much use for forecasting, compared to the case where the trend can be modeled by some function.
Let’s zoom into one particular area of the above stock price chart to illustrate the concept of a positive trend:
Some of the commonly observed trends are linear, square (quadratic), exponential, logarithmic, square root, inverse, and 3rd-degree or higher polynomials. Such trends can be easily modeled using the corresponding mathematical function, namely x, x², exp(x), log(x), and so on.
Highly non-linear trends require complex modeling techniques such as artificial neural networks to model them successfully.
A useful way to look at trend is as a rate or as the velocity of the time series at a given level.
This makes trend a vector that has a magnitude (rate of change) and a direction (increasing or decreasing).
Let’s keep this interpretation of trend as a rate or velocity at the back of our minds. We’ll use it when we deconstruct the forecasting equation of Holt-Winters Exponential Smoothing.
Many time series show periodic up and down movements around the current level. This periodic up and down movement is called seasonality. Here is an example of a time series demonstrating a seasonal pattern:
Noise is simply the aspect of the time series data that you cannot (or do not want to) explain.
Level, Trend, Seasonality and Noise are considered to interact in an additive or multiplicative manner to produce the final value of the time series that you observe:
Multiplicative combination (with additive trend)
Fully additive combination
The Holt-Winters Exponential Smoothing Equation
We are now ready to look at the forecasting equations of the Holt-Winters Exponential Smoothing technique. We’ll first consider the case where the trend adds to the current level, but the seasonality is multiplicative. This is a common situation in real-world time series data.
Since we are specifying the forecasting model’s equations, we’ll leave out the noise term.
In the above equation, we are forecasting the value of the time series k time steps into the future, starting from some arbitrary step i. The seasonal variation is assumed to have a known period length of m time steps. For example, for an annual seasonal pattern in monthly data, m = 12.
Let’s see how we can estimate L_i, B_i and S_i.
Let’s start with the estimate of trend B_i at step i:
The above equation estimates the trend B_i observed at step i by calculating it in two different ways as follows:
[L_i−L_(i-1)]: This is the difference between two consecutive levels, and it represents the rate of change of the level in going from L_(i-1) to L_i. One way to look at this term is to think of it as the velocity that the data has at level L_i, coming in as it did from level L_(i-1).
B_(i-1): This is simply the rate of change of level at L_(i-1), expressed recursively. To calculate B_(i-1), we use the same equation for B_i by replacing i with (i-1), and we keep doing this until we reach B_0 whose value we assume as an initial condition. More on estimating initial conditions in a bit.
The following figure illustrates the recursive unraveling of the above recurrence relation for B_i:
It should now be apparent how exponential weighted averages form the underbelly of the Holt-Winters technique. Here are three important observations:
- The weights β(1-β), β(1-β)², β(1-β)³…etc. form a geometric series.
- Each term β(1-β)^k of this series weights the difference between two consecutive levels [L_(i-k) − L_(i-k-1)] in the time series.
- The difference between the most recent two levels [L_i − L_(i-1)] has the largest weight β, and the weights decay exponentially by a factor of (1-β) as one goes back in time.
Also notice that the estimation of B_i requires us to know the level at steps i and (i-1), (i-2) and so on until L_0 which we assume as an initial condition.
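The trend recurrence B_i = β*(L_i − L_(i-1)) + (1 − β)*B_(i-1) can be sketched in a couple of lines of Python; the level values, previous trend and β used here are hypothetical:

```python
# Trend update sketch: B_i = beta * (L_i - L_{i-1}) + (1 - beta) * B_{i-1}
# All input values below are hypothetical.
def update_trend(level_now, level_prev, trend_prev, beta):
    # Weighted average of the two trend estimates described above.
    return beta * (level_now - level_prev) + (1 - beta) * trend_prev

b_i = update_trend(level_now=105.0, level_prev=100.0, trend_prev=4.0, beta=0.2)
```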
Let’s now look at how to estimate level L_i at time step i:
Just as with trend B_i, the above equation estimates the level L_i by calculating it in two different ways and then taking a weighted average of the two estimates as follows:
T_i/S_(i−m): Recollect that we have assumed that level and seasonality are multiplicative, i.e. T_i=L_i*S_(i-m)*N_i. It follows that a good estimate of L_i is simply T_i/S_(i−m), if you choose to ignore the effect of noise N_i.
[L_(i-1)+B_(i-1)]: In this term, we are estimating level L_i by adding to L_(i-1) the change in level that occurs from L_(i-1) to L_i, in other words the trend B_(i-1).
As with B_i, we solve this equation recursively until we hit T_0, S_0, B_0 and L_0. T_0 is just the oldest data point in our training data set. S_0, B_0 and L_0 are the initial values of level, trend and seasonal variation. They are estimated using various techniques which I shall get to soon. For now, we’ll assume that they are set to some reasonable initial values.
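The level update described above, L_i = α*(T_i/S_(i−m)) + (1 − α)*(L_(i-1) + B_(i-1)), can be sketched as follows; the observation, seasonal factor, previous level, previous trend and α are all hypothetical:

```python
# Level update sketch: L_i = alpha * (T_i / S_{i-m}) + (1 - alpha) * (L_{i-1} + B_{i-1})
def update_level(obs, season_factor_prev_cycle, level_prev, trend_prev, alpha):
    deseasonalized = obs / season_factor_prev_cycle  # first estimate: T_i / S_{i-m}
    projected = level_prev + trend_prev              # second estimate: L_{i-1} + B_{i-1}
    return alpha * deseasonalized + (1 - alpha) * projected

l_i = update_level(obs=120.0, season_factor_prev_cycle=1.2,
                   level_prev=95.0, trend_prev=4.0, alpha=0.5)
```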
In the above equation for L_i, in order to estimate L_i, we need to also estimate the contribution of the seasonal component S_(i-m). So let’s look at how to estimate the seasonal component at step i:
You can see that the estimation strategy for the seasonal component S_i is similar to that for the trend B_i and level L_i in that it estimates S_i by calculating it in two different ways and then takes the weighted average of the two estimates.
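Assuming the standard multiplicative form of this update, S_i = γ*(T_i/L_i) + (1 − γ)*S_(i−m), here is a sketch with hypothetical inputs and γ:

```python
# Seasonal update sketch (multiplicative form assumed):
# S_i = gamma * (T_i / L_i) + (1 - gamma) * S_{i-m}
def update_season(obs, level_now, season_factor_prev_cycle, gamma):
    observed_ratio = obs / level_now  # first estimate: T_i / L_i
    return gamma * observed_ratio + (1 - gamma) * season_factor_prev_cycle

s_i = update_season(obs=120.0, level_now=100.0, season_factor_prev_cycle=1.25, gamma=0.4)
```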
Now that we know how to estimate the level, the trend and the seasonal component at time step i, we are ready to put the three estimates together to get an estimate for the forecast F_(i+k) at step (i+k), as follows:
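With additive trend and multiplicative seasonality, the combined forecast takes the form F_(i+k) = (L_i + k*B_i) * S_(i+k−m). A sketch with hypothetical component values:

```python
# Forecast sketch: F_{i+k} = (L_i + k * B_i) * S_{i+k-m}
def forecast_k_steps_out(level, trend, season_factor, k):
    # Project the level k steps ahead along the trend, then re-apply seasonality.
    return (level + k * trend) * season_factor

f = forecast_k_steps_out(level=100.0, trend=2.0, season_factor=1.1, k=3)
```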
Estimation of Initial Conditions
Since all equations for the Holt-Winters method are recurrence relations, we need to supply a set of initial values to these estimating equations to get the forecasting engine started. Specifically, we need to set the values of L_0, B_0 and S_0.
There are several ways to set these initial values. I’ll explain the technique used by the Python statsmodels library. (We’ll soon use statsmodels for building a Holt-Winters ES estimator and use it to forecast 12 time steps out in the future).
Estimating L_0: Statsmodels sets L_0 to the average of all observed values of the time series that you supply it lying at indexes 0, m, 2m, 3m and so on, where m is the seasonal period. For example, if you tell statsmodels that your time series exhibits a seasonal period of 12 months, it will calculate L_0 as follows:
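That L_0 estimate can be sketched as follows (the series and period below are made up for illustration):

```python
# Sketch of the L_0 estimate described above: average the observations at
# indexes 0, m, 2m, ... (one value per seasonal cycle). Hypothetical data.
def initial_level(series, m):
    cycle_starts = series[::m]  # values at indexes 0, m, 2m, ...
    return sum(cycle_starts) / len(cycle_starts)

l0 = initial_level(list(range(24)), m=12)  # averages the values at indexes 0 and 12
```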
Note that T_0 is the oldest value in your time series data.
You can use the Holt-Winters forecasting technique even if your time series does not display seasonality. In this case, statsmodels will set L_0 to the first value of the training data set. i.e.
L_0 = T_0, when there is no seasonal variation in the data
Estimating B_0: If your time series displays an additive trend, i.e. its level changes linearly, statsmodels estimates the initial trend B_0 by calculating the rate of change of the observed value T_i across m time steps and then taking the mean of these rates. For example, if you tell statsmodels that your time series exhibits an additive trend and a seasonal period of 12 months, it will calculate B_0 as follows:
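A sketch of that B_0 estimate for an additive trend, using a hypothetical series that rises by exactly 1 per step:

```python
# Sketch of the additive-trend B_0 estimate described above:
# B_0 = mean over i of (T_{m+i} - T_i) / m, for i = 0 .. m-1. Hypothetical data.
def initial_trend_additive(series, m):
    rates = [(series[m + i] - series[i]) / m for i in range(m)]
    return sum(rates) / m

b0 = initial_trend_additive(list(range(24)), m=12)
```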
If your time series exhibits a multiplicative trend, i.e. the level grows at a rate that is proportional to the current level, statsmodels uses a slightly more complex estimator for B_0. It is best illustrated using the example of annual seasonality (m=12):
But if your time series does not display a seasonal variation, B_0 is simply set to T_1/T_0 if the trend is multiplicative, or to (T_1 − T_0) if the trend is additive.
Estimating S_0: If the seasonality is multiplicative i.e. the value of the seasonal variation at a given level is proportional to the value of the level, then S_0 is estimated as follows:
And when the seasonal variation is constant or it increases by a fixed amount at each level, i.e. it is additive, then S_0 is estimated as follows:
When there is no seasonal variation in your time series, S_0 is set to an empty vector.
Notice one important thing. While L_i and B_i are scalars, S_i (and therefore S_0) is a vector of length m where m is the seasonal period. At each time step i=0,1,2,…n in your time series, the corresponding seasonal factor lying at vector position (0 mod m), (1 mod m), (2 mod m),…,(i mod m),…,(n mod m) is used in the calculation of the forecast F_i.
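A tiny sketch of this mod-m indexing, using a hypothetical seasonal vector:

```python
# S is a vector of length m; step i reuses the factor at vector position i mod m.
m = 4
S = [1.2, 0.8, 1.1, 0.9]  # hypothetical seasonal factors for a period of m = 4

def season_factor_at_step(i):
    return S[i % m]

factor = season_factor_at_step(5)  # 5 mod 4 = 1, so this picks S[1]
```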
Estimating α, β and γ
The weighting coefficients α, β and γ are estimated by giving them initial values and then iteratively optimizing them against some suitable score. Minimization of the MSE (mean squared error) is a commonly used optimization goal. Statsmodels sets the initial α to 1/(2m), β to 1/(20m), and the initial γ to (1/20)*(1 − α) when there is seasonality.
Once L_0, B_0 and S_0 are estimated, and α, β and γ are set, we can use the recurrence relations for L_i, B_i, S_i, F_i and F_(i+k) to estimate the value of the time series at steps 0, 1, 2, 3, …, i,…,n,n+1,n+2,…,n+k.
If your training data set has n data points, then positions n+1,n+2,…,n+k correspond to the k out-of-sample forecasts that you would generate using the Holt-Winters estimation technique.
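The full recursion can be sketched end to end. This is a minimal illustration of the recurrences discussed above (additive trend, multiplicative seasonality), with simple hypothetical initial conditions and fixed smoothing coefficients rather than the optimized values statsmodels would find:

```python
# End-to-end Holt-Winters sketch: additive trend, multiplicative seasonality.
# Initial conditions and alpha/beta/gamma are simplified for illustration.
def holt_winters_forecast(series, m, alpha, beta, gamma, steps):
    level = sum(series[:m]) / m                                         # L_0
    trend = sum((series[m + i] - series[i]) / m for i in range(m)) / m  # B_0
    season = [series[i] / level for i in range(m)]                      # S_0

    # Run the three recurrences over the rest of the training data.
    for i in range(m, len(series)):
        obs, prev_level, prev_trend = series[i], level, trend
        level = alpha * (obs / season[i % m]) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[i % m] = gamma * (obs / level) + (1 - gamma) * season[i % m]

    # Out-of-sample forecasts: F_{n+k} = (L_n + k * B_n) * S_{n+k-m}
    n = len(series)
    return [(level + k * trend) * season[(n + k - 1) % m] for k in range(1, steps + 1)]

# A perfectly repeating seasonal series with no trend:
fc = holt_winters_forecast([10.0, 20.0] * 6, m=4, alpha=0.5, beta=0.5, gamma=0.5, steps=2)
```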
Using Holt-Winters Exponential Smoothing in Python
We’ll estimate 12 future values of the time series of retail sales of used car dealers in the United States using the Holt-Winters Exponential Smoothing technique:
The data set is available for download over here.
Let’s start by importing all the required packages.
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
Read the data set into a Pandas data frame. Note that the Date column (column 0) is the index column and it has the format mm-dd-yyyy.
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
Set the index frequency explicitly to Monthly so that statsmodels does not have to try to infer it.
df.index.freq = 'MS'
Plot the data:
We get the following chart:
Split between the training and the test data sets. The last 12 periods form the test data.
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
Build and train the model on the training data. In the above chart, the level of the time series seems to be increasing linearly. So we set the trend as additive. However, the seasonal variation around each level seems to be increasing in proportion to the current level. So we set the seasonality to multiplicative.
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit()
Print out the training summary:
print(fitted.summary())
We get the following output:
Create an out of sample forecast for the next 12 steps beyond the final data point in the training data set.
sales_forecast = fitted.forecast(steps=12)
Plot the training data, the test data and the forecast on the same plot.
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
Let’s zoom into the last 12 periods. You can see that the forecast lags behind sharp turning points as it rightly should for any moving average based forecasting technique:
Here is the complete source code:
Citations and Copyrights
U.S. Census Bureau, Retail Sales: Used Car Dealers [MRTSSM44112USN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MRTSSM44112USN, June 17, 2020, under FRED copyright terms.
Merck & Co., Inc. (MRK), NYSE — Historical Adjusted Closing Price. Currency in USD, https://finance.yahoo.com/quote/MRK/history?p=MRK, 23-Jul-2020. Copyright Yahoo Finance and NYSE
Peter R. Winters, Forecasting Sales by Exponentially Weighted Moving Averages. Management Science 6 (3) 324-342 https://doi.org/10.1287/mnsc.6.3.324
Makridakis, S., Wheelwright, S. C., Hyndman, R. J. Forecasting Methods and Applications. Third Ed. John Wiley & Sons.