###### A super-fast forecasting technique for time series data

Holt-Winters Exponential Smoothing is used for forecasting time series data that exhibits both a trend and a seasonal variation. The Holt-Winters technique is made up of the following four forecasting techniques stacked one over the other:

**Weighted Averages:** A weighted average is simply an average of *n* numbers where each number is given a certain weight and the denominator is the sum of those *n* weights. The weights are often assigned as per some weighting function. Common weighting functions are logarithmic, linear, quadratic, cubic and exponential. Averaging as a time series forecasting technique has the property of smoothing out the variation in the historical values while calculating the forecast. By choosing a suitable weighting function, the forecaster determines which historical values should be given emphasis when calculating future values of the time series.

**Exponential Smoothing:** The Exponential Smoothing (ES) technique forecasts the next value using a weighted average of all previous values, where the weights decay exponentially from the most recent to the oldest historical value. When you use ES, you are making the crucial assumption that recent values of the time series are much more important to you than older values. The ES technique has two big shortcomings: it cannot be used when your data exhibits a trend, and it cannot be used when your data exhibits seasonal variations. (A minimal form of the ES recurrence is shown after this list.)

**Holt Exponential Smoothing:** The Holt ES technique fixes one of the two shortcomings of the simple ES technique. Holt ES can be used to forecast time series data that has a trend. But Holt ES fails in the presence of seasonal variations in the time series.

**Holt-Winters Exponential Smoothing:** The Holt-Winters ES modifies the Holt ES technique so that it can be used in the presence of both trend and seasonality.
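To make the simple ES building block concrete, its core recurrence can be written in the following standard form (using *T_i* for the observed value at step *i* and *F_i* for the forecast, the notation adopted later in this article; *α* is the smoothing coefficient):

```
F_{i+1} = \alpha T_i + (1 - \alpha) F_i
```

Unrolling this recurrence weights each older observation by an extra factor of *(1 − α)*, which is exactly the exponential decay of weights described above.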

To understand how Holt-Winters Exponential Smoothing works, one must understand the following four aspects of a time series:

## Level

The concept of level is best understood with an example. The following time series shows the closing stock price of Merck & Co. on NYSE. The horizontal red lines indicate some of the levels in the time series in its up and down journey:

## Trend

A time series whose level changes in some sort of a pattern is said to have a trend. A time series whose level changes randomly around some mean value can be said to exhibit a random trend. Beyond knowing that it is random, a random trend is not of much use for forecasting, compared to a trend that can be modeled by some function.

Let’s zoom into one particular area of the above stock price chart to illustrate the concept of a positive trend:

Some of the commonly observed trends are linear, quadratic, exponential, logarithmic, square root, inverse and polynomials of 3rd degree or higher. These trends can be easily modeled using the corresponding mathematical function, namely x, x², exp(x), log(x), etc.

Highly non-linear trends require complex modeling techniques such as artificial neural networks to model them successfully.

A useful way to look at trend is as a rate or as the velocity of the time series at a given level.

This makes trend a vector that has a magnitude (rate of change) and a direction (increasing or decreasing).

Let’s keep this interpretation of trend as a rate or velocity at the back of our minds. We’ll use it when we deconstruct the forecasting equation of Holt-Winters Exponential Smoothing.

## Seasonality

Many time series show periodic up and down movements around the current level. This periodic up and down movement is called seasonality. Here is an example of a time series demonstrating a seasonal pattern:

## Noise

Noise is simply the aspect of the time series data that you cannot (or do not want to) explain.

Level, Trend, Seasonality and Noise are considered to interact in an additive or multiplicative manner to produce the final value of the time series that you observe:

### Multiplicative combination (with additive trend)
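In equation form, this combination can be written as follows (a reconstruction in this article’s notation, where *T_i* is the observed value at step *i* and *N_i* is the noise):

```
T_i = (L_i + B_i) \times S_i \times N_i
```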

### Fully additive combination
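And the fully additive form, again a reconstruction in the same notation:

```
T_i = L_i + B_i + S_i + N_i
```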

## The Holt-Winters Exponential Smoothing Equation

We are now ready to look at the forecasting equations of the Holt-Winters Exponential Smoothing technique. We’ll first consider the case where the trend adds to the current level, but the seasonality is multiplicative. This is a common situation in real-world time series data.

Since we are specifying the forecasting model’s equations, we’ll leave out the noise term.
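For this case, the forecast equation can be written in the following standard form (a reconstruction in this article’s notation, valid for forecast horizons *k ≤ m*):

```
F_{i+k} = (L_i + k B_i) \times S_{i+k-m}
```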

In the above equation, we are forecasting the value of the time series *k* time steps out into the future starting from some arbitrary step *i*. The seasonal variation is assumed to have a known period length of *m* time steps. For example, for monthly data with an annual seasonal cycle, *m* = 12.

Let’s see how we can estimate *L_i*, *B_i* and *S_i*.

Let’s start with the estimate of the trend *B_i* at step *i*:
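A standard form of the trend recurrence (a reconstruction consistent with the discussion below) is:

```
B_i = \beta (L_i - L_{i-1}) + (1 - \beta) B_{i-1}
```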

The above equation estimates the trend *B_i* observed at step *i* by calculating it in two different ways as follows:

**[L_i − L_(i-1)]:** This is the difference between two consecutive levels, and it represents the rate of change of the level at level *L_(i-1)*. One way to look at this term is to think of it as the velocity that the data has at level *L_i*, coming in as it did from level *L_(i-1)*.

**B_(i-1):** This is simply the rate of change of level at *L_(i-1)*, expressed recursively. To calculate *B_(i-1)*, we use the same equation for *B_i* by replacing *i* with *(i-1)*, and we keep doing this until we reach *B_0*, whose value we assume as an initial condition. More on estimating initial conditions in a bit.

Unraveling the above recurrence relation for *B_i* step by step gives the following expansion:
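```
% Repeated substitution of B_(i-1), B_(i-2), ... into the trend recurrence:
B_i = \beta (L_i - L_{i-1}) + \beta (1-\beta) (L_{i-1} - L_{i-2})
    + \beta (1-\beta)^2 (L_{i-2} - L_{i-3}) + \cdots + (1-\beta)^i B_0
```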

It should now be apparent how exponentially weighted averages form the foundation of the Holt-Winters technique. Here are three important observations:

- The weights *β, β(1-β), β(1-β)², β(1-β)³,* …etc. form a geometric series.
- Each term *β(1-β)^k* of this series weights the difference between two consecutive levels *[L_(i-k) − L_(i-k-1)]* in the time series.
- The difference between the most recent two levels *[L_i − L_(i-1)]* has the largest weight *β*, and the weights decay exponentially by a factor of *(1-β)* as one goes back in time.

Also notice that the estimation of *B_i* requires us to know the level at steps *i*, *(i-1)*, *(i-2)* and so on down to *L_0*, which we assume as an initial condition.

Let’s now look at how to estimate level *L_i* at time step *i*:
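A standard form of the level recurrence for multiplicative seasonality (a reconstruction matching the two estimates discussed below) is:

```
L_i = \alpha \frac{T_i}{S_{i-m}} + (1 - \alpha) (L_{i-1} + B_{i-1})
```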

Just as with trend *B_i*, the above equation estimates the level *L_i* by calculating it in two different ways and then taking a weighted average of the two estimates as follows:

**T_i/S_(i−m):** Recollect that we have assumed that level and seasonality are multiplicative, i.e. *T_i = L_i · S_(i-m) · N_i*. It follows that a good estimate of *L_i* is simply *T_i/S_(i−m)*, if you choose to ignore the effect of noise *N_i*.

**[L_(i-1)+B_(i-1)]:** In this term, we are estimating level *L_i* by adding to *L_(i-1)* the change in level that occurs from *L_(i-1)* to *L_i*, in other words the trend *B_(i-1)*.

As with *B_i*, we solve this equation recursively until we hit *T_0*, *S_0*, *B_0* and *L_0*. *T_0* is just the oldest data point in our training data set. *S_0*, *B_0* and *L_0* are the initial values of the seasonal variation, the trend and the level respectively. They are estimated using various techniques which I shall get to soon. For now, we’ll assume that they are set to some reasonable initial values.

In the above equation for *L_i*, in order to estimate *L_i* we also need to estimate the contribution of the seasonal component *S_(i-m)*. So let’s look at how to estimate the seasonal component at step *i*:
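A standard form of the seasonal recurrence (again a reconstruction in this article’s notation) is:

```
S_i = \gamma \frac{T_i}{L_i} + (1 - \gamma) S_{i-m}
```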

You can see that the estimation strategy for the seasonal component *S_i* is similar to that for the trend *B_i* and level *L_i* in that it estimates *S_i* by calculating it in two different ways and then takes the weighted average of the two estimates.

Now that we know how to estimate the level, the trend and the seasonal component at time step *i*, we are ready to put the three estimates together to get an estimate for the forecast *F_(i+k)* at step *(i+k)*, as follows:
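```
F_{i+k} = (L_i + k B_i) \times S_{i+k-m}
```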

## Estimation of Initial Conditions

Since all equations for the Holt-Winters method are recurrence relations, we need to supply a set of initial values to these estimating equations to get the forecasting engine started. Specifically, we need to set the values of *L_0*, *B_0* and *S_0*.

There are several ways to set these initial values. I’ll explain the technique used by the Python *statsmodels* library. (We’ll soon use *statsmodels* for building a Holt-Winters ES estimator and use it to forecast 12 time steps out in the future).

**Estimating L_0:** Statsmodels sets *L_0* to the average of all observed values of the time series that you supply it, lying at indexes *0, m, 2m, 3m* and so on, where *m* is the seasonal period. For example, if you tell statsmodels that your time series exhibits a seasonal period of 12 months, it will calculate *L_0* as follows:
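Based on that description, for *m* = 12 the calculation amounts to the following (a reconstruction, not the library’s literal code):

```
L_0 = \mathrm{mean}(T_0, T_{12}, T_{24}, \ldots)
```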

Note that *T_0* is the oldest value in your time series data.

You can use the Holt-Winters forecasting technique even if your time series does not display seasonality. In this case, statsmodels will set *L_0* to the first value of the training data set, i.e. *L_0 = T_0* when there is no seasonal variation in the data.

**Estimating B_0:** If your time series displays an additive trend, i.e. its level changes linearly, statsmodels estimates the initial trend *B_0* by calculating the rate of change of the observed value *T_i* across *m* time steps and then taking the mean of these rates. For example, if you tell statsmodels that your time series exhibits an additive trend and has a seasonal period of 12 months, it will calculate *B_0* as follows:
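Based on that description, the estimate takes the following form for *m* = 12 (a reconstruction):

```
B_0 = \frac{1}{12} \sum_{j=0}^{11} \frac{T_{12+j} - T_j}{12}
```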

If your time series exhibits a multiplicative trend, i.e. the level grows at a rate that is proportional to the current level, *statsmodels* uses a slightly more complex-looking estimator for *B_0*. It is best illustrated using the example of annual seasonality (*m* = 12):
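By analogy with the additive case, one plausible form replaces the differences with ratios and the division by *m* with an *m*-th root (the library’s exact estimator may differ):

```
B_0 = \frac{1}{12} \sum_{j=0}^{11} \left( \frac{T_{12+j}}{T_j} \right)^{1/12}
```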

But if your time series does not display a seasonal variation, *B_0* is simply set to *T_1/T_0* if the trend is multiplicative, or to *(T_1 − T_0)* if the trend is additive.

**Estimating S_0:** If the seasonality is multiplicative, i.e. the value of the seasonal variation at a given level is proportional to the value of the level, then *S_0* is estimated as follows:
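A common choice consistent with the multiplicative decomposition divides the first season’s observations by the initial level (the library may additionally average over multiple seasons; treat this as a sketch):

```
S_0[j] = \frac{T_j}{L_0}, \qquad j = 0, 1, \ldots, m-1
```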

And when the seasonal variation is constant or it increases by a fixed amount at each level, i.e. it is additive, then *S_0* is estimated as follows:
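Correspondingly for the additive case (with the same hedging as above):

```
S_0[j] = T_j - L_0, \qquad j = 0, 1, \ldots, m-1
```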

When there is no seasonal variation in your time series, *S_0* is *[]*, an empty vector.

Notice one important thing. While *L_i* and *B_i* are scalars, *S_i* (and therefore *S_0*) is a *vector* of length *m*, where *m* is the seasonal period. At each time step *i = 0, 1, 2, …, n* in your time series, the seasonal factor lying at vector position *(i mod m)* is used in the calculation of the forecast *F_i*.

### Estimating α, β and γ

The weighting coefficients *α, β and γ* are estimated by giving them initial values and then iteratively optimizing them against some suitable score. Minimization of the MSE (mean squared error) is a commonly used optimization goal. Statsmodels sets the initial *α* to *1/(2m)* and *β* to *1/(20m)*, and it sets the initial *γ* to *(1/20)·(1 − α)* when there is seasonality.

Once *L_0*, *B_0* and *S_0* are estimated, and *α, β and γ* are set, we can use the recurrence relations for *L_i*, *B_i*, *S_i*, *F_i* and *F_(i+k)* to estimate the value of the time series at steps *0, 1, 2, 3, …, i, …, n, n+1, n+2, …, n+k*.

If your training data set has *n* data points, then positions *n+1, n+2, …, n+k* correspond to the *k* out-of-sample forecasts that you would generate using the Holt-Winters estimation technique.
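To make the whole pipeline of recurrences concrete, here is a short, self-contained Python sketch of the additive-trend, multiplicative-seasonality recursion. It is an illustration, not the statsmodels implementation: the function name is made up, the initial conditions are deliberately naive, and *α, β, γ* are passed in fixed rather than optimized against the MSE.

```
import numpy as np

def holt_winters_sketch(y, m, alpha, beta, gamma, k):
    """Sketch of additive-trend, multiplicative-seasonality Holt-Winters.

    Naive initial conditions; statsmodels uses the more careful estimates
    of L_0, B_0 and S_0 described above, and optimizes alpha, beta, gamma.
    """
    y = np.asarray(y, dtype=float)
    level, trend = y[0], 0.0                    # naive L_0 and B_0
    season = (y[:m] / y[:m].mean()).tolist()    # naive S_0 vector of length m

    for i in range(1, len(y)):
        s = season[i % m]                       # seasonal factor for this phase
        prev_level, prev_trend = level, trend
        # L_i = alpha * T_i / S_(i-m) + (1 - alpha) * (L_(i-1) + B_(i-1))
        level = alpha * (y[i] / s) + (1 - alpha) * (prev_level + prev_trend)
        # B_i = beta * (L_i - L_(i-1)) + (1 - beta) * B_(i-1)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # S_i = gamma * T_i / L_i + (1 - gamma) * S_(i-m)
        season[i % m] = gamma * (y[i] / level) + (1 - gamma) * s

    n = len(y)
    # k out-of-sample forecasts: (L + j*B) * S at the matching seasonal phase
    return [(level + j * trend) * season[(n - 1 + j) % m]
            for j in range(1, k + 1)]
```

For example, `holt_winters_sketch(series, m=12, alpha=0.3, beta=0.1, gamma=0.1, k=12)` would produce 12 out-of-sample forecasts; in practice you would let a library optimize the coefficients as described above.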

## Using Holt-Winters Exponential Smoothing in Python

We’ll estimate 12 future values of the time series of retail sales of used car dealers in the United States using the Holt-Winters Exponential Smoothing technique:

The data set is available for download from FRED (see the Citations section below).

Let’s start by importing all the required packages.

```
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.tsa.holtwinters import ExponentialSmoothing as HWES
```

Read the data set into a Pandas data frame. Note that the Date column (column 0) is the index column and it has the format mm-dd-yyyy.

```
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
```

Set the index frequency explicitly to month-start ('MS') so that *statsmodels* does not have to try to infer it.

```
df.index.freq = 'MS'
```

Plot the data:

```
df.plot()
plt.show()
```

We get the following chart:

Split the data into training and test data sets. The last 12 periods form the test data.

```
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
```

Build and train the model on the training data. In the above chart, the level of the time series seems to be increasing linearly. So we set the trend as additive. However, the seasonal variation around each level seems to be increasing in proportion to the current level. So we set the seasonality to multiplicative.

```
model = HWES(df_train, seasonal_periods=12, trend='add', seasonal='mul')
fitted = model.fit()
```

Print out the training summary.

```
print(fitted.summary())
```

We get the following output:

Create an out-of-sample forecast for the next 12 steps beyond the final data point in the training data set.

```
sales_forecast = fitted.forecast(steps=12)
```

Plot the training data, the test data and the forecast on the same plot.

```
fig = plt.figure()
fig.suptitle('Retail Sales of Used Cars in the US (1992-2020)')
past, = plt.plot(df_train.index, df_train, 'b.-', label='Sales History')
future, = plt.plot(df_test.index, df_test, 'r.-', label='Actual Sales')
predicted_future, = plt.plot(df_test.index, sales_forecast, 'g.-', label='Sales Forecast')
plt.legend(handles=[past, future, predicted_future])
plt.show()
```

Let’s zoom into the last 12 periods. You can see that the forecast lags behind sharp turning points as it rightly should for any moving average based forecasting technique:


## Citations and Copyrights

### Data Set

U.S. Census Bureau, Retail Sales: Used Car Dealers [MRTSSM44112USN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MRTSSM44112USN, June 17, 2020, under FRED copyright terms.

SILSO, World Data Center — Sunspot Number and Long-term Solar Observations, Royal Observatory of Belgium, on-line Sunspot Number catalogue: http://www.sidc.be/SILSO/, 1818–2020 (CC-BY-NA)

Merck & Co., Inc. (MRK), NYSE — Historical Adjusted Closing Price. Currency in USD, https://finance.yahoo.com/quote/MRK/history?p=MRK, 23-Jul-2020. Copyright Yahoo Finance and NYSE

### Papers

Peter R. Winters, Forecasting Sales by Exponentially Weighted Moving Averages. *Management Science 6 (3) 324-342* https://doi.org/10.1287/mnsc.6.3.324

Makridakis, S., Wheelwright, S. C., Hyndman, R. J. Forecasting Methods and Applications. Third Ed. *John Wiley & Sons*.

### Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

**PREVIOUS: **The Binomial Regression Model

**NEXT: **Regression With ARIMA Errors