# Understanding Partial Auto-correlation And The PACF

###### What is it, how is it calculated and when to use it

We’ll go over the concepts that drive the creation of the Partial Auto-Correlation Function (PACF) and we’ll see how these concepts lead to the development of the definition of partial auto-correlation and the formula for PACF.

I will demonstrate from first principles how the PACF can be calculated and we’ll compare the result with the value returned by statsmodels.tsa.stattools.pacf().

We’ll finish by seeing how to use PACF in time series forecasting.

## Laying the foundation for PACF: The auto-regressive time series

There are many phenomena in which the past influences the present. The events of yesterday can be used to foretell what will happen today. When such phenomena are represented as a time series, they are said to have an auto-regressive property. In an auto-regressive time series, the current value can be expressed as a function of the previous value, the value before that one and so forth. In other words, the current value is correlated with previous values from the same time series.

If a time series is auto-regressive, it is often the case that the current value’s forecast can be computed as a linear function of only the previous value and a constant, as follows:

$$T_i = \beta_0 + \beta_1 T_{i-1}$$

Here T_i is the value that is forecast by the equation at the ith time step. Beta0 is the Y-intercept of the model and it applies a constant amount of bias to the forecast. It also specifies what will be the forecast for T_i if the value at the previous time step T_(i-1) happens to be zero. Beta1 tells us the rate at which T_i changes w.r.t. T_(i-1).
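To make the roles of Beta0 and Beta1 concrete, here is a minimal sketch that simulates a series obeying this equation and then recovers the two coefficients by fitting a straight line to T_i versus T_(i-1). The coefficient values, the seed and the variable names are made up for illustration; they do not come from any data set used in this article.

```python
import numpy as np

# Simulate T_i = beta0 + beta1 * T_(i-1) + noise.
# The "true" coefficients below are invented for this sketch.
rng = np.random.default_rng(42)
true_beta0, true_beta1 = 2.0, 0.6
n = 5000
T = np.empty(n)
T[0] = true_beta0 / (1 - true_beta1)  # start at the long-run mean
for i in range(1, n):
    T[i] = true_beta0 + true_beta1 * T[i - 1] + rng.normal()

# Fit a straight line to T_i versus T_(i-1).
# np.polyfit returns the slope first, then the intercept.
beta1_hat, beta0_hat = np.polyfit(T[:-1], T[1:], deg=1)

# One-step-ahead forecast from the last observed value.
forecast = beta0_hat + beta1_hat * T[-1]
print(f"beta0 = {beta0_hat:.3f}, beta1 = {beta1_hat:.3f}, forecast = {forecast:.3f}")
```

The fitted values should land close to the ones used in the simulation, which is all the equation promises: one intercept, one slope.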

The key assumption behind this simple equation is that the variance in T_(i-1) is able to explain all the variance expressed by all values that are older than T_(i-1) in the time series. It is as if T_(i-1) captures all the information associated with values older than itself.

But what if this assumption were not true? What if the variance in T_(i-1) is not able to explain all of the variance contained within T_(i-2)? In that case, the above equation will not be able to feed this unexplained portion of the variance from T_(i-2) into T_i, causing the forecast for T_i to go off the mark.

Fortunately it’s easy to fix this problem by adding a term to the above equation as follows:

$$T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2}$$

In this equation the extra term Beta2*T_(i-2) seeks to capture the variance contained in values that are older than T_(i-1) that could not be explained by the variance in T_(i-1). It feeds this balance amount of information directly into the forecast for today’s value T_i.
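As a rough illustration of the extra term at work, the sketch below simulates a series in which T_(i-2) has a direct effect on T_i and then estimates Beta0, Beta1 and Beta2 by ordinary least squares. The coefficients and seed are invented for the example, and `np.linalg.lstsq` is simply one convenient way to fit the line.

```python
import numpy as np

# Simulate T_i = b0 + b1*T_(i-1) + b2*T_(i-2) + noise (made-up coefficients).
rng = np.random.default_rng(7)
true_b0, true_b1, true_b2 = 1.0, 0.5, 0.3
n = 5000
T = np.zeros(n)
for i in range(2, n):
    T[i] = true_b0 + true_b1 * T[i - 1] + true_b2 * T[i - 2] + rng.normal()

# Design matrix: a column of ones (intercept), T_(i-1) and T_(i-2).
X = np.column_stack([np.ones(n - 2), T[1:-1], T[:-2]])
y = T[2:]

# Least-squares estimates of the three coefficients.
(b0_hat, b1_hat, b2_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}")
```

A clearly non-zero estimate for Beta2 is a sign that T_(i-1) alone does not carry all the information in the older values.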

With the background established let’s build the definition and the formula for the partial auto-correlation function.

## Building the definition of PACF

Let’s reproduce the above equation for reference:

$$T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2}$$

It would be useful to know just how important the balance amount of variance in T_(i-2) is in predicting today’s value T_i. Why? Because it tells us whether we need to add T_(i-2) as a variable in our forecast model for T_i. If the balance variance in T_(i-2) is not statistically significant, we can safely assume that the variance in values older than T_(i-2) is either not significant for forecasting today’s value, or its significance is already captured in T_(i-1). Either way, it gives us a reason to fall back to our earlier, simpler equation that contained only T_(i-1).

So how do we find out how important this balance amount of variance in T_(i-2) is in predicting today’s value T_i? Easy, we calculate the correlation coefficient between the two. Wait, but isn’t T_i also correlated with T_(i-1)? After all that is the whole basis for the above two equations! Of course it is. So what we actually want to find out is the correlation between the following two variables:

Variable I: The amount of variance in T_i that is not explained by the variance in T_(i-1), AND

Variable II: The amount of variance in T_(i-2) that is not explained by the variance in T_(i-1).

This correlation is called the partial auto-correlation of T_i with T_(i-2).

The definition of Variable II seems counter-intuitive. How can yesterday’s value explain day-before-yesterday’s value? To understand this, recall that in an auto-regressive time series, some of the information from day-before-yesterday’s value is carried forward into yesterday’s value. This fact, strange as it sounds, makes yesterday’s value a predictor for day-before-yesterday’s value!

Generalizing the above argument leads to the following definition for the PACF:

The partial auto-correlation of T_i with a k lagged version of itself i.e. T_(i-k) is a correlation between the following two variables:

Variable 1: The amount of variance in T_i that is not explained by the variance in T_(i-1), T_(i-2)…T_(i-k+1), and,

Variable 2: The amount of variance in T_(i-k) that is not explained by the variance in T_(i-1), T_(i-2)…T_(i-k+1).

## Developing the formula for PACF

Let’s rely on our LAG=2 example for developing the PACF formula. Later, we’ll generalize it to LAG=k. To know how much of the variance in T_i has not been explained by the variance in T_(i-1) we do two things:

1. Step 1: We fit a linear regression model (i.e. a straight line) to the distribution of T_i versus T_(i-1). This linear model will let us predict T_i from T_(i-1). Conceptually, this linear model is allowing us to explain the variance in T_i as a function of the variance in T_(i-1). But like all optimally fitted models, our model is not going to be able to explain all of the variance in T_i. This fact takes us to step 2.
2. Step 2: In this step, we calculate the residual errors of the linear model that we built in Step 1. The residual error is the difference between the observed value of T_i and the value predicted by the model. We do this residual calculation for each value of T_i so as to get a time series of residuals. This residuals time series gives us what we are looking for: the amount of variance in T_i which cannot be explained by the variance in T_(i-1), plus of course some noise.

To calculate the second variable in the correlation, namely the amount of variance in T_(i-2) that cannot be explained by the variance in T_(i-1), we repeat steps 1 and 2 with T_(i-2) taking the place of T_i, i.e. we fit a linear model that predicts T_(i-2) from T_(i-1). This gives us the residuals series we are seeking for Variable II.

The final step is to apply the formula for Pearson’s correlation coefficient to these two time series of residuals.

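Here is the resulting formula for PACF(T_i, k=2):

$$\mathrm{PACF}(T_i, k=2) = \frac{\mathrm{Cov}\left(T_i|T_{i-1},\; T_{i-2}|T_{i-1}\right)}{\sigma_{T_i|T_{i-1}} \; \sigma_{T_{i-2}|T_{i-1}}}$$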

T_i|T_(i-1) is the time series of residuals which we created from steps 1 and 2 after fitting a linear model to the distribution of T_i versus T_(i-1).

T_(i-2)|T_(i-1) is the second time series of residuals which we created from steps 1 and 2 after fitting a linear model to the distribution of T_(i-2) versus T_(i-1).

The numerator of the equation calculates the covariance between these two residual time series and the denominator standardizes the covariance using the respective standard deviations.

So there you have it. This is how we calculate the PACF for LAG=2.
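The two steps and the final correlation can be sketched in a few lines of NumPy. The helper names `residuals` and `pacf_lag2` are my own, and the series is simulated with made-up coefficients so that the example runs on its own; for such a series the LAG=2 value should come out near the coefficient on T_(i-2).

```python
import numpy as np

def residuals(y, x):
    """Residuals left after fitting the straight line y = a + b*x by least squares."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)

def pacf_lag2(series):
    """Correlation between T_i|T_(i-1) and T_(i-2)|T_(i-1)."""
    series = np.asarray(series, dtype=float)
    t_i, t_i1, t_i2 = series[2:], series[1:-1], series[:-2]
    res_1 = residuals(t_i, t_i1)   # Variable I
    res_2 = residuals(t_i2, t_i1)  # Variable II
    return np.corrcoef(res_1, res_2)[0, 1]

# Simulated series in which T_(i-2) has a direct effect of 0.3 on T_i.
rng = np.random.default_rng(3)
x = np.zeros(5000)
for i in range(2, len(x)):
    x[i] = 0.5 * x[i - 1] + 0.3 * x[i - 2] + rng.normal()

lag2_value = pacf_lag2(x)
print(f"PACF at LAG=2: {lag2_value:.3f}")
```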

## The general formula for PACF(X, lag=k)

In the general case, values older than one or two periods can also have a direct impact on the forecast for the current time period’s value. So one can write the generalized version of the auto-regression equation for forecasting T_i as follows:

$$T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2} + \dots + \beta_k T_{i-k}$$

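We can similarly generalize the argument that led up to the development of the PACF formula for LAG=2. The formula for PACF at LAG=k is:

$$\mathrm{PACF}(T_i, k) = \frac{\mathrm{Cov}\left(T_i|T_{i-1},\dots,T_{i-k+1},\; T_{i-k}|T_{i-1},\dots,T_{i-k+1}\right)}{\sigma_{T_i|T_{i-1},\dots,T_{i-k+1}} \; \sigma_{T_{i-k}|T_{i-1},\dots,T_{i-k+1}}}$$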

T_i|T_(i-1), T_(i-2)…T_(i-k+1) is the time series of residuals obtained from fitting a multivariate linear model to T_(i-1), T_(i-2)…T_(i-k+1) for predicting T_i. It represents the residual variance in T_i after stripping away the influence of T_(i-1), T_(i-2)…T_(i-k+1).

T_(i-k)|T_(i-1), T_(i-2)…T_(i-k+1) is the time series of residuals obtained from fitting a multivariate linear model to T_(i-1), T_(i-2)…T_(i-k+1) for predicting T_(i-k). It represents the residual variance in T_(i-k) after stripping away the influence of T_(i-1), T_(i-2)…T_(i-k+1).
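The LAG=k recipe can be written as one small function. This is only a sketch of the residual-based definition given here, not how any particular library computes the PACF; the function name `pacf_at_lag` and the simulated series are assumptions made for the example. At LAG=1 there are no intermediate lags, so the regression contains only an intercept and the result is the ordinary auto-correlation.

```python
import numpy as np

def pacf_at_lag(series, k):
    """Correlate T_i and T_(i-k) after regressing both on T_(i-1)...T_(i-k+1)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if k < 1 or k > n - 3:
        raise ValueError("k must be at least 1 and much smaller than the series length")
    t_i, t_ik = x[k:], x[:n - k]
    # Columns: intercept, then T_(i-1), T_(i-2), ..., T_(i-k+1).
    X = np.column_stack([np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k)])
    res_1 = t_i - X @ np.linalg.lstsq(X, t_i, rcond=None)[0]    # Variable 1
    res_2 = t_ik - X @ np.linalg.lstsq(X, t_ik, rcond=None)[0]  # Variable 2
    return np.corrcoef(res_1, res_2)[0, 1]

# Simulated series with direct effects at lags 1 and 2 only (made-up coefficients).
rng = np.random.default_rng(5)
x = np.zeros(5000)
for i in range(2, len(x)):
    x[i] = 0.5 * x[i - 1] + 0.3 * x[i - 2] + rng.normal()

for k in (1, 2, 3):
    print(k, round(float(pacf_at_lag(x, k)), 3))
```

For this simulated series the values at LAG 1 and LAG 2 should be clearly non-zero, while the value at LAG 3 should hover around zero because T_(i-3) has no direct effect on T_i.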

## Working through an example

Let’s put our money where our mouth is. We’ll hand crank out the PACF on a real world time series using the above steps. Of course in practice you don’t have to calculate PACF from first principles. But knowing how it can be done from scratch will give you a valuable insight into the machinery of PACF.

The real world time series we’ll use is the Southern Oscillations data set which can be used to predict an El Nino or La Nina event.

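```python
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt

# Load the data into a DataFrame with a date index and a single 'T_i' column.
# The file name is a placeholder (an assumption); point it at your copy of the data set.
df = pd.read_csv('southern_osc.csv', header=0, parse_dates=[0], index_col=[0])
```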

Next we’ll add two columns to the data frame containing the LAG=1 and LAG=2 versions of the data.

```python
df['T_(i-1)'] = df['T_i'].shift(1)
df['T_(i-2)'] = df['T_i'].shift(2)

#drop the top two rows as they contain NaNs
df = df.drop(df.index[[0,1]])
```

Now let’s fit a linear regression model on T_i and T_(i-1) and add the model’s predictions back into the data frame as a new column.

```python
lm = linear_model.LinearRegression()

df_X = df[['T_(i-1)']] #Note the double brackets! [[]]

df_y = df['T_i'] #Note the single brackets! []

model = lm.fit(df_X,df_y)

df['Predicted_T_i|T_(i-1)'] = lm.predict(df_X)
```

Next let’s create the time series of residuals corresponding to the predictions of this model and add it to the data frame. This time series gives us the first one of the two data series we need for calculating the PACF for T_i at LAG=2.

```python
#Observed minus predicted
df['Residual_T_i|T_(i-1)'] = df['T_i'] - df['Predicted_T_i|T_(i-1)']
```

Let’s repeat the above procedure to calculate the second time series of residuals, this time using the columns: T_(i-2) and T_(i-1).

```python
lm = linear_model.LinearRegression()

df_X = df[['T_(i-1)']] #Note the double brackets! [[]]

df_y = df['T_(i-2)'] #Note the single brackets! []

model = lm.fit(df_X,df_y)

df['Predicted_T_(i-2)|T_(i-1)'] = lm.predict(df_X)

#Residual = Observed - predicted
df['Residual_T_(i-2)|T_(i-1)'] = df['T_(i-2)'] - df['Predicted_T_(i-2)|T_(i-1)']
```

Finally, let’s apply the formula for Pearson’s r to the two time series of residuals to get the value of the PACF at LAG=2.

```python
print(df.corr(method='pearson')['Residual_T_i|T_(i-1)']['Residual_T_(i-2)|T_(i-1)'])
#prints: 0.29612303554627606
```

As mentioned earlier, in practice we cheat! :=) Like so:

```python
from statsmodels.tsa.stattools import pacf

#pacf() returns an array with one value per lag (0, 1, 2); pick out LAG=2
print(pacf(df['T_i'], nlags=2)[2])
#prints: 0.2996545841351261
```
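The library value (0.2997) and the hand-cranked value (0.2961) are close but not identical. A likely reason, as far as I know, is that pacf() does not run the two regressions by default; its default method is a Yule-Walker style estimate built from sample auto-correlations. The sketch below compares the two routes on a simulated series: the residual-based calculation used above, and the textbook LAG=2 expression (r2 - r1²)/(1 - r1²), where r1 and r2 are the sample auto-correlations at lags 1 and 2. The function names and the simulated data are mine, and this is not a reproduction of the statsmodels internals.

```python
import numpy as np

def sample_acf(x, lag):
    """Sample auto-correlation at the given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

def pacf2_from_acf(x):
    """LAG=2 partial auto-correlation computed from r1 and r2."""
    r1, r2 = sample_acf(x, 1), sample_acf(x, 2)
    return (r2 - r1 ** 2) / (1 - r1 ** 2)

def pacf2_from_residuals(x):
    """LAG=2 partial auto-correlation computed from the two residual series."""
    x = np.asarray(x, dtype=float)
    t_i, t_i1, t_i2 = x[2:], x[1:-1], x[:-2]
    res_1 = t_i - np.polyval(np.polyfit(t_i1, t_i, 1), t_i1)
    res_2 = t_i2 - np.polyval(np.polyfit(t_i1, t_i2, 1), t_i1)
    return np.corrcoef(res_1, res_2)[0, 1]

# A short simulated series (made-up coefficients), about the size of a monthly data set.
rng = np.random.default_rng(1)
x = np.zeros(300)
for i in range(2, len(x)):
    x[i] = 0.5 * x[i - 1] + 0.3 * x[i - 2] + rng.normal()

from_acf = pacf2_from_acf(x)
from_res = pacf2_from_residuals(x)
print(f"from auto-correlations: {from_acf:.4f}, from residuals: {from_res:.4f}")
```

On a short series the two numbers typically agree to a couple of decimal places and drift closer as the series gets longer.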

Here is the complete source code:

And here is the link to the curated version of the southern oscillations data set.

## How to use the PACF in time series forecasting

You can put PACF to very effective use for the following things:

1. To determine how many past lags to include in the forecasting equation of an auto-regressive model. This is known as the Auto-Regression (AR) order of the model.
2. To determine, or to validate, how many seasonal lags to include in the forecasting equation of a moving average based forecast model for a seasonal time series. This is known as the Seasonal Moving Average (SMA) order of the process.

Let’s see how to calculate these terms using PACF.

## Using PACF to determine the order of an AR process

Let’s plot the PACF for the Southern Oscillations data set for various lags:

[Figure: PACF plot for the Southern Oscillations data set (Image by Author)]

This plot brings up the following points:

1. The PACF at LAG 0 is 1.0. This is always the case. A value is always 100% correlated with itself!
2. The PACF at LAG 1 is 0.62773724. This value is simply the regular auto-correlation between the values at LAG 0 and LAG 1.
3. The PACF value at LAG 2 is 0.29965458 which is essentially the same as what we computed manually.
4. At LAG 3 the value is just outside the 95% confidence bands. It may or may not be significant.

Thus the Southern Oscillations data set has an AR(2), or possibly an AR(3) signature.
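Reading the order off the plot amounts to checking which PACF values fall outside the confidence band. A common approximation for the 95% band is ±1.96/√N, where N is the number of observations. The sketch below applies that rule; the helper name `significant_lags` is mine, and the numbers fed to it are illustrative only (the first three are rounded from the values quoted above, while the remaining values and the series length are invented).

```python
import math

def significant_lags(pacf_values, n_obs, z=1.96):
    """Return the lags (excluding lag 0) whose PACF lies outside +/- z/sqrt(n_obs)."""
    band = z / math.sqrt(n_obs)
    lags = [lag for lag, value in enumerate(pacf_values)
            if lag > 0 and abs(value) > band]
    return lags, band

# Illustrative inputs, not the actual data set.
pacf_values = [1.0, 0.63, 0.30, 0.12, 0.02]
n_obs = 400  # hypothetical series length

lags, band = significant_lags(pacf_values, n_obs)
print(lags, round(band, 3))  # prints: [1, 2, 3] 0.098
```

A lag that only just clears the band, like lag 3 here, is a judgment call, which is why the AR order is best treated as a starting guess to be checked against the fitted model.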

Here is the code snippet that produces the graph:

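```python
import pandas as pd
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.stattools import pacf
import matplotlib.pyplot as plt

#df is the Southern Oscillations data frame with a 'T_i' column
plot_pacf(df['T_i'], title='PACF: Southern Oscillations')
plt.show()

#print the PACF values at each lag
print(pacf(df['T_i']))
```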

## Using PACF to determine the order of an SMA process

Consider the following plot of a seasonal time series.

[Figure: Average monthly max temperature in Boston, MA from 1998 to 2019 (Image by Author)]

It’s natural to expect January’s maximum from last year to be correlated with the January’s maximum in this year. So we will guess the seasonal period to be 12 months. With this assumption, let’s apply a single seasonal difference of 12 months to this time series i.e. we will derive a new time series where each data point is the difference of two data points in the original time series that are 12 periods apart. Here’s the seasonally differenced time series:
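The differencing operation itself is a one-liner. Here is a minimal sketch on a made-up monthly series with a 12-month cycle and a small upward trend; the numbers are invented and the helper name `seasonal_difference` is mine.

```python
import numpy as np

def seasonal_difference(series, period=12):
    """Subtract from each value the value observed `period` steps earlier."""
    x = np.asarray(series, dtype=float)
    return x[period:] - x[:-period]

# Four years of made-up monthly data: a 12-month sine cycle plus a trend of 0.1 per month.
months = np.arange(48)
temps = 15 + 0.1 * months + 10 * np.sin(2 * np.pi * months / 12)

diffed = seasonal_difference(temps, period=12)
print(len(diffed), diffed[:3])
```

The seasonal cycle cancels out and only the 12-month change in the trend (0.1 × 12 = 1.2) is left, at the cost of losing the first 12 observations. With a pandas Series the same result is available as `series.diff(12)`, with NaNs in the first 12 rows.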

Next we calculate the PACF of this seasonally differenced time series. Here is the PACF plot:

[Figure: PACF values of the seasonally differenced time series at various lags (Image by Author)]

The PACF plot shows significant partial auto-correlations at lags of 12, 24, 36, etc. months, thereby confirming our guess that the seasonal period is 12 months. Moreover, the fact that these spikes are negative points to an SMA(1) process. The ‘1’ in SMA(1) corresponds to a period of 12 in the original series. So if you were to construct a Seasonal ARIMA model for this time series, you would set the seasonal component of ARIMA to (0,1,1)12. The first ‘1’ corresponds to the single seasonal difference that we applied, and the second ‘1’ corresponds to the SMA(1) characteristic that we noticed.
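One way to build intuition for those negative spikes is a toy simulation (this is not the Boston data). If a series is just a fixed seasonal pattern plus independent noise, seasonal differencing leaves behind "this year's noise minus last year's noise", which is a moving average at the seasonal lag. The sketch below simulates such a series and computes the residual-based PACF at a few lags; the helper `pacf_at_lag`, the seed and all numbers are assumptions made for the example.

```python
import numpy as np

def pacf_at_lag(series, k):
    """Correlate T_i and T_(i-k) after regressing both on the lags in between."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    t_i, t_ik = x[k:], x[:n - k]
    X = np.column_stack([np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k)])
    res_1 = t_i - X @ np.linalg.lstsq(X, t_i, rcond=None)[0]
    res_2 = t_ik - X @ np.linalg.lstsq(X, t_ik, rcond=None)[0]
    return np.corrcoef(res_1, res_2)[0, 1]

# Made-up monthly series: a fixed 12-month pattern plus independent noise.
rng = np.random.default_rng(0)
months = np.arange(12 * 200)
series = 15 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=2.0, size=len(months))

# Single seasonal difference of 12 months.
diffed = series[12:] - series[:-12]

pacf_6 = pacf_at_lag(diffed, 6)
pacf_12 = pacf_at_lag(diffed, 12)
pacf_24 = pacf_at_lag(diffed, 24)
print(f"lag 6: {pacf_6:.3f}, lag 12: {pacf_12:.3f}, lag 24: {pacf_24:.3f}")
```

In this toy setting the PACF is close to zero at non-seasonal lags and clearly negative at lags 12 and 24, the same qualitative pattern described above.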

Following is the code snippet to generate these plots:

```python
import pandas as pd
from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt

#df is assumed to hold the temperatures data set
#with a 'Monthly Average Maximum' column

#plot it
df['Monthly Average Maximum'].plot()
plt.show()

#create the 12-month lagged column, then the differenced column
df['T_(i-12)'] = df['Monthly Average Maximum'].shift(12)
df['T_i-T_(i-12)'] = df['Monthly Average Maximum'] - df['T_(i-12)']

#plot the differenced series
df['T_i-T_(i-12)'].plot()
plt.show()

#drop the first 12 rows as they contain NaNs in the differenced col
df = df[12:]

#plot the pacf of the differenced column
plot_pacf(df['T_i-T_(i-12)'], title='PACF: Seasonal Time Series')
plt.show()
```

And here is the link to the temperatures data set.

So there you have it. PACF is a powerful tool and it’s a must-have in a forecaster’s toolbox. Now that you know how it works and how to interpret the results be sure to use it, especially while building AR, MA, ARIMA and Seasonal ARIMA models.

Happy forecasting!
