###### What is it, how is it calculated and when to use it

We’ll go over the concepts that drive the creation of the Partial Auto-Correlation Function (PACF), and we’ll see how these concepts lead to the definition of partial auto-correlation and the formula for PACF.

I will demonstrate from first principles how the PACF can be calculated and we’ll compare the result with the value returned by *statsmodels.tsa.stattools.pacf()*.

We’ll finish by seeing how to use PACF in time series forecasting.

## Laying the foundation for PACF: The auto-regressive time series

There are many phenomena in which the past influences the present. The events of yesterday can be used to foretell what will happen today. When such phenomena are represented as a time series, they are said to have an auto-regressive property. In an auto-regressive time series, the current value can be expressed as a function of the previous value, the value before that one, and so forth. In other words, the current value is correlated with previous values from the same time series.
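For intuition, such an auto-regressive series is easy to simulate. Here is a minimal sketch; the function name and the coefficient values (0.2 and 0.7) are arbitrary choices for illustration:

```python
import numpy as np

def simulate_ar1(n, beta0=0.2, beta1=0.7, seed=0):
    """Simulate a simple auto-regressive series in which each value is a
    linear function of the previous value plus random noise."""
    rng = np.random.default_rng(seed)
    t = np.zeros(n)
    for i in range(1, n):
        t[i] = beta0 + beta1 * t[i - 1] + rng.normal()
    return t
```

Because each value feeds into the next, the simulated series ends up strongly correlated with a one-step-lagged copy of itself.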

If a time series is auto-regressive, it is often the case that the current value’s forecast can be computed as a linear function of **only** the previous value and a constant, as follows:
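Written out, with the symbols described below, the forecast equation is:

```latex
T_i = \beta_0 + \beta_1 T_{i-1}
```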

Here *T_i* is the value that is forecast by the equation at the *ith* time step. *Beta0* is the Y-intercept of the model and it applies a constant amount of bias to the forecast. It also specifies what the forecast for *T_i* will be if the value at the previous time step *T_(i-1)* happens to be zero. *Beta1* tells us the rate at which *T_i* changes w.r.t. *T_(i-1)*.

The key assumption behind this simple equation is that the variance in *T_(i-1)* is able to explain all the variance expressed by all values that are older than *T_(i-1)* in the time series. It is as if *T_(i-1)* captures all the information associated with values older than itself.

But what if this assumption were not true? What if the variance in *T_(i-1)* is not able to explain all of the variance contained within *T_(i-2)*? In that case, the above equation will not be able to feed this unexplained portion of the variance from *T_(i-2)* into *T_i*, causing the forecast for *T_i* to go off the mark.

Fortunately it’s easy to fix this problem by adding a term to the above equation as follows:
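The amended equation, with the extra term for *T_(i-2)*:

```latex
T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2}
```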

In this equation the extra term *Beta2 \* T_(i-2)* seeks to capture the variance contained in values that are older than *T_(i-1)* *that could not be explained by the variance in T_(i-1)*. It feeds this balance amount of information *directly* into the forecast for today’s value *T_i*.

With the background established let’s build the definition and the formula for the partial auto-correlation function.

## Building the definition of PACF

Let’s reproduce the above equation for reference:
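The two-lag forecast equation:

```latex
T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2}
```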

It would be useful to know just how important the balance amount of variance in *T_(i-2)* is in predicting today’s value *T_i*. Why? Because it tells us if we need to add *T_(i-2)* as a variable in our forecast model for *T_i*. If the balance variance in *T_(i-2)* is not statistically significant, we can safely assume that the variance in values older than *T_(i-2)* is either not significant for forecasting today’s value, or its significance is already captured in *T_(i-1)*. Either way, it gives us the reason to fall back to our earlier simpler equation that contained only *T_(i-1)*.

So how do we find out how important this balance amount of variance in *T_(i-2)* is in predicting today’s value *T_i*? Easy, we calculate the correlation coefficient between the two. Wait, but isn’t *T_i* also correlated with *T_(i-1)*? After all that is the whole basis for the above two equations! Of course it is. So what we actually want to find out is the correlation between the following two variables:

**Variable I:** The amount of variance in *T_i* that is **not explained** by the variance in *T_(i-1)*, AND

**Variable II:** The amount of variance in *T_(i-2)* that is **not explained** by the variance in *T_(i-1)*.

This correlation is called the **partial auto-correlation** of *T_i* with *T_(i-2)*.

The definition of Variable II seems counter-intuitive. How can yesterday’s value explain day-before-yesterday’s value? To understand this, recollect that in an auto-regressive time series, some of the information from day-before-yesterday’s value **is carried forward** into yesterday’s value. This fact, in a strange-sounding way, makes yesterday’s value a predictor for day-before-yesterday’s value!

Generalizing the above argument leads to the following definition for the PACF:

The partial auto-correlation of *T_i* with a *k*-lagged version of itself, i.e. *T_(i-k)*, is the correlation between the following two variables:

**Variable 1:** The amount of variance in *T_i* that is not explained by the variance in *T_(i-1), T_(i-2)…T_(i-k+1)*, and,

**Variable 2:** The amount of variance in *T_(i-k)* that is not explained by the variance in *T_(i-1), T_(i-2)…T_(i-k+1)*.

## Developing the formula for PACF

Let’s rely on our LAG=2 example for developing the PACF formula. Later, we’ll generalize it to LAG=k. To know how much of the variance in *T_i* has not been explained by the variance in *T_(i-1)*, we do two things:

**Step 1:** We fit a linear regression model (i.e. a straight line) to the distribution of *T_i* versus *T_(i-1)*. This linear model will let us predict *T_i* from *T_(i-1)*. Conceptually, this linear model is allowing us to explain the variance in *T_i* as a function of the variance in *T_(i-1)*. But like all optimally fitted models, our model is not going to be able to explain all of the variance in *T_i*. This fact takes us to step 2.

**Step 2:** In this step, we calculate the residual errors of the linear model that we built in Step 1. The residual error is the difference between the observed value of *T_i* and the value predicted by the model. We do this residue calculation for each value of *T_i* so as to get a time series of residuals. This residuals time series gives us what we are looking for. It gives us the amount of variance in *T_i* which cannot be explained by the variance in *T_(i-1)*, plus of course some noise.

To calculate the second variable in the correlation, namely the amount of variance in *T_(i-2)* that cannot be explained by the variance in *T_(i-1)*, we execute steps 1 and 2 above in the context of *T_(i-2)* and *T_(i-1)* instead of *T_i* and *T_(i-1)* respectively. This gives us the residuals series we are seeking for variable 2.

The final step is to apply the formula for Pearson’s correlation coefficient to these two time series of residuals.

Here is the resulting formula for PACF(T_i, k=2):
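It is simply Pearson’s r applied to the two residual series (the notation is explained just below):

```latex
\mathrm{PACF}(T_i, 2) =
\frac{\operatorname{cov}\left(T_i|T_{i-1},\; T_{i-2}|T_{i-1}\right)}
     {\sigma_{T_i|T_{i-1}} \; \sigma_{T_{i-2}|T_{i-1}}}
```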

*T_i|T_(i-1)* is the time series of residuals which we created from steps 1 and 2 after fitting a linear model to the distribution of *T_i* versus *T_(i-1)*.

*T_(i-2)|T_(i-1)* is the second time series of residuals which we created from steps 1 and 2 after fitting a linear model to the distribution of *T_(i-2)* versus *T_(i-1)*.

The numerator of the equation calculates the **covariance between these two residual time series** and the **denominator standardizes the covariance** using the respective standard deviations.

So there you have it. This is how we calculate the PACF for LAG=2.

## The general formula for PACF(X, lag=k)

In the general case, values older than one or two periods can also have a direct impact on the forecast for the current time period’s value. So one can write the generalized version of the auto-regression equation for forecasting T_i as follows:
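The generalized auto-regression equation with *k* lagged terms:

```latex
T_i = \beta_0 + \beta_1 T_{i-1} + \beta_2 T_{i-2} + \dots + \beta_k T_{i-k}
```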

We can similarly generalize the argument that led up to the development of the PACF formula for LAG=2. The formula for PACF at LAG=*k* is:
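In the residual notation introduced earlier:

```latex
\mathrm{PACF}(T_i, k) =
\frac{\operatorname{cov}\left(T_i|T_{i-1} \dots T_{i-k+1},\; T_{i-k}|T_{i-1} \dots T_{i-k+1}\right)}
     {\sigma_{T_i|T_{i-1} \dots T_{i-k+1}} \; \sigma_{T_{i-k}|T_{i-1} \dots T_{i-k+1}}}
```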

**T_i|T_(i-1), T_(i-2)…T_(i-k+1)** is the time series of residuals obtained from fitting a multivariate linear model that uses *T_(i-1), T_(i-2)…T_(i-k+1)* to predict *T_i*. It represents the residual variance in *T_i* after stripping away the influence of *T_(i-1), T_(i-2)…T_(i-k+1)*.

**T_(i-k)|T_(i-1), T_(i-2)…T_(i-k+1)** is the time series of residuals obtained from fitting a multivariate linear model that uses *T_(i-1), T_(i-2)…T_(i-k+1)* to predict *T_(i-k)*. It represents the residual variance in *T_(i-k)* after stripping away the influence of *T_(i-1), T_(i-2)…T_(i-k+1)*.
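The whole regression-and-residuals recipe can be sketched in a few lines of NumPy. This is an illustrative implementation of the definition above, not the faster recursions that libraries typically use internally:

```python
import numpy as np

def pacf_at_lag(x, k):
    """Estimate PACF at lag k via the regression-and-residuals definition:
    regress T_i and T_(i-k) on the intermediate lags T_(i-1)..T_(i-k+1),
    then take the Pearson correlation of the two residual series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rows = n - k
    # Intercept column plus one column per intermediate lag
    cols = [np.ones(rows)]
    for j in range(1, k):
        cols.append(x[k - j : n - j])  # T_(i-j) for i = k .. n-1
    X = np.column_stack(cols)
    y_now = x[k:]       # T_i
    y_far = x[:rows]    # T_(i-k)
    # Residuals = observed minus least-squares prediction
    res_now = y_now - X @ np.linalg.lstsq(X, y_now, rcond=None)[0]
    res_far = y_far - X @ np.linalg.lstsq(X, y_far, rcond=None)[0]
    return float(np.corrcoef(res_now, res_far)[0, 1])
```

For LAG=1 the set of intermediate lags is empty, so the function reduces to the ordinary lag-1 auto-correlation.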

## Working through an example

Let’s put our money where our mouth is. We’ll *hand crank* out the PACF on a real world time series using the above steps. Of course in practice you don’t have to calculate PACF from first principles. But knowing how it can be done from scratch will give you a valuable insight into the machinery of PACF.

The real world time series we’ll use is the Southern Oscillations data set which can be used to predict an El Niño or La Niña event.

We’ll start with setting up the imports, and reading the data into a pandas DataFrame.

```
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
df = pd.read_csv('southern_osc.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
```

Next we’ll add two columns to the data frame containing the LAG=1 and LAG=2 versions of the data.

```
df['T_(i-1)'] = df['T_i'].shift(1)
df['T_(i-2)'] = df['T_i'].shift(2)
#drop the top two rows as they contain NaNs
df = df.drop(df.index[[0,1]])
```

Now let’s fit a linear regression model on *T_i* and *T_(i-1)* and add the model’s predictions back into the data frame as a new column.

```
lm = linear_model.LinearRegression()
df_X = df[['T_(i-1)']] #Note the double brackets! [[]]
df_y = df['T_i'] #Note the single brackets! []
model = lm.fit(df_X,df_y)
df['Predicted_T_i|T_(i-1)'] = lm.predict(df_X)
```

Next let’s create the time series of residuals corresponding to the predictions of this model and add it to the data frame. This time series gives us the first one of the two data series we need for calculating the PACF for *T_i* at LAG=2.

```
#Observed minus predicted
df['Residual_T_i|T_(i-1)'] = df['T_i'] - df['Predicted_T_i|T_(i-1)']
```

Let’s repeat the above procedure to calculate the second time series of residuals, this time using the columns: *T_(i-2)* and *T_(i-1)*.

```
lm = linear_model.LinearRegression()
df_X = df[['T_(i-1)']] #Note the double brackets! [[]]
df_y = df['T_(i-2)'] #Note the single brackets! []
model = lm.fit(df_X,df_y)
df['Predicted_T_(i-2)|T_(i-1)'] = lm.predict(df_X)
#Residual = Observed - predicted
df['Residual_T_(i-2)|T_(i-1)'] = df['T_(i-2)'] - df['Predicted_T_(i-2)|T_(i-1)']
```

Finally, let’s apply the formula for Pearson’s r to the two time series of residuals to get the value of the PACF at LAG=2.

```
print(df.corr(method='pearson')['Residual_T_i|T_(i-1)']['Residual_T_(i-2)|T_(i-1)'])
#prints: 0.29612303554627606
```

As mentioned earlier, in practice we cheat! :=) Like so:

```
from statsmodels.tsa.stattools import pacf
print(pacf(df['T_i'], nlags=2)[2])
#prints: 0.2996545841351261
```

Here is the complete source code:

And here is the link to the curated version of the southern oscillations data set.

## How to use the PACF in time series forecasting

You can put PACF to very effective use for the following things:

- To determine how many past lags to include in the forecasting equation of an auto-regressive model. This is known as the Auto-Regression (AR) order of the model.
- To determine, or to validate, how many seasonal lags to include in the forecasting equation of a moving average based forecast model for a seasonal time series. This is known as the Seasonal Moving Average (SMA) order of the process.

Let’s see how to calculate these terms using PACF.

## Using PACF to determine the order of an AR process

Let’s plot the PACF for the Southern Oscillations data set for various lags:

This plot brings up the following points:

- The PACF at LAG 0 is 1.0. This is always the case. A value is always 100% correlated with itself!
- The PACF at LAG 1 is 0.62773724. This value is simply the regular auto-correlation between the series and its LAG 1 version.
- The PACF value at LAG 2 is 0.29965458, which is essentially the same as what we computed manually.
- At LAG 3 the value lies just outside the 95% confidence bands, so its significance is borderline.

Thus the Southern Oscillations data set has an AR(2), or possibly an AR(3) signature.
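As an aside, the 95% confidence band drawn on a PACF plot is, to a good approximation, ±1.96/√N, where N is the number of observations. A quick sketch of that rule of thumb (the function name is mine):

```python
import math

def pacf_confidence_band(n_obs, z=1.96):
    """Approximate 95% significance band for a PACF plot: +/- z / sqrt(N).
    PACF values that fall outside this band are (approximately) significant."""
    half_width = z / math.sqrt(n_obs)
    return -half_width, half_width
```

For example, with 1,000 observations the band is roughly ±0.062, so a PACF value of about 0.3, like the one at LAG 2, would be comfortably significant.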

Here is the code snippet that produces the graph:

```
import pandas as pd
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.stattools import pacf
import matplotlib.pyplot as plt
df = pd.read_csv('southern_osc.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
plot_pacf(df['T_i'], title='PACF: Southern Oscillations')
plt.show()
print(pacf(df['T_i']))
```

## Using PACF to determine the order of an SMA process

Consider the following plot of a seasonal time series.

It’s natural to expect January’s maximum from last year to be correlated with January’s maximum this year. So we will guess the seasonal period to be 12 months. With this assumption, let’s apply a single seasonal difference of 12 months to this time series, i.e. we will derive a new time series in which each data point is the difference of two data points in the original time series that are 12 periods apart. Here’s the seasonally differenced time series:
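In code, a seasonal difference is just the series minus a copy of itself shifted by one season; a minimal NumPy sketch (the 12-month period follows from our guess above):

```python
import numpy as np

def seasonal_difference(x, period=12):
    """Return the seasonally differenced series: x[t] - x[t - period].
    The result is shorter than x by `period` observations."""
    x = np.asarray(x, dtype=float)
    return x[period:] - x[:-period]
```

With a pandas Series, `s.diff(12).dropna()` does the same thing.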

Next we calculate the PACF of this seasonally differenced time series. Here is the PACF plot:

The PACF plot shows a significant partial auto-correlation at 12, 24, 36, etc. months, thereby confirming our guess that the seasonal period is 12 months. Moreover, the fact that these spikes are negative points to an SMA(1) process. The ‘1’ in SMA(1) corresponds to a period of 12 in the original series. So if you were to construct a Seasonal ARIMA model for this time series, you would set the seasonal component of ARIMA to (0,1,1)12. The first ‘1’ corresponds to the single seasonal difference that we applied, and the second ‘1’ corresponds to the SMA(1) characteristic that we noticed.

Following is the code snippet to generate these plots:

```
import pandas as pd
from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt
#load the data set
df = pd.read_csv('data\\boston_daily_wx_1998_2019.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
#plot it
df['Monthly Average Maximum'].plot()
plt.show()
#create the 12-period lagged column, then the seasonally differenced column
df['T_(i-12)'] = df['Monthly Average Maximum'].shift(12)
df['T_i-T_(i-12)'] = df['Monthly Average Maximum'] - df['T_(i-12)']
#plot the differenced series
df['T_i-T_(i-12)'].plot()
plt.show()
#drop the first 12 rows as they contain NaNs in the differenced col
df = df[12:]
#plot the pacf of the differenced column
plot_pacf(df['T_i-T_(i-12)'], title='PACF: Seasonal Time Series')
plt.show()
```

And here is the link to the temperatures data set.

So there you have it. PACF is a powerful tool and it’s a must-have in a forecaster’s toolbox. Now that you know how it works and how to interpret the results, be sure to use it, especially while building AR, MA, ARIMA and Seasonal ARIMA models.

Happy forecasting!

## Related

The Intuition Behind Correlation

How To Isolate Trend, Seasonality And Noise From A Time Series

Introduction to Regression With ARIMA Errors Model

## Citations and Copyrights

### Data set

**Southern Oscillation Index (SOI)** data is downloaded from United States National Weather Service’s Climate Prediction Center’s Weather Indices page. **Download link for curated data set**.

**Average monthly maximum temperatures** recorded in Boston, Massachusetts. Data is taken from National Centers for Environmental Information. **Download link for the curated data set**

### Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.
