###### The MSDR model explained in detail and a Python tutorial to get you up and running on the MSDR using a real world data set

The **Markov Switching Dynamic Regression** model is a type of **Hidden Markov Model** that can be used to represent phenomena in which some portion of the phenomenon is directly observed while the rest of it is ‘hidden’. The hidden part is modeled using a **Markov model**, while the visible portion is modeled using a suitable **time series regression model** in such a way that, the mean and variance of the time series regression model change depending on which state the hidden Markov model is in.

The first half of this article explains in detail how the MSDR model works, while the second half is **a fast-paced tutorial** on how to build and train an MSDR model using **Python **and **statsmodels** on a **real world data set.**

If you are new to Markov processes please review the following article:

**A Beginner’s Guide to Discrete Time Markov Chains**

For a detailed treatment of **Hidden Markov Models**, including the MSDR model, please review the following:

## A general specification of the MSDR model

We will look at a general specification of the MSDR model consisting of a time indexed dependent variable ** y**, a matrix of regression variables

**, a**

*X**fitted*vector of coefficients

**and residual errors**

*β_cap*

*ε**.*

Bolded variables will indicate vectors or matrices. Variables with a ‘cap’ on them will mean the fitted value of that variable, i.e. the value obtained as a result of training the model on a data set.

### Dependent variable *y*

Let *y** *be a *[n X 1]* vector of ‘*n*’ time-indexed observations *y_t:*

### Regression variables matrix X

Let ** X** be an

*[n X (m+1)]*matrix of regression variables. The first column is for the intercept.

*x**_t*is one row of this matrix.

### Fitted Regression coefficients *β_cap*

*β_cap*

Let ** β_cap** be an

*[(m+1) X 1]*vector of regression coefficients.

*The ‘cap’ on*

*β*indicates that it is the fitted value of the coefficient produced from training the model.

We now consider the following general specification of a time series model containing an additive error component:

In the above model, the observed value *y_t* is the sum of the predicted value *μ_cap_t* and the residual error *ε_t*. We further assume that *ε_t* is a homoskedastic (constant variance) normally distributed random variable with a zero mean, namely, *N(0, σ²)*.

*μ_cap_t *is the output of some regression function *η(.)* such that:

*μ_cap_t = η(**x**_t, **β_cap**)*

*η(.)=0*yields the**white noise model****:***y_t = ε_t.**η(.)=y_bar=(y_1+y2+…+y_n)/n*yields the**mean model**:*y_t = y_bar.**η(.)=**x**_t*·yields the*β_cap***linear model**: y_t =*x**_t*·*β_cap**+ ε_t**η(.)=exp(**x**_t*·*β_cap**)*yields the**Poisson**model or**nonlinear**Poisson-like models.

and so on.

## Mixing-in the Markov model

Let us introduce the impact of a k-state Markov process on the above specified regression model, as follows:

Where, as before, the predicted mean is a function of *x**_t* and ** β_cap**:

*μ_cap_t_j = η(**x**_t, **β_cap**_j)*

But this time around, notice that the regression coefficients vector is called *β_cap**_j *corresponding to the *jth *Markov state.

In general, if the Markov model operates over ‘*k’ *states *[1,2,…,j,…,k]*, the regression coefficients is a matrix of size *[(m+1) X k] *as follows:

The intuition here is that depending on which Markov state or ‘regime’ *j* *in [1, 2,…,k*] is currently in effect, the regression model coefficients will **switch** to the appropriate regime-specific *vector **β_cap**_j *from ** β_cap_s**. Hence the name ‘Markov

**Switching**Dynamic Regression model’.

The *k*-state Markov process itself is governed by the following state transition matrix ** P**:

And it has the following state probability distribution *π_**t *at time step *t*:

And knowing/assuming some value for ** π_0** at

*t=0*, we calculate

*π_**t*as follows:

## An MSDR model with a linear link function

Now suppose we assume a **linear link function** for the predicted mean:

*η(.)=**x**_t*·** β_cap_s** which yields the

**linear model**

**for the visible process:**

*y_t = **x**_t·**β_cap_s** + ε_t*

Recollect that *x**_t* is a matrix of size *[1 X (m+1)], and *** β_cap_s** is a matrix of size

*[(m+1) X k],*and therefore,

*x**_t*·

**yields a matrix of size**

*β_cap_s**[1 X k]*as follows

*:*

Thus, for each time step, the above calculation returns *k* possible predictions for the mean, corresponding to the *k*-states of the underlying Markov process. We need a way to collapse them into a single prediction. We do that by using calculating the expected value of *μ_cap_t_j*:

The probability of the Markov model being in state *j *at time *t *is given by the *jth *element of the state probability distribution of the Markov variable *s_t*. We saw earlier that it is the following vector:

## Training and estimation

Training of the MSDR model involves estimating the coefficients matrix ** β_cap_s**, the transition matrix

**and the variance**

*P**σ²*of the dependent variable

**The estimation procedure is usually Maximum Likelihood Estimation (MLE) or Expectation Maximization.**

*y.*We’ll illustrate MLE which finds the values of *P**, *** β_cap_s **and σ² that would

*maximize the joint probability density of observing the entire training data set*

**. In other words, we would want to maximize the following product:**

*y*In the above product, the probability density *f(**y**=y_t)* whose form depends on what sort of a model we assume for *y**.*

If we assume a linear model like the one we saw earlier, the probability density *f(**y**=y_t)* is given by the following equation:

Where, *μ_cap_t* is the expected value of the predicted mean across all possible regimes as calculated using Equation (1). The probability on the L.H.S. is read as the conditional probability density of observing *y_t *at time *t*, given the regression variable values *x**_t*, and the regime specific coefficients matrix *β_cap_s.*

There is another way to calculate the probability density. Let’s rewrite the formula for* f(·)* as follows:

Where, *μ_cap_tj = **x**_t*·*β_cap**_j. *i.e., it is a **linear combination** of the regression variables and the regression coefficients pertaining to regime *j:*

For a *k-*state Markov process, we would get *k *such probability density values for each time step *t*, one for each state *j* in *[1,2,3,…,k]*.

As before, for each time step, we’d want to collapse these *k* probability densities into a single density. This time, we appeal to the **Law of Total Probability** which states that if event A can takes place jointly with either event A1, or event A2, or event A3, and so on, then the unconditional probability of A can be expressed as follows:

Using this law, we get the probability density of observing *y_t* at time *t* as follows:

Notice that in each one of the k terms, we are using the corresponding conditional mean *μ_cap_tj. *And we know that* P(s_t=1), P(s_t=2),…etc.* are the Markov process’s state probabilities at time *t.*

Now that we have the formulation for *f(**y**=y_t)* in place, let’s circle back to the Likelihood equation:

It is often expeditious to maximize the **natural logarithm of this product** since it has the benefit of converting products into summations which are easier to work with in Calculus (we’ll soon see why). Hence, we maximize the following ** log**-likelihood:

Maximization of Log-Likelihood is done by using the following procedure:

- Take partial derivatives of the log-likelihood w.r.t. each transition
*probability p_ij*in, and each state specific coefficient*P**β_cap_q_j*in the coefficients matrix*β_cap_s**q in [0,1,…,m]*and Markov state*j in [1,2,…,k]*, and w.r.t. variance σ², - Set each partial derivative to zero, and,
- Solve the resulting system of
*(k² +(m+1)*k +1)*equations (in practice, much fewer than that number) corresponding to*k*² Markov transition probabilities,*(m+1)*k*coefficients in, and the variance σ² using some optimization algorithm such as Newton-Raphson, Nelder-Mead or Powell’s.*β_cap_s*

## Tutorial on building an MSDR model using Python and Statsmodels

In this tutorial, we’ll use the MSDR model to investigate a possible link between Personal Consumption Expenditures in the United States and the Consumer Sentiment index published by the University Michigan. As we’ll soon see, the Hidden Markov portion of the MSDR model will help us to explain some of the variability in the Personal Consumption Expenditures data that an otherwise straight-up regression with the Consumer Sentiment data may not be able to accomplish.

Let’s start by inspecting the two datasets: Personal Consumption Expenditures, and Consumer Sentiment Index. Both datasets are available for download from here. Using Pandas, we’ll load both datasets into a Dataframe:

```
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm
```

```
#Load the PCE and UMCSENT datasets
df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0, infer_datetime_format=True, parse_dates=['DATE'])
#Set the index frequency to 'Month-Start'
df = df.asfreq('MS')
```

Let’s plot both time series:

```
fig = plt.figure()
fig.suptitle('% Chg in Personal Consumption Expenditure')
df['PCE_CHG'].plot()
plt.show()
fig = plt.figure()
fig.suptitle('% Chg in U. Michigan Consumer Sentiment Index')
df['UMCSENT_CHG'].plot()
plt.show()
```

We get the following plots:

## Regression goal and regression strategy

Our regression goal is to explain the variance in Personal Consumption Expenditures using the variance in the U. Michigan Consumer Sentiment Index.

Thus, our regression model’s variables are as follows:

**Dependent variable** ** y** = PCE_CHG (M-o-M % Change in Personal Consumption Expenditures)

**Regression variable** ** X**=UMCSENT_CHG (M-o-M % Change in Consumer Sentiment Index), plus the intercept of regression.

Here are our ** y** and

**matrices:**

*X*The intercept of regression is added by statsmodels.

In both charts, one can see regions of high volatility (a.k.a. variance), as illustrated with red circles below on the PCE chart :

We will use a 2-state Markov Switching Dynamic Regression model to try to model these ‘switches’ in variance between high and low variance regimes:

Notice that this time we are using 0 based indexes for Markov states only because that’s what statsmodels uses.

The state transition matrix is:

Our MSDR model’s equation is as follows:

Notice that we have also introduced a **state-specific variance**. We are saying that the residual errors of the model are normally distributed around a zero mean and a variance that switches between two values depending on which state the underlying Markov process is in.

statsmodels supports a variance-switching MSDR model.

Let’s build and train the 2-state MSDR model:

```
msdr_model = sm.tsa.MarkovRegression(endog=df['PCE_CHG'], k_regimes=2, trend='c', exog=df['UMCSENT_CHG'], switching_variance=True)
msdr_model_results = msdr_model.fit(iter=1000)
```

The *trend=’c’ *tells statsmodels to add an intercept to the ** X** matrix.

Let’s print the model training summary:

```
print(msdr_model_results.summary())
```

We get the following output:

The variable names in the above results can be interpreted as follows:*β_0=constβ_1=x1σ²=sigma2p[0->0]=p_00, and therefore, p_01=1 — p_00p[1->0]=p_10, and therefore, p_11–1-p_10*

By looking at the fitted coefficients’ values in the results, we can write the regime-specific model equations as follows:

**For regime=0 (low-variance regime):**

*PCE_CHG = 0.0048 + 0.0026*UMCSENT + ε_t,*

where ε_t ~ N(0, 1.701e-05)

where ε_t ~ N(0, 1.701e-05)

**For regime=1 (high-variance regime):**

*PCE_CHG = 0.0041 + 0.1203*UMCSENT + ε_t,*

where ε_t ~ N(0, 0.0012)

where ε_t ~ N(0, 0.0012)

The Markov state transition diagram and matrix ** P** are as follows:

We see that when Personal Consumption Expenditures are in a low variance regime, they tend to switch to a high variance regime less than 2% of the time, while if the consumption expenditure is in a high variance regime, it tends to switch back to a low variance regime with roughly 20% probability.

A somewhat troubling fact that comes to light in the training results is that in regime 0, the coefficient of UMCSENT (*β_10)* is not statistically significant at a *95%* confidence level *(p-value=0.513 > 0.05)*, and in regime *1*, both *β_01* and *β_11* are not statistically significant at 95% confidence level*.*

Let’s plot the Markov state probabilities *π_**t *at each time step *t. *In our example, the time step is one month. Alongside the chart of *π_**t*, we’ll additionally plot the PCE % change chart, and we’ll also plot the **dates of U.S. recessions as inferred by the GDP-based recession indicator **data set (sourced from US FRED under public domain license).

```
df_r = pd.read_csv('JHDUSRGDPBR.csv', header=0, index_col=0, infer_datetime_format=True, parse_dates=['DATE'])
figure, axes = plt.subplots(3)
ax = axes[0]
ax.plot(df.index, df['PCE_CHG'])
ax.set(title='% Chg in Personal Consumption Expenditure')
ax = axes[1]
ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[0])
ax.set(title='Smoothed probability of regime 0')
ax = axes[2]
ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[1])
ax.plot(df_r.index, df_r['JHDUSRGDPBR'])
ax.set(title='Smoothed probability of regime 1 super-imposed on GDP based recession indicator (Orange)')
plt.show()
```

We get the following plot

We see that often when the Markov state model is in the high variance state, the US economy is in a recession.

Here’s the complete source code:

## References and Copyrights

### Data sets

U.S. Bureau of Economic Analysis, Personal Consumption Expenditures [PCE], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/PCE, November 11, 2021. Available under Public license.

University of Michigan, Survey Research Center, Surveys of Consumers. The Index of Consumer Sentiment. Available under public license.

Hamilton, James, Dates of U.S. recessions as inferred by GDP-based recession indicator [JHDUSRGDPBR], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/JHDUSRGDPBR, November 12, 2021.

### Books

James D. Hamilton, *Time Series Analysis*, Princeton University Press, 2020. ISBN: 0691218633

### Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

**PREVIOUS: **Introduction to the Regression with ARIMA Errors Model

**NEXT: **The Auto-Regressive Poisson Model