The Markov Switching Dynamic Regression Model - Statistical Modeling and Forecasting

The MSDR model explained in detail and a Python tutorial to get you up and running on the MSDR using a real world data set

The Markov Switching Dynamic Regression model is a type of Hidden Markov Model that can be used to represent phenomena in which some portion of the phenomenon is directly observed while the rest of it is ‘hidden’. The hidden part is modeled using a Markov model, while the visible portion is modeled using a suitable time series regression model in such a way that, the mean and variance of the time series regression model change depending on which state the hidden Markov model is in.

The first half of this article explains in detail how the MSDR model works, while the second half is a fast-paced tutorial on how to build and train an MSDR model using Python and statsmodels on a real world data set.

If you are new to Markov processes please review the following article:

A Beginner’s Guide to Discrete Time Markov Chains

For a detailed treatment of Hidden Markov Models, including the MSDR model, please review the following:

Hidden Markov Models

A general specification of the MSDR model

We will look at a general specification of the MSDR model consisting of a time indexed dependent variable y, a matrix of regression variables X, a fitted vector of coefficients β_cap and residual errors ε.

Bolded variables will indicate vectors or matrices. Variables with a ‘cap’ on them will mean the fitted value of that variable, i.e. the value obtained as a result of training the model on a data set.

Dependent variable y

Let y be a [n X 1] vector of ‘n’ time-indexed observations y_t:

Regression variables matrix X

Let X be an [n X (m+1)] matrix of regression variables. The first column is for the intercept. x_t is one row of this matrix.

Fitted Regression coefficients β_cap

Let β_cap be an [(m+1) X 1] vector of regression coefficients. The ‘cap’ on β indicates that it is the fitted value of the coefficient produced from training the model.

We now consider the following general specification of a time series model containing an additive error component:

y_t expressed as the sum of the modeled mean and an error term — y_t expressed as the sum of a mean and an error term (Image by Author)

In the above model, the observed value y_t is the sum of the predicted value μ_cap_t and the residual error ε_t. We further assume that ε_t is a homoskedastic (constant variance) normally distributed random variable with a zero mean, namely, N(0, σ²).

μ_cap_t is the output of some regression function η(.) such that:

μ_cap_t = η(x_t, β_cap)

η(.)=0 yields the white noise model: y_t = ε_t.
η(.)=y_bar=(y_1+y2+…+y_n)/n yields the mean model: y_t = y_bar.
η(.)=x_t·β_cap yields the linear model: y_t = x_t·β_cap + ε_t
η(.)=exp(x_t·β_cap) yields the Poisson model or nonlinear Poisson-like models.

and so on.

Mixing-in the Markov model

Let us introduce the impact of a k-state Markov process on the above specified regression model, as follows:

The observed value y_t expressed as a sum of the predicted mean μ_cap_s_t and residual error ε_t — The observed value y_t expressed as a sum of the predicted mean *μ_cap_t_j and residual error* ε_t (Image by Author)

Where, as before, the predicted mean is a function of x_t and β_cap:

μ_cap_t_j = η(x_t, β_cap_j)

But this time around, notice that the regression coefficients vector is called β_cap_j corresponding to the jth Markov state.

In general, if the Markov model operates over ‘k’ states [1,2,…,j,…,k], the regression coefficients is a matrix of size [(m+1) X k] as follows:

The coefficients matrix of size [k x (m+1)] (Image by Author)

The intuition here is that depending on which Markov state or ‘regime’ j in [1, 2,…,k] is currently in effect, the regression model coefficients will switch to the appropriate regime-specific vector β_cap_j from β_cap_s. Hence the name ‘Markov Switching Dynamic Regression model’.

The k-state Markov process itself is governed by the following state transition matrix P:

The state transition matrix P of the Markov process (Image by Author)

And it has the following state probability distribution π_t at time step t:

The state probability distribution vector of the k-state Markov process (Image by Author)

And knowing/assuming some value for π_0 at t=0, we calculate π_t as follows:

The formula for the state probability distribution of a Markov process at time t, given the probability distribution at t=0 and the transition matrix P (Image by Author)

An MSDR model with a linear link function

Now suppose we assume a linear link function for the predicted mean:

η(.)=x_t·β_cap_s which yields the linear model for the visible process:

y_t = x_t·β_cap_s + ε_t

Recollect that x_t is a matrix of size [1 X (m+1)], and β_cap_s is a matrix of size [(m+1) X k], and therefore, x_t·β_cap_s yields a matrix of size [1 X k] as follows:

Thus, for each time step, the above calculation returns k possible predictions for the mean, corresponding to the k-states of the underlying Markov process. We need a way to collapse them into a single prediction. We do that by using calculating the expected value of μ_cap_t_j:

Expected value of the predicted mean (Image by Author)

The probability of the Markov model being in state j at time t is given by the jth element of the state probability distribution of the Markov variable s_t. We saw earlier that it is the following vector:

Training and estimation

Training of the MSDR model involves estimating the coefficients matrix β_cap_s, the transition matrix P and the variance σ² of the dependent variable y. The estimation procedure is usually Maximum Likelihood Estimation (MLE) or Expectation Maximization.

We’ll illustrate MLE which finds the values of P, β_cap_s and σ² that would maximize the joint probability density of observing the entire training data set y. In other words, we would want to maximize the following product:

The Likelihood of observing the data set (Image by Author)

In the above product, the probability density f(y=y_t) whose form depends on what sort of a model we assume for y.

If we assume a linear model like the one we saw earlier, the probability density f(y=y_t) is given by the following equation:

A normally distributed y under the influence of a k-state Hidden Markov Model (Image by Author)

Where, μ_cap_t is the expected value of the predicted mean across all possible regimes as calculated using Equation (1). The probability on the L.H.S. is read as the conditional probability density of observing y_t at time t, given the regression variable values x_t, and the regime specific coefficients matrix β_cap_s.

There is another way to calculate the probability density. Let’s rewrite the formula for f(·) as follows:

Probability density of y conditional upon X and on the Hidden Markov Model being in state j at time t — Probability density of y conditional upon X and on the Hidden Markov process being in state j at time t (Image by Author)

Where, μ_cap_tj = x_t·β_cap_j. i.e., it is a linear combination of the regression variables and the regression coefficients pertaining to regime j:

The conditional mean of a normally distributed y_t expressed as a linear combination of regression variables x_t, lagged versions of y_t, and Markov state dependent fitted coefficients β_cap_j — The conditional mean of a normally distributed y_t expressed as a linear combination of regression variables x_t, lagged versions of y_t, and Markov state dependent fitted coefficients ***β_cap***_j (Image by Author)

For a k-state Markov process, we would get k such probability density values for each time step t, one for each state j in [1,2,3,…,k].

As before, for each time step, we’d want to collapse these k probability densities into a single density. This time, we appeal to the Law of Total Probability which states that if event A can takes place jointly with either event A1, or event A2, or event A3, and so on, then the unconditional probability of A can be expressed as follows:

The Law of Total Probability (Image by Author)

Using this law, we get the probability density of observing y_t at time t as follows:

The probability density of y under the influence of a k-state Markov model (Image by Author)

Notice that in each one of the k terms, we are using the corresponding conditional mean μ_cap_tj. And we know that P(s_t=1), P(s_t=2),…etc. are the Markov process’s state probabilities at time t.

Now that we have the formulation for f(y=y_t) in place, let’s circle back to the Likelihood equation:

It is often expeditious to maximize the natural logarithm of this product since it has the benefit of converting products into summations which are easier to work with in Calculus (we’ll soon see why). Hence, we maximize the following log-likelihood:

The Log-Likelihood of observing the data set (Image by Author)

Maximization of Log-Likelihood is done by using the following procedure:

Take partial derivatives of the log-likelihood w.r.t. each transition probability p_ij in P, and each state specific coefficient β_cap_q_j in the coefficients matrix β_cap_s where q in [0,1,…,m] and Markov state j in [1,2,…,k], and w.r.t. variance σ²,
Set each partial derivative to zero, and,
Solve the resulting system of (k² +(m+1)*k +1) equations (in practice, much fewer than that number) corresponding to k² Markov transition probabilities, (m+1)*k coefficients in β_cap_s, and the variance σ² using some optimization algorithm such as Newton-Raphson, Nelder-Mead or Powell’s.

Tutorial on building an MSDR model using Python and Statsmodels

In this tutorial, we’ll use the MSDR model to investigate a possible link between Personal Consumption Expenditures in the United States and the Consumer Sentiment index published by the University Michigan. As we’ll soon see, the Hidden Markov portion of the MSDR model will help us to explain some of the variability in the Personal Consumption Expenditures data that an otherwise straight-up regression with the Consumer Sentiment data may not be able to accomplish.

Let’s start by inspecting the two datasets: Personal Consumption Expenditures, and Consumer Sentiment Index. Both datasets are available for download from here. Using Pandas, we’ll load both datasets into a Dataframe:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm

#Load the PCE and UMCSENT datasets
df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0, infer_datetime_format=True, parse_dates=['DATE'])

#Set the index frequency to 'Month-Start'
df = df.asfreq('MS')

Let’s plot both time series:

fig = plt.figure()
fig.suptitle('% Chg in Personal Consumption Expenditure')
df['PCE_CHG'].plot()
plt.show()

fig = plt.figure()
fig.suptitle('% Chg in U. Michigan Consumer Sentiment Index')
df['UMCSENT_CHG'].plot()
plt.show()

We get the following plots:

Percentage change in Personal Consumption Expenditures (data source US FRED under public domain license) (Image by Author)

Percentage change in Consumer Sentiment Index (data source: U. of Michigan under public domain license) (Image by Author)

Regression goal and regression strategy

Our regression goal is to explain the variance in Personal Consumption Expenditures using the variance in the U. Michigan Consumer Sentiment Index.

Thus, our regression model’s variables are as follows:

Dependent variable y = PCE_CHG (M-o-M % Change in Personal Consumption Expenditures)

Regression variable X=UMCSENT_CHG (M-o-M % Change in Consumer Sentiment Index), plus the intercept of regression.

Here are our y and X matrices:

The intercept of regression is added by statsmodels.

In both charts, one can see regions of high volatility (a.k.a. variance), as illustrated with red circles below on the PCE chart :

Regions of high volatility in Personal Consumption Expenditures (Image by Author)

We will use a 2-state Markov Switching Dynamic Regression model to try to model these ‘switches’ in variance between high and low variance regimes:

A 2-state MSDR model that switches between low and high variance states (Image by Author)

Notice that this time we are using 0 based indexes for Markov states only because that’s what statsmodels uses.

The state transition matrix is:

Our MSDR model’s equation is as follows:

Equations of a 2-state MSDR model with state dependent mean and variance (Image by Author)

Notice that we have also introduced a state-specific variance. We are saying that the residual errors of the model are normally distributed around a zero mean and a variance that switches between two values depending on which state the underlying Markov process is in.

statsmodels supports a variance-switching MSDR model.

Let’s build and train the 2-state MSDR model:

msdr_model = sm.tsa.MarkovRegression(endog=df['PCE_CHG'], k_regimes=2, trend='c', exog=df['UMCSENT_CHG'], switching_variance=True)

msdr_model_results = msdr_model.fit(iter=1000)

The trend=’c’ tells statsmodels to add an intercept to the X matrix.

Let’s print the model training summary:

print(msdr_model_results.summary())

We get the following output:

MSDR model training summary (Image by Author)

The variable names in the above results can be interpreted as follows:
β_0=const
β_1=x1
σ²=sigma2
p[0->0]=p_00, and therefore, p_01=1 — p_00
p[1->0]=p_10, and therefore, p_11–1-p_10

By looking at the fitted coefficients’ values in the results, we can write the regime-specific model equations as follows:

For regime=0 (low-variance regime):
PCE_CHG = 0.0048 + 0.0026*UMCSENT + ε_t,
where ε_t ~ N(0, 1.701e-05)

For regime=1 (high-variance regime):
PCE_CHG = 0.0041 + 0.1203*UMCSENT + ε_t,
where ε_t ~ N(0, 0.0012)

The Markov state transition diagram and matrix P are as follows:

The state transition diagram and transition matrix P for the fitted 2-state Markov model (Image by Author)

We see that when Personal Consumption Expenditures are in a low variance regime, they tend to switch to a high variance regime less than 2% of the time, while if the consumption expenditure is in a high variance regime, it tends to switch back to a low variance regime with roughly 20% probability.

A somewhat troubling fact that comes to light in the training results is that in regime 0, the coefficient of UMCSENT (β_10) is not statistically significant at a 95% confidence level (p-value=0.513 > 0.05), and in regime 1, both β_01 and β_11 are not statistically significant at 95% confidence level.

Let’s plot the Markov state probabilities π_t at each time step t. In our example, the time step is one month. Alongside the chart of π_t, we’ll additionally plot the PCE % change chart, and we’ll also plot the dates of U.S. recessions as inferred by the GDP-based recession indicator data set (sourced from US FRED under public domain license).

df_r = pd.read_csv('JHDUSRGDPBR.csv', header=0, index_col=0, infer_datetime_format=True, parse_dates=['DATE'])

figure, axes = plt.subplots(3)
ax = axes[0]
ax.plot(df.index, df['PCE_CHG'])
ax.set(title='% Chg in Personal Consumption Expenditure')

ax = axes[1]
ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[0])
ax.set(title='Smoothed probability of regime 0')

ax = axes[2]
ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[1])
ax.plot(df_r.index, df_r['JHDUSRGDPBR'])
ax.set(title='Smoothed probability of regime 1 super-imposed on GDP based recession indicator (Orange)')

plt.show()

We get the following plot

Markov state probabilities super-imposed with the GDP-based recession indicator (Image by Author)

We see that often when the Markov state model is in the high variance state, the US economy is in a recession.

Here’s the complete source code:

	import pandas as pd
	import numpy as np
	from matplotlib import pyplot as plt
	import statsmodels.api as sm

	#Load the PCE and UMCSENT datasets
	df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0,
	infer_datetime_format=True, parse_dates=['DATE'])
	#Set the index frequency to 'Month-Start'
	df = df.asfreq('MS')

	#Plot both time series
	fig = plt.figure()
	fig.suptitle('% Chg in Personal Consumption Expenditure')
	df['PCE_CHG'].plot()
	plt.show()
	fig = plt.figure()
	fig.suptitle('% Chg in U. Michigan Consumer Sentiment Index')
	df['UMCSENT_CHG'].plot()
	plt.show()

	#build and train the MSDR model
	msdr_model = sm.tsa.MarkovRegression(endog=df['PCE_CHG'], k_regimes=2,
	trend='c', exog=df['UMCSENT_CHG'], switching_variance=True)
	msdr_model_results = msdr_model.fit(iter=1000)

	#print model training summary
	print(msdr_model_results.summary())

	df_r = pd.read_csv('JHDUSRGDPBR.csv', header=0, index_col=0,
	infer_datetime_format=True, parse_dates=['DATE'])

	fig, axes = plt.subplots(3)
	ax = axes[0]
	ax.plot(df.index, df['PCE_CHG'])
	ax.set(title="% Chg in Personal Consumption Expenditure")

	ax = axes[1]
	ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[0])
	ax.set(title="Smoothed probability of regime 0")

	ax = axes[2]
	ax.plot(df.index, msdr_model_results.smoothed_marginal_probabilities[1])
	ax.plot(df_r.index, df_r['JHDUSRGDPBR'])
	ax.set(title="Smoothed probability of regime 1 super-imposed on GDP based "
	"recession indicator (Orange)")

	plt.show()

view raw markov_switching_dynamic_regression.py hosted with ❤ by GitHub

References and Copyrights

Data sets

U.S. Bureau of Economic Analysis, Personal Consumption Expenditures [PCE], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/PCE, November 11, 2021. Available under Public license.

University of Michigan, Survey Research Center, Surveys of Consumers. The Index of Consumer Sentiment. Available under public license.

Hamilton, James, Dates of U.S. recessions as inferred by GDP-based recession indicator [JHDUSRGDPBR], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/JHDUSRGDPBR, November 12, 2021.

Books

James D. Hamilton, Time Series Analysis, Princeton University Press, 2020. ISBN: 0691218633

Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

PREVIOUS: Introduction to the Regression with ARIMA Errors Model

NEXT: The Auto-Regressive Poisson Model

UP: Table of Contents