The Poisson INAR(1) Regression Model - Statistical Modeling and Forecasting

A Poisson regression model for whole numbered time series data and a tutorial on how to implement it using Python and statsmodels

Integer ARIMA models are used for modeling time series data consisting of whole numbered counts.

Such data sets pose a few unique challenges:

The data is auto-correlated: Time series data is often auto i.e. self correlated. Any model we build for such data needs to account for these serial correlations. ARIMA (Auto-regressive Integrated Moving Average) models are designed to capture auto-correlations in time series data.

The data consists of only whole numbered counts 0,1,2,3,…etc.: ARIMA models are designed for modeling real valued time series data, and not counts based time series data. Counts based data can be modeled using Poisson and Poisson-like models such as Negative Binomial and Generalized Poisson models. Unfortunately, Poisson and Poisson-like models are static models that are not designed to handle correlated time series data.

The Poisson Integer ARIMA model bridges the gap between the ARIMA model for time series data, and the Poisson and Poisson-like models for counts based data sets.

As we’ll see below, structurally, Poisson INARIMA models are built very differently than either the ARIMA or the Poisson model.

In this section, we’ll focus on the Poisson INAR(1) model introduced by Brannas in “Explanatory Variables in the AR(1) Model” (see paper link at the bottom of the page). INAR(1) is an integer ARIMA model with an auto-regressive component of order 1, i.e. it uses only the first time lagged component, y_(t-1). The Poisson AR(1) models the error component using a Poisson process.

Let’s examine the structure of the Poisson Integer AR(1) model. We’ll often use the symbolic conventions followed in the Regression Analysis of Count Data book by Cameron A.C. and P.K. Trivedi, and some of the source papers on the topic. Book and paper links are mentioned at the end of the page.

Structure of the Poisson INAR(1) model

The generic Integer AR(1) model is expressed as follows:

The generic Integer INAR(1) regression model (Image by Author)

In the INAR(1) model, we express express y_t, which is the value of the time series at time t, as the sum of two quantities:

ρ ○ y_(t-1): This term is a function of the value of the time series at the previous time step (t-1). We’ll get to the meaning of the ○ operator soon.

ε_t: This is a random variable that represents the hidden and therefore unknown variables. We will choose to model the effect of the latent variables by assuming some known probability distribution such as Normal or Poisson for the random variable ε_t.

The binomial thinning operator ○

The first term, ρ ○ y_(t-1) deserves additional explanation. The way to understand this quantity is as follows:

Suppose we are modeling the number of hits per day on an eCommerce site. This is a time series data of counts, and one can reasonably expect the number of hits on day t to be correlated with the number of hits on prior days.

Now suppose on some day t, the number of hits y_t =1000. Suppose also that we look at this number, 1000, as 1000 independent, identically distributed random variables: b_1, b_2, b_3,…b_1000. Each variable b_i follows the following Bernoulli distribution:

The Bernoulli random variable b_i (Image by Author)

So we can look at the value of y_t=1000 as 1000 independent Bernoulli trials, each one with a probability of success of ρ. If a trial ‘succeeds’, a hit is registered on the website. Thus, the expected number of hits one day t is simply ρ times y_t.

Now we can see how the INAR(1) equation models the number of hits on the website at time index t, as the sum of:

The expected number of website hits at the previous time step (t-1), and
An adjustment to this expected value, provided by the realized value of the random variable ε_t. ε_t is used to model some latent data generation process.

The Poisson INAR(1) model

Since we wish to model time series data of whole numbered counts, it would be appropriate to assume that the underlying (unknown) data generating process is a Poisson process. And therefore, ε_t is a Poisson distributed random variable.

Thus, the Poisson INAR(1) model can be expressed as follows:

The Poisson AR(1) model — The Poisson INAR(1) model (Image by Author)

Basically, what we are saying is that the value observed at time t is a combination of the expected value at the previous time step, and a Poisson distributed count assuming a Poisson rate of μ_t.

How to handle regression variables

Suppose X is the matrix of regression variables, and β is a vector of regression coefficients, as follows:

The matrix of regression variables X and the regression coefficients vector β (Image by Author)

We will express the Poisson process’s mean rate of occurrence μ_t as the following function of X and the regression coefficients β as follows:

The exponentiated Poisson mean (Image by Author)

Estimation

Estimation involves estimating ρ and β. Various researches in the field have proposed a number of estimation techniques such as Maximum Likelihood Estimation, Conditional Least Squares, Weighted Least Squares and Generalized Method of Moments. In “Explanatory Variables in the AR(1) Model”, Brannas used Conditional Least Squares and Conditional Generalized Method of Moments techniques for parameter estimation.

We will explore the use of the Maximum Likelihood Estimation (MLE) method to estimate ρ and β. Our choice of MLE is driven largely by the presence of the GenericLikelihoodModel class in the statsmodels library. The GenericLikelihoodModel class lets you specify a custom MLE function of your choice which statsmodels will happily maximize for you.

Maximum Likelihood Estimation for the Poisson INAR(1) model

In the maximum likelihood method, we seek values of ρ and β which would maximize the likelihood of observing the entire training data set. Specifically, we wish to find ρ and β which would maximize the natural logarithm of the joint probability of occurrence of y_t in y for t=1 to n. Notation-wise, we would like to maximize the following log-likelihood function:

The Log-likelihood function (Image by Author)

In practice, we convert the logarithm of the product into a summation of logarithm of individual probabilities by using the rule:

ln(A*B*C*D*…) = ln(A) + ln(B) + ln(C) + ln(D) + …

To maximize ℓ(⍴;β|y), we need to construct the conditional probability distribution P(y_t|y_(t-1)). Let’s see how to do this.

The Probability Mass Function of the Poisson INAR(1) distributed random variable

Let’s look at the equation of the Poisson INAR(1) model:

We see that y_t is made up of two components: the expected value of y_(t-1) and a Poisson distributed variable ε_t. Hence the probability of observing y_t, given y_(t-1), can be expressed as the product of two probabilities:

The Binomial probability of observing j number of ‘events’ out of y_(t-1) possible events, followed by,
The Poisson probability of observing (y_t — j) ‘events’.

Since we don’t know what j is, we allow j to range from 0 through y_t (actually, 0 through min(y_t, y_(t-1), but that’s a technicality so that everything still makes sense). For each value of j, we calculate the product of the above two probabilities in (1) and (2). Finally, we sum up all the individual products, as follows:

The PMF of the Poisson INAR(1) distributed random variable (Image by Author)

What the above equation is saying is that the probability of observing y_t at time step t given y_(t-1) was observed at the previous time step, is equal to the Binomial probability of observing 0 out of y_(t-1) events and the Poisson probability of observing y_t events, OR, the Binomial probability of observing 1 out of y_(t-1) events and the Poisson probability of observing (y_t — 1) events, and so forth.

Let’s recollect that in the above equation, the Poisson process’s mean rate of occurrence μ_t is expressed as the following function of regression variables x_t and the regression coefficients β:

Let’s also recollect that ρ is the Binomial probability and while doing MLE, we don’t want ρ to go out of bounds. Hence, we define another variable γ such that:

The above logistic function ensures that as γ ranges from -∞ to +∞, the Bernoulli probability ρ remains bounded in [0,1].

How to build and train a Poisson INAR(1) using Python and Statsmodels

We’ll illustrate the process of using a Poisson INAR(1) model using the STRIKES data set:

The MANUFACTURING STRIKES data set

To illustrate the model fitting procedure, we will use the following open source data set that is widely used in regression modeling literature:

Manufacturing strikes (Data source: U.S. BLS via R data sets)

The data set is a monthly time series showing the relationship between U.S. manufacturing activity measured as a departure from the trend line, and the number of contract strikes in U.S. manufacturing industries beginning each month from 1968 through 1976.

The STRIKES Data set (Source: R data sets) (Image by Author)

This data set is available in R and it can be fetched using the statsmodels Datasets package.

The dependent variable y is strikes.

We’ll start by importing all the required packages:

import math
import numpy as np
import statsmodels.api as sm
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import poisson
from scipy.stats import binom
from patsy import dmatrices
import statsmodels.graphics.tsaplots as tsa
from matplotlib import pyplot as plt

Let’s load the data set into memory using statsmodels:

strikes_dataset = sm.datasets.get_rdataset(dataname='StrikeNb', package='Ecdat')

Print out the data set:

print(strikes_dataset.data)

We see the following output:

We’ll consider the first 92 data points as the training set and the remaining 16 data points as the test data set:

strikes_data = strikes_dataset.data.copy()
strikes_data_train = strikes_data.query('time<=92')
strikes_data_test = strikes_data.query('time>92').reset_index().drop('index', axis=1)

Here is our regression expression. strikes is the dependent variable and output is our explanatory variable. The intercept of regression is assumed to be present:

expr = 'strikes ~ output'

We’ll use Patsy to carve out the X and y matrices. Patsy will automatically add a regression intercept column to X:

y_train, X_train = dmatrices(expr, strikes_data_train, return_type='dataframe')
print(y_train)
print(X_train)
y_test, X_test = dmatrices(expr, strikes_data_test, return_type='dataframe')
print(y_test)
print(X_test)

Next, we will extend the GenericLikelihoodModel:

class INAR(GenericLikelihoodModel):
    def __init__(self, endog, exog, **kwds):
        super(INAR, self).__init__(endog, exog, **kwds)

In our extension, we will override the nloglikeobs() and the fit() methods. The nloglikeobs() method is called by statsmodels to get the value of the log-likelihood of each observation y_t. Therefore, the likelihood function of the Poisson INAR(1) that we had described earlier goes into this method.

This method returns an array of log-likelihoods, and the super-class supplied by statsmodels, sums up all the values in this array to get the total log-likelihood value that is optimized by the statsmodels’ optimizer.

class PoissonINAR(GenericLikelihoodModel):
    def __init__(self, endog, exog, **kwds):
        super(INAR, self).__init__(endog, exog, **kwds)

    def nloglikeobs(self, params):
        #Fetch the parameters gamma and beta 
        #that we would be optimizing
        gamma = params[-1]
        beta = params[:-1]
        
        #Set y and X
        y = self.endog
        y = np.array(y)
        X = self.exog
        
        #Compute rho as a function of gamma
        rho = 1.0/(1.0+math.exp(-gamma))
        
        #Compute the Poisson mean mu as the exponentiated dot 
        #product of X and Beta
        mu = np.exp(X.dot(beta))

        #Init the list of log-likelihhod values, 
        #one value for each y
        ll = []

        #Compute all the log-likelihood values for 
        #the Poisson INAR(1) model
        for t in range(len(y)-1,0,-1):
            prob_y_t = 0
            for j in range(int(min(y[t], y[t-1])+1)):
                prob_y_t += poisson.pmf((y[t]-j), mu[t]) *  
                            binom.pmf(j, y[t-1], rho)
            ll.append(math.log(prob_y_t))
        ll = np.array(ll)
        #return the negated array of log-likelihoods
        return -ll

Let’s also implement the model.fit() method:

def fit(self, start_params=None, maxiter=1000, maxfun=5000, **kwds):
    #Add the gamma parameter to the list of 
    #exogneous variables that the model will optimize
    self.exog_names.append('gamma')

    if start_params == None:
        #Start with some initial values of Beta and gamma
        start_params = np.append(np.ones(self.exog.shape[1]), 1.0)

    #Call super.fit() to start the training
        return super(PoissonINAR, self).fit(start_params=start_params,
            maxiter=maxiter, maxfun=maxfun, **kwds)

Let’s create an instance of the Poisson INAR(1) model class, and train it on the training data set:

inar_model = PoissonINAR(y_train, X_train)
inar_model_results = inar_model.fit()

Print the model training summary:

print(inar_model_results.summary())

We see the following output:

The training summary of the Poisson INAR(1) model (Image by Author)

Significance of regression coefficients

The coefficients for the output variable and gamma are both significant at a 95% confidence interval as evidenced by their p-values which are less than 0.05. So is the intercept of regression which is significantly different than zero:

Significance of the coefficients of regression (Image by Author)

Interpretation of regression coefficients

Interpretation of coefficients is not straightforward. Gamma is -0.7039 which corresponds to a ρ of 1/(1+exp(0.7039)) = 0.33095.

The estimated β is 2.6215.

The fitted model’s equation is:

Equation of the fitted Poisson INAR(1) regression model (Image by Author)

For any given t, 0.33095 (roughly 33%) of y_(t-1) constitutes y_t. The rest comes from the Poisson process’s estimated mean μ_t which is a function of output at time t and the estimated β vector.

Measuring the effect of changes in output on frequency of strikes

From the model equation, it is easy to see that the number of observed strikes increases with increase in manufacturing output. We may want to measure by how much they increase.

In the training data set, the ‘output’ variable has a standard deviation of 0.05654. This can be seen by printing the following:

strikes_data_train['output'].std()

Now suppose we consider two hypothetical values of y_t, namely y_t1 and y_t2 such that their respective previous values y_(t1–1) and y_(t2–1) happen to be the same. Since β=2.6215, one standard deviation increase in ‘output’, will cause the Poisson process’s estimated mean to fluctuate by e^(2.6215*0.05654) = 1.15977, i.e. approximately 16%. Thus one standard deviation increase in output causes the number of strikes observed per month to increase by 16%.

Goodness of fit of the Poisson INAR(1) model

Determination of goodness of fit is made complicated by the lagged term y_(t-1) in the model’s equation. The non linearity of the Poisson INAR(1) model precludes the use of Mean Squared Error based measures such as R-squared. On the other hand, since MLE was used for model fitting, deviance based measures such as pseudo-r-squared and the Chi-squared distributed Likelihood Ratio (LR) test may be used to judge the goodness of fit. However, both pseudo-r-squared and the LR test require the Log-likelihood (LL) of the Null model (a.k.a. the Intercept only model) to be calculated. With the Poisson INAR(1) model, due to the presence of the lagged term y_(t-1) in the model’s equation, what constitutes a Null model is not immediately obvious and is open for debate.

Given these difficulties, one may want to the judge the goodness of fit via indirect means, particularly, by inspecting the standard errors and the corresponding 95% confidence intervals of the fitted model’s parameters.

Upon taking this approach with the ‘strikes’ model, we observe the following things:

The 95% confidence interval of ρ ranges from 1/(1+exp(1.174))=0.23613 to 1/(1+exp(0.233))=0.44201 which is reasonably tight around the fitted value of ρ of 1/(1+exp(0.7039)) = 0.33095.

Unfortunately, the ‘output’ variable has a pretty large standard error of 1.126 and a correspondingly wide confidence interval of 0.415 to 4.828. As expected, and as we saw in the previous section, it’s effect on ‘strikes’ is pretty weak. That has a compromising effect on the goodness of fit of the model.

Prediction

The model allows for making one-step ahead forecasts. Prediction consists of running the fitted model on the X matrix and lagged values of y. We will run the fitted model on X_test and lagged values of y. To do so, we will implement the predict() method on model.py in statsmodels as follows:

def predict(self, params, exog=None, *args, **kwargs):
    #Fetch the optimized values of parameters gamma and beta
    fitted_gamma = params[-1]
    fitted_beta = params[:-1]

    #Compute rho as a function of gamma
    rho = 1.0/(1.0+math.exp(-fitted_gamma))

    #Get the Intercept and the regression variables,
    #Don't get the last column which contains the lagged y values
    X = exog[:,:-1]
    #Fetch the lagged y values
    y_lag_1 = exog[:,-1]

    #Compute the predicted y using the fitted Poisson INAR(1) 
    #model's equation
    y_pred = rho * y_lag_1 + np.exp(X.dot(fitted_beta))

   return y_pred

Let’s prepare the X matrix for prediction:

X_test['y_lag_1'] = y_test.shift(1)
X_test = X_test.fillna(0)

Generate the predictions on the test data set. We will round up the predictions as we are interested in the counts.

inar_predictions = np.round(inar_model_results.predict(exog=X_test))
print(inar_predictions)

We get the following output:

Let’s plot the one-step-ahead forecasts for y_test:

predicted_counts=inar_predictions
actual_counts = y_test['strikes']

fig = plt.figure()
fig.suptitle('Predicted versus actual strike counts')

predicted, = plt.plot(X_test.index, predicted_counts, 'go-', label='Predicted counts')
actual, = plt.plot(X_test.index, actual_counts, 'ro-', label='Actual counts')

plt.legend(handles=[predicted, actual])
plt.show()

Here’s what we see:

One-step-ahead forecasts of number of strikes created by the Poisson IINAR(1) model (Image by Author)

The quality of the one-step ahead forecasts bear out the problems we highlighted earlier with the goodness-of-fit of the Poisson INAR(1) model.

Here is the link to the complete source code:

	import math
	import numpy as np
	import statsmodels.api as sm
	from statsmodels.base.model import GenericLikelihoodModel
	from scipy.stats import poisson
	from scipy.stats import binom
	from patsy import dmatrices
	import statsmodels.graphics.tsaplots as tsa
	from matplotlib import pyplot as plt

	#Let's load the data set into memory using statsmodels
	strikes_dataset = sm.datasets.get_rdataset(dataname='StrikeNb', package='Ecdat')

	#Print out the data set
	print(strikes_dataset.data)

	#We'll consider the first 92 data points as the training set and the remaining
	# 16 data points as the test data set
	strikes_data = strikes_dataset.data.copy()
	strikes_data_train = strikes_data.query('time<=92')
	strikes_data_test = strikes_data.query('time>92').reset_index().drop('index', axis=1)

	#Here is our regression expression. strikes is the dependent variable and
	# output is our explanatory variable.
	#The intercept of regression is assumed to be present
	expr = 'strikes ~ output'

	#We'll use Patsy to carve out the X and y matrices. Patsy will automatically
	# add a regression intercept column to X
	y_train, X_train = dmatrices(expr, strikes_data_train, return_type='dataframe')
	print(y_train)
	print(X_train)
	y_test, X_test = dmatrices(expr, strikes_data_test, return_type='dataframe')
	print(y_test)
	print(X_test)


	class PoissonINAR(GenericLikelihoodModel):
	def __init__(self, endog, exog, **kwds):
	super(PoissonINAR, self).__init__(endog, exog, **kwds)

	def nloglikeobs(self, params):
	#Fetch the parameters gamma and beta that we would be optimizing
	gamma = params[-1]
	beta = params[:-1]
	#Set y and X
	y = self.endog
	y = np.array(y)
	X = self.exog
	#Compute rho as a function of gamma
	rho = 1.0/(1.0+math.exp(-gamma))
	#Compute the Poisson mean mu as the exponentiated dot product of X and Beta
	mu = np.exp(X.dot(beta))
	#Init the list of loglikelihhod values, one value for each y
	ll = []
	#Compute all the log-likelihood values for the Poisson INAR(1) model
	for t in range(len(y)-1,0,-1):
	prob_y_t = 0
	for j in range(int(min(y[t], y[t-1])+1)):
	prob_y_t += poisson.pmf((y[t]-j), mu[t]) * binom.pmf(j, y[t-1], rho)
	ll.append(math.log(prob_y_t))
	ll = np.array(ll)
	print('gamma='+str(gamma) + ' rho='+str(rho) + ' beta='+str(beta) + ' ll='+str(((-ll).sum(0))))
	#return the negated array of log-likelihood values
	return -ll

	def fit(self, start_params=None, maxiter=1000, maxfun=5000, **kwds):
	#Add the gamma parameter to the list of exogneous variables that
	# the model will optimize
	self.exog_names.append('gamma')
	if start_params == None:
	#Start with some initial values of Beta and gamma
	start_params = np.append(np.ones(self.exog.shape[1]), 1.0)
	#Call super.fit() to start the training
	return super(PoissonINAR, self).fit(start_params=start_params,
	maxiter=maxiter, maxfun=maxfun, **kwds)

	def predict(self, params, exog=None, args, *kwargs):
	#Fetch the optimized values of parameters gamma and beta
	fitted_gamma = params[-1]
	fitted_beta = params[:-1]
	X = np.array(exog)
	#Compute rho as a function of gamma
	rho = 1.0/(1.0+math.exp(-fitted_gamma))
	#Fetch the Intercept and the regression variables,
	# except for the last column which contains the lagged y values
	X = exog[:,:-1]
	#Fetch the lagged y values
	y_lag_1 = exog[:,-1]
	#Compute the predicted y using the Poisson INAR(1) model's equation
	y_pred = rho * y_lag_1 + np.exp(X.dot(fitted_beta))
	return y_pred


	#Let's create an instance of the Poisson INAR(1) model class
	inar_model = PoissonINAR(y_train, X_train)
	inar_model_results = inar_model.fit()

	#Print the model training summary
	print(inar_model_results.summary())

	#Prepare the X matrix for prediction
	X_test['y_lag_1'] = y_test.shift(1)
	X_test = X_test.fillna(0)

	#Generate predictions on the test data set
	inar_predictions = np.round(inar_model_results.predict(exog=X_test))

	print(inar_predictions)

	#plot the predicted counts versus the actual counts for the test data
	predicted_counts=inar_predictions
	actual_counts = y_test['strikes']
	fig = plt.figure()
	fig.suptitle('Predicted versus actual strike counts')
	predicted, = plt.plot(X_test.index, predicted_counts, 'go-', label='Predicted counts')
	actual, = plt.plot(X_test.index, actual_counts, 'ro-', label='Actual counts')
	plt.legend(handles=[predicted, actual])
	plt.show()

view raw poisson_integer_arima.py hosted with ❤ by GitHub

The Poisson Integer ARIMA model using Python and statsmodels

Citations and Copyrights

Book

Cameron A. Colin, Trivedi Pravin K., Regression Analysis of Count Data, Econometric Society Monograph №30, Cambridge University Press, 1998. ISBN: 0521635675

Papers

Brännäs, Kurt. (1995). Explanatory variables in the AR(1) count data model PDF download link

Jung, Robert C. and A., Tremayne. Binomial thinning models for integer time series. Statistical Modeling 6 (2006): 81–96., DOI:10.1191/1471082X06st114oa PDF download link

Kennan J., The duration of contract strikes in U.S. manufacturing, Journal of Econometrics, Volume 28, Issue 1, 1985, Pages 5–28, ISSN 0304–4076, https://doi.org/10.1016/0304-4076(85)90064-8. PDF download link

Data set

The Manufacturing strikes data set is one of several data sets available for public use and experimentation in statistical software, most notably, over here as an R package. The data set has been made accessible for use in Python by Vincent Arel-Bundock via vincentarelbundock.github.io/rdatasets under a GPL v3 license.

Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.

PREVIOUS: The Auto-Regressive Poisson Regression Model for Auto-correlated Time Series Data

NEXT: The Poisson Hidden Markov Model – Part 1 (Concepts and Theory)

UP: Table of Contents