Getting to Know The Poisson Process And The Poisson Probability Distribution

Learn about the Poisson process and how to simulate it using Python

Let’s look at how a Poisson sequence might look like.

A sample Poisson process (Image by Author)

The X-axis shows the patient number. The Y-axis shows the simulated time at which that patient arrived at a hospital’s Emergency Room. While creating the above simulation, we have assumed that the average arrival rate is 5 patients per hour.

It turns out such “arrivals” data can be modeled very nicely using a Poisson process.

Poisson processes can be seen in all walks of life. Here are some examples:

At a drive-through pharmacy, the number of cars driving up to the drop off window in some interval of time.
The number of hot dogs sold by say, Papaya King, from 12pm to 4pm on Sundays.
Number of failures of ultrasound machines in a hospital over some period of time.
The number of vehicles passing through some intersection from 8am to 11am on weekdays.
Number of electrical pulses generated by a photo-detector that is exposed to a beam of photons, in 1 minute.
Number of meteors detected per hour during the Perseid meteor shower.

Poisson processes are everywhere. Pretty much any event that generates a sequence of whole numbered counts is a candidate for being modeled as a Poisson process.

Now that I have tickled your curiosity, let’s begin our journey into the wonderful world of Poisson processes.

But first, a quick overview of random variables and random processes.

Random variables and random processes

The word ‘variable’ in random variable is a misnomer. A random variable, usually denoted by X, Y, Z, X1, X2, Z3, etc., is actually a function! And like all well behaved functions, X has a domain and a range.

Domain( X ): The domain of X is the sample space of random outcomes. These outcomes arise when some stochastic experiment is performed (such as tossing a pair of coins). The outcomes may or may not be numeric. For example (Heads, Heads) or (Tails, Heads) are two possible outcomes of the coin toss experiment.

Range( X ): The range of X is the set of real numbers.

Random variable X maps the sample space of events to a range (Image by Author)

Why is X called a random variable? It’s because X outputs a value from the Range ( X ) using a probability distribution that is supposed to represent the likelihoods of occurrences of events in the sample space. In the above figure, the probability distribution could be:

{1 → 0.2, 2 → 0.45, 3 → 0.15, 4 → 0.2}

Notice that the sum of probabilities of all outcomes is 1 because the coin-toss experiment will always yield some outcome:

P(1 or 2 or 3 or 4) = P(1) + P(2) + P(3) + P(4)

= 0.2 + 0.45 + 0.15 + 0.2 = 1.0

The Probability Mass Function of X (Image by Author)

X can be either discrete or continuous.

The range of a discrete random variable is countably infinite, for e.g. the set of integers. A real world example of a discrete X is the number of cars passing through an intersection during some interval of time. The probability distribution of a discrete random variable is called the Probability Mass Function (PMF).

A continuous random variable’s range is the set of real numbers, i.e. an uncountably infinite set. Real world examples of a purely continuous X are not easy to find. A close approximation is the temperature of a place at a specific time of the year, measured to some arbitrarily large precision. The probability distribution of a continuous random variable is called the Probability Density Function (PDF).

Real world examples of a purely continuous random variable are not easy to find.

Now let’s go over what a random process is.

Random (a.k.a. stochastic) process

A random process is a sequence of random variables X1, X2, X3,… etc. usually indexed by time. Each variable can take on a different value from some probability distribution. See illustration below of a random process:

A random process can be either discrete or continuous depending on whether its member variables X1, X2, X3…etc are discrete or continuous variables.

We are now ready to look at what sort of a creature is the Poisson process.

The Poisson process

The Poisson process can be used to model the number of occurrences of events, such as patient arrivals at the ER, during a certain period of time, such as 24 hours, assuming that one knows the average occurrence of those events over some period of time. For example, an average of 10 patients walk into the ER per hour.

The Poisson process has the following properties:

It is made up of a sequence of random variables X1, X2, X3, …Xk such that each variable represents the number of occurrences of some event, such as patients walking into an ER, during some interval of time.
It is a stochastic process. Each time you run the Poisson process, it will produce a different sequence of random outcomes as per some probability distribution which we will soon see.
It is a discrete process. The Poisson process’s outcomes are the number of occurrences of some event in the specified period of time, which is undoubtedly an integer —i.e. a discrete number.
It has independent increments. What this is means is that the number of events that the process predicts will occur in any given interval, is independent of the number in any other disjoint interval. For e.g. the number of people walking into the ER from time zero (start of the observation) up through 10am, is independent of the number walking in from 3:33pm to 8:26pm, or from 11:00pm to 11:05pm and so on.
The Poisson process’s constituent variables X1, X2, X3,…Xk all have identical distribution.
The Poisson process’s constituent variables X1, X2, X3,…Xk all have a Poisson distribution, which is given by the Probability Mass Function:

Probability of k occurrences in unit time, given an average occurrence rate of λ per unit time (Image by Author)

The above formula gives us the probability of occurrence of k events in unit time, given that the average occurrence rate is λ events per unit time.

The following 4 plots show the shape of the PMF for different values of λ:

P(k) for λ = 0.1, 1.0, 5 and 20 events per unit time (Image by Author)

In each plot, you can see that the probability peaks at the corresponding value of λ, and it tapers off on both sides of this value.

In each plot, the sum of probabilities for all possible values of k is always 1.0, i.e. it’s a certainty that one of the outcomes will materialize.

Let’s take a closer look at the situation when λ = 5. In our example, this corresponds to five patient arrivals per hour. The probability of 0,1, 2, 3, … , 10, 11, … etc. patients walking into the ER in one hour looks like this:

Probability of k arrivals in one hour given that an average of 5 patients walk into the ER per hour (Image by Author)

As you can see the probability peaks at k = 5.

To know the likelihood of k patients walking into the ER in t hours, we model it as a Poisson process with a rate (λt). The corresponding formula for the PMF for k occurrences in time t looks like this:

Probability of k arrivals in time t, given λ arrivals in unit time (Image by Author)

The following set of probability distributions have all been generated using the above Poisson distribution formula by scaling the rate λ by a different time interval t:

Probability of k arrivals in time t, given λ = 5 per hour (Image by Author)

Modeling inter-arrival times

The Poisson process has a remarkable substructure. Even though the number of occurrence of events is modeled using a discrete Poisson distribution, the interval of time between consecutive events can be modeled using the Exponential distribution,which is a continuous distribution.

Let’s explore this further.

Let X1, X2, X3,…etc. be random variables such that:

X1 = the interval of time between the start of the process and the 1st event, i.e. 1st arrival,
X2 = the inter-arrival time between the first and the second arrival,
X3 = the inter-arrival time between the second and the third arrival,
and so forth.

The distribution of random variable Xk which represents the inter-arrival time between the (k-1)th and (k)th arrival is:

Inter-arrival times are exponentially distributed (Image by Author)

The Probability Density Function (PDF) of the random variable Xk is as follows:

PDF of inter-arrival times in a Poisson process (Image by Author)

And the Cumulative Distribution Function (CDF) is:

CDF of interval-arrival times in a Poisson process (Image by Author)

Recollect that CDF of X returns the probability that the interval of time between consecutive arrivals will be less than or equal to some value t.

Simulating inter-arrival times in a Poisson process

We now have enough information to generate inter-arrival times in a Poisson process. We do this by using the Inverse-CDF technique, in which we literally construct the inverse function of the CDF, and feed it different probability values from a Uniform(0,1) distribution. This gives us the corresponding inter-arrival times for the respective probabilities.

The inverse function of the CDF of the inter-arrival times is:

Inverse of the CDF of inter-arrival times (Image by Author)

As mentioned earlier, we feed into this function, probability values from the continuous uniform distribution Uniform(0,1). We’ll soon see how to do this operation programmatically using a couple of lines of Python code.

For now, following is the table of patient inter-arrival times in hours at the ER for the first 10 patients. We have generated this date using the above formula, with λ set to 5 patients per hour.

λ = 5 arrivals per hour (Image by Author)

Here is the plot of interval-times for the first 500 arrivals. As expected, it is the inverse of the graph of the CDF.

Inverse of CDF of interval arrival times (Image by Author)

Modeling the arrival times in a Poisson process

Now that we know how to generate inter-arrival times, it is easy to generate the patient arrival times.

From the table of 10 sample inter-arrival times shown above, we can deduce the following:

Arrival time of first patient = x1 = 0.431257556

Arrival time of second patient =
x1 + inter-arrival time between first and second patient =
x1 + x2 = 0.431257556 + 0.264141966 = 0.6954

Arrival time of third patient =
x1 + x2 + x3 = 0.431257556 + 0.264141966 + 0.190045932 = 0.885445

… and so on

Keeping in mind that X1, X2, X3,…Xk are the inter-arrival times, if we define T1, T2, T3, …Tk as the variables that will represent the patient arrival times at the ER, we see that:

T1 = X1
T2 = X1 + X2
T3 = X1 + X2 + X3
…
Tk = X1 + X2 + X3 + … + Xk

Notice that since T1, T2, T3…Tk are defined as linear combinations of random variables X1, X2, X3,…Xk, the variables T1, T2, T3,…Tk are also random variables.

Here is one more very interesting fact:

Since T1, T2, T3…Tk are each the sum of exponentially distributed random variables X1, X2, X3,…Xk, the random variables T1, T2, T3,…,Tk follow the Gamma distribution.

The arrival times in a Poisson process follow the Gamma distribution which is a continuous distribution.

Let’s take a step back and note how smoothly we traveled from a discrete distribution to a set of continuous distributions! Such is the magical structure of the Poisson process. While the process itself is discrete, its substructure is represented entirely by continuous random variables.

The following schematic summarizes the three main distributions that comprise the Poisson process:

The sub-structure of the Poisson process (Image by Author)

Simulating a Poisson process

We are now ready to simulate the entire Poisson process.

To do so, we need to follow this simple 2-step procedure:

For the given average incidence rate λ, use the inverse-CDF technique to generate inter-arrival times.
Generate actual arrival times by constructing a running-sum of the interval arrival times.

Here is the Python code to simulate a Poisson process:

	import random
	import math

	_lambda = 5
	_num_arrivals = 100
	_arrival_time = 0

	print('RAND,INTER_ARRV_T,ARRV_T')

	for i in range(_num_arrivals):
	#Get the next probability value from Uniform(0,1)
	p = random.random()

	#Plug it into the inverse of the CDF of Exponential(_lamnbda)
	_inter_arrival_time = -math.log(1.0 – p)/_lambda

	#Add the inter-arrival time to the running sum
	_arrival_time = _arrival_time + _inter_arrival_time

	#print it all out
	print(str(p)+','+str(_inter_arrival_time)+','+str(_arrival_time))

view raw poisson_sim.py hosted with ❤ by GitHub

Python code to simulate a Poisson process

And here is the output of this program, giving us a completely simulated but 100% genuine Poisson sequence:

Poisson simulated arrivals (Image by Author)

If one rounds up the arrival times to the nearest hour and rotates the plot by 90 degrees, one can spot the average arrival rate of 5 per hour showing through.

Here is the complete source code for generating:

Times between consecutive events in a simulated Poisson process.
Absolute times of consecutive events in a simulated Poisson process.
Number of events occurring in consecutive intervals in a simulated Poisson process.
Calculate the mean number of events per unit time.

	import random
	import math
	import statistics
	import matplotlib.pyplot as plt


	_lambda = 5
	_num_events = 100
	_event_num = []
	_inter_event_times = []
	_event_times = []
	_event_time = 0

	print('EVENT_NUM,INTER_EVENT_T,EVENT_T')

	for i in range(_num_events):
	_event_num.append(i)
	#Get a random probability value from the uniform distribution's PDF
	n = random.random()

	#Generate the inter-event time from the exponential distribution's CDF using the Inverse-CDF technique
	_inter_event_time = -math.log(1.0 – n) / _lambda
	_inter_event_times.append(_inter_event_time)

	#Add the inter-event time to the running sum to get the next absolute event time
	_event_time = _event_time + _inter_event_time
	_event_times.append(_event_time)

	#print it all out
	print(str(i) +',' + str(_inter_event_time) + ',' + str(_event_time))

	#plot the inter-event times
	fig = plt.figure()
	fig.suptitle('Times between consecutive events in a simulated Poisson process')
	plot, = plt.plot(_event_num, _inter_event_times, 'bo-', label='Inter-event time')
	plt.legend(handles=[plot])
	plt.xlabel('Index of event')
	plt.ylabel('Time')
	plt.show()


	#plot the absolute event times
	fig = plt.figure()
	fig.suptitle('Absolute times of consecutive events in a simulated Poisson process')
	plot, = plt.plot(_event_num, _event_times, 'bo-', label='Absolute time of event')
	plt.legend(handles=[plot])
	plt.xlabel('Index of event')
	plt.ylabel('Time')
	plt.show()

	_interval_nums = []
	_num_events_in_interval = []
	_interval_num = 1
	_num_events = 0

	print('INTERVAL_NUM,NUM_EVENTS')

	for i in range(len(_event_times)):
	_event_time = _event_times[i]
	if _event_time <= _interval_num:
	_num_events += 1
	else:
	_interval_nums.append(_interval_num)
	_num_events_in_interval.append(_num_events)

	print(str(_interval_num) +',' + str(_num_events))

	_interval_num += 1

	_num_events = 1

	#print the mean number of events per unit time
	print(statistics.mean(_num_events_in_interval))

	#plot the number of events in consecutive intervals
	fig = plt.figure()
	fig.suptitle('Number of events occurring in consecutive intervals in a simulated Poisson process')
	plt.bar(_interval_nums, _num_events_in_interval)
	plt.xlabel('Index of interval')
	plt.ylabel('Number of events')
	plt.show()

view raw poisson_sim2.py hosted with ❤ by GitHub

Poisson process simulation

PREVIOUS: The Weak Law of Large Numbers

NEXT: A Deeper Dive Into The Poisson Probability Distribution Function

UP: Table of Contents