###### Learn about the Poisson process and how to simulate it using Python

Let me begin by tickling your curiosity a little bit. Let’s look at how a Poisson sequence might look like.

The X-axis shows the patient number. The Y-axis shows the simulated time at which that patient arrived at a hospital’s Emergency Room. While creating the above simulation, we have assumed that *the average arrival rate is 5 patients per hour*.

It turns out such “arrivals” data can be modeled very nicely using a Poisson process.

Poisson processes can be seen in all walks of life. Here are some examples:

- At a drive-through pharmacy, the number of cars driving up to the drop off window in some interval of time.
- The number of hot dogs sold by say, Papaya King, from 12pm to 4pm on Sundays.
- Number of failures of ultrasound machines in a hospital over some period of time.
- The number of vehicles passing through some intersection from 8am to 11am on weekdays.
- Number of electrical pulses generated by a photo-detector that is exposed to a beam of photons, in 1 minute.
- Number of meteors detected per hour during the Perseid meteor shower.

Poisson processes are everywhere. Pretty much any event that generates a sequence of whole numbered counts is a candidate for being modeled as a Poisson process.

Now that I have tickled your curiosity, let’s begin our journey into the wonderful world of Poisson processes.

But first, a quick overview of **random variables** and **random processes**.

## Random variables and random processes

The word ‘variable’ in **random variable** is a misnomer. A random variable, usually denoted by *X, Y, Z, X1, X2**, *** Z3**, etc., is actually a function! And like all well behaved functions,

**has a domain and a range.**

*X***Domain( X )**: The domain of** X** is the sample space of random outcomes. These outcomes arise when some stochastic experiment is performed (such as tossing a pair of coins). The outcomes may or may not be numeric. For example (Heads, Heads) or (Tails, Heads) are two possible outcomes of the coin toss experiment.

**Range( X ): **The range of ** X** is the set of real numbers.

Why is ** X** called a

*random*variable? It’s because

**outputs a value from the Range (**

*X***X**) using a probability distribution that is supposed to represent the likelihoods of occurrences of events in the sample space. In the above figure, the probability distribution could be:

{**1** → 0.2, **2** → 0.45, **3** → 0.15, **4** → 0.2}

Notice that the sum of probabilities of all outcomes is 1 because the coin-toss experiment will always yield some outcome:

**P**(**1** or** 2** or **3** or **4**) = P(1) + P(2) + P(3) + P(4)

= 0.2 + 0.45 + 0.15 + 0.2 = 1.0

X can be either **discrete** or **continuous**.

The range of a **discrete random variable **is countably infinite, for e.g. the set of integers. A real world example of a discrete ** X** is the number of cars passing through an intersection during some interval of time. The probability distribution of a discrete random variable is called the

**P**robability

**M**ass

**F**unction (PMF).

A **continuous random variable**’s range is the set of real numbers, i.e. an uncountably infinite set. Real world examples of a purely continuous ** X **are not easy to find. A close approximation is the temperature of a place at a specific time of the year, measured to some arbitrarily large precision. The probability distribution of a continuous random variable is called the

**P**robability

**D**ensity

**F**unction (PDF).

Real world examples of a purely continuous

are not easy to find.randomvariable

Now let’s go over what a random process is.

## Random (a.k.a. stochastic) process

A **random process** is a sequence of random variables ** X1, X2, X3,… etc**. usually indexed by time. Each variable can take on a different value from some probability distribution. See illustration below of a random process:

A random process can be either **discrete** or **continuous** depending on whether its member variables ** X1, X2, X3…etc **are discrete or continuous variables.

We are now ready to look at what sort of a creature is the Poisson process.

## The Poisson process

The Poisson process can be used to model the number of occurrences of events, such as patient arrivals at the ER, during a certain period of time, such as 24 hours, *assuming that one knows the average occurrence of those events over some period of time.* For example, an average of 10 patients walk into the ER per hour.

The Poisson process has the following properties:

**It is made up of a sequence of random variables**such that each variable represents the number of occurrences of some event, such as patients walking into an ER, during some interval of time.*X1, X2, X3, …Xk***It is a stochastic process**. Each time you run the Poisson process, it will produce a different sequence of random outcomes as per some probability distribution which we will soon see.**It is a discrete process**. The Poisson process’s outcomes are*the number of occurrences*of some event in the specified period of time, which is undoubtedly an integer —i.e. a discrete number.**It has independent increments**. What this is means is that the number of events that the process predicts will occur in any given interval, is independent of the number in any other disjoint interval. For e.g. the number of people walking into the ER from time zero (start of the observation) up through 10am, is independent of the number walking in from 3:33pm to 8:26pm, or from 11:00pm to 11:05pm and so on.- The Poisson process’s constituent variables
*X1, X2, X3,…Xk***identical distribution**. - The Poisson process’s constituent variables
all have a*X1, X2, X3,…Xk***Poisson distribution**, which is given by the**P**robability**M**ass**F**unction:

The above formula gives us the probability of occurrence of ** k** events

*in unit time*, given that the average occurrence rate is

**λ**events

*per unit time*.

The following 4 plots show the shape of the PMF for different values of **λ:**

In each plot, you can see that the probability peaks at the corresponding value of **λ,** and it tapers off on both sides of this value.

In each plot, the sum of probabilities for all** **possible values of

**is always 1.0, i.e. it’s a certainty that one of the outcomes will materialize.**

*k*Let’s take a closer look at the situation when **λ **= 5. In our example, this corresponds to five patient arrivals per hour. The probability of 0,1, 2, 3, … , 10, 11, … etc. patients walking into the ER *in one hour *looks like this:

As you can see the probability peaks at k = 5.

To know the likelihood of ** k patients **walking into the ER

**in**, we model it as a Poisson process with a rate (

*t*hours**λ**The corresponding formula for the

*t).***PMF for k occurrences in time t**looks like this:

The following set of probability distributions have all been generated using the above Poisson distribution formula by scaling the rate **λ **by a different time interval ** t**:

## Modeling inter-arrival times

The Poisson process has a remarkable substructure. Even though the number of occurrence of events is modeled using a discrete Poisson distribution, the interval of time between consecutive events can be modeled using the Exponential distribution,which is a continuous distribution.

Let’s explore this further.

Let ** X1, X2, X3,…etc**. be random variables such that:

** X1** = the interval of time between the start of the process and the 1st event, i.e. 1st

*arrival*,

**= the inter-arrival time between the first and the second arrival,**

*X2***= the inter-arrival time between the second and the third arrival,**

*X3*and so forth.

The distribution of random variable ** Xk** which represents the inter-arrival time between the

*(k-1)th*and

*(k)th*arrival is:

The **P**robability **D**ensity **F**unction (PDF) of the random variable ** Xk** is as follows:

And the **C**umulative **D**istribution **F**unction (CDF) is:

Recollect that CDF of ** X** returns the probability that the interval of time between consecutive arrivals will be less than or equal to some value

**.**

*t*## Simulating inter-arrival times in a Poisson process

We now have enough information to generate inter-arrival times in a Poisson process. We do this by using the Inverse-CDF technique, in which we literally construct the inverse function of the CDF, and feed it different probability values from a *Uniform**(0,1)* distribution. This gives us the corresponding inter-arrival times for the respective probabilities.

The inverse function of the CDF of the inter-arrival times is:

As mentioned earlier, we feed into this function, probability values from the continuous uniform distribution *Uniform**(0,1)*. We’ll soon see how to do this operation programmatically using a couple of lines of Python code.

For now, following is the table of patient inter-arrival times in hours at the ER for the first 10 patients. We have generated this date using the above formula, with **λ **set to 5 patients per hour.

Here is the plot of interval-times for the first 500 arrivals. As expected, it is the inverse of the graph of the CDF.

## Modeling *the arrival *times in a Poisson process

Now that we know how to generate inter-arrival times, it is easy to generate the patient arrival times.

From the table of 10 sample inter-arrival times shown above, we can deduce the following:

**Arrival time** of first patient = ** x1** =

**0.431257556**

**Arrival time** of second patient =

** x1** + inter-arrival time between first and second patient =

**+**

*x1***= 0.431257556 + 0.264141966 =**

*x2***0.6954**

**Arrival time** of third patient = ** x1** +

**+**

*x2***= 0.431257556 + 0.264141966 + 0.190045932 =**

*x3***0.885445**

… and so on

Keeping in mind that ** X1, X2, X3,…Xk** are the inter-arrival times, if we define

**as the variables that will represent the patient**

*T1, T2, T3, …Tk***arrival times**at the ER, we see that:

*T1 = X1T2 = X1 + X2T3 = X1 + X2 + X3…Tk = X1 + X2 + X3 + … + Xk*

Notice that since** T1, T2, T3…Tk **are defined as

*linear combinations of random variables*

**, the variables**

*X1, X2, X3,…Xk***are also random variables.**

*T1, T2, T3,…Tk*Here is one more very interesting fact:

Since ** T1, T2, T3…Tk **are each the sum of exponentially distributed random variables

**, the random variables**

*X1, X2, X3,…Xk***.**

*T1, T2, T3,…,Tk follow the Gamma distribution*The arrival times in a Poisson process follow the Gamma distribution which is a continuous distribution.

Let’s take a step back and note how smoothly we traveled from a discrete distribution to a set of continuous distributions! Such is the magical structure of the Poisson process. While the process itself is discrete, its substructure is represented entirely by continuous random variables.

The following schematic summarizes the three main distributions that comprise the Poisson process:

## Simulating a Poisson process

We are now ready to simulate the entire Poisson process.

To do so, we need to follow this simple 2-step procedure:

- For the given average incidence rate
**λ**, use the inverse-CDF technique to generate inter-arrival times. - Generate actual arrival times by constructing a running-sum of the interval arrival times.

Here is the Python code to simulate a Poisson process:

And here is the output of this program, giving us a completely simulated but 100% genuine Poisson sequence:

If one rounds up the arrival times to the nearest hour and rotates the plot by 90 degrees, one can spot the average arrival rate of 5 per hour showing through.

Here is the complete source code for generating:

- Times between consecutive events in a simulated Poisson process.
- Absolute times of consecutive events in a simulated Poisson process.
- Number of events occurring in consecutive intervals in a simulated Poisson process.
- Calculate the mean number of events per unit time.

**PREVIOUS:** An In-depth Study of Conditional Variance and Conditional Covariance

**NEXT: **A Deeper Dive Into The Poisson Probability Distribution Function