A deep dive into the meaning of the two bounds and into the remarkable series of happenings that led to their discovery
It’s not often that the universe tells you that something simply cannot be done. It doesn’t matter how smart you are or how lavishly bankrolled you are or what corner of the universe you call your own. When the universe says “Not possible”, there are no two ways about it.
In the sciences, such impossibilities are often expressed as limits on the value of some quantity. A famous example is Albert Einstein’s 1905 discovery that when you let a photon loose in the vacuum of space, there is nothing — unquestionably nothing — that can overtake it. Hundreds of such limits or bounds have been discovered and proved. Taken together, these limits form a fence around reality.
Markov’s Inequality and Chebyshev’s Inequality (a.k.a. the Bienaymé–Chebyshev inequality) are two such bounds that have profoundly shaped our appreciation of the limits that nature places on how frequently random events can occur.
The discovery and proof of Markov’s inequality is credited to the brilliant and passionately principled Russian mathematician Andrey Andreyevich Markov (1856–1922).

The credit for the Bienaymé–Chebyshev inequality goes to two people: Markov’s teacher and a giant of probability theory, the redoubtable Pafnuty Lvovich Chebyshev (1821–1894), and Chebyshev’s French colleague and friend Irénée-Jules Bienaymé (1796–1878).
There is such remarkable history attached to the discovery of these inequalities — especially the Bienaymé–Chebyshev inequality — that it will be dismally inadequate to simply tumble out the mathematics about these inequalities without so much as touching upon the characters and the stories that produced it. I’ll try to bring out these backstories. And in doing so, I’ll set the context for explaining the math underlying the inequalities.
I’ll begin with Markov’s inequality, then show how the Bienaymé–Chebyshev inequality arises from a few simple variable substitutions in Markov’s inequality. For bonus pleasure, we’ll claim our grand prize — the proof of the Weak Law of Large Numbers (WLLN) — by showing how the WLLN emerges, almost effortlessly, from the Bienaymé–Chebyshev inequality.
What is Markov’s inequality?
The name Markov brings to mind “Markov Chains”, “Markov Processes”, and “Markov Models”. Strictly speaking, the Markov chain is what A. A. Markov created. But Markov’s contributions to math went well beyond Markov chains and probability theory. A prolific researcher, Markov published over 120 papers covering a sweeping expanse of ideas in number theory, continued fractions, calculus and statistics. Incidentally, Markov published mostly in Russian-language journals, in contrast to his doctoral advisor P. L. Chebyshev, who published heavily in Western European, especially French, publications.
In 1900, quite likely at the peak of his career, Markov published a seminal book on probability titled “Ischislenie Veroiatnostei” (Calculus of Probabilities).

The book went through four editions as well as a German-language edition. Markov deliberately timed the 3rd edition for 1913 to commemorate the 200th anniversary of the Weak Law of Large Numbers (WLLN). A great deal of material in the 3rd edition is devoted to the WLLN. But tucked away in a lemma is Markov’s proof of a law that turned out to be so central to the field of statistical science that it is often used as the starting point for a proof of the WLLN itself.
Markov’s inequality is based upon five related concepts:
- A random variable X that takes non-negative values
- The mean (Expected Value) of X
- The observed value of X
- The probability distribution of X
- A reference value ‘a’ in the range of X. Think of ‘a’ as the value of X that you wish to observe.
Markov’s inequality links together these five items into the following equation:

P(X ≥ a) ≤ E(X)/a
What Markov showed was the following:
Imagine any non-negative random variable X. X could represent something mundane like your morning wake up time. Or something immense like the number of stars in a galaxy. X can be discrete or continuous. X can have any kind of probability distribution. In short, X can represent any non-negatively valued random phenomenon. Now pick a value — any value — in the range of X that you wish to observe. Let’s denote this value as ‘a’.
Markov’s inequality says the following about X:
- Nature imposes an upper bound on the probability of observing any value of the random variable X that is greater than or equal to the value ‘a’ that you wish to observe.
- The greater the value ‘a’ that you wish to observe, the lower the (upper bound on the) probability of actually observing it, and vice versa.
- For a given value ‘a’, the greater the mean of X, a.k.a. E(X), the greater the (upper bound on the) probability of actually observing it, and vice versa.
- These relationships hold true regardless of how X is distributed.
We know that nature abhors outliers, and Markov proved it.
Some of this might sound like everyday common sense, but Markov’s brilliance lies in establishing a mathematically precise relationship between ‘a’, P(X ≥ a), and E(X).
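If you’d like to see Markov’s inequality in action before we look at an illustration, here is a minimal Python sketch that compares the empirically observed P(X ≥ a) against the bound E(X)/a. The exponential distribution, the sample size and the thresholds are my own arbitrary choices, used purely for illustration:

```python
import numpy as np

# A minimal sketch: empirically check Markov's bound P(X >= a) <= E(X)/a.
# The exponential distribution is used here purely as an example of a
# non-negative random variable; any non-negative distribution would do.
rng = np.random.default_rng(42)
X = rng.exponential(scale=2.0, size=1_000_000)  # E(X) = 2.0

for a in [1, 2, 5, 10]:
    empirical = np.mean(X >= a)    # observed frequency of X >= a
    bound = X.mean() / a           # Markov's upper bound E(X)/a
    print(f"a={a:>2}  P(X>=a)={empirical:.4f}  Markov bound={bound:.4f}")
```

No matter which non-negative distribution you plug in, the observed probability never pokes above the bound.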
An illustration of Markov’s inequality
To illustrate, take a look at the following plot. It shows the frequency distribution of the per capita personal income of counties in the top-20 richest states of the United States.

Here, the random variable X is the per capita income of a randomly chosen county.
Now let’s work with some threshold ‘a’ for the per capita income. In each chart of the following image panel, the red region represents the condition X ≥ a, where a = $50,000, $70,000, and $80,000.

The probability P(X ≥ a) is the ratio of the area of the red region to the total area under the histogram. It’s easy to see that this probability P(X ≥ a) diminishes as ‘a’ grows. It’s inversely related to ‘a’.
The mean of this distribution is somewhere around its peak, namely around $50,000. Imagine for a second how the shape of this distribution will look if its mean were to shift to $75,000: A considerable amount of the mass of the distribution will, as it were, shift rightward. Consequently, the area of the red region corresponding to X ≥ a for any ‘a’ such as $80,000 will also be larger than before. And so will the corresponding probability P(X ≥ a). In effect, if the mean of X shifts ‘upward’ from $50,000 to $75,000, it will increase the (upper bound on the) probability of observing an X ≥ $80,000.
Proof of Markov’s inequality
There are a number of ways to prove Markov’s Inequality. I’ll describe a simple technique that works irrespective of whether X is discrete or continuous. X just needs to be non-negative.
Since X is non-negative, 0 ≤ X.
As before, let ‘a’ be some value of interest to you.
Therefore, there are two scenarios we need to address:
- X ≥ a
- 0 ≤ X < a
Let’s address the first scenario.
Scenario 1: X ≥ a
Let’s define a random variable I such that:
- I = 0 when 0 ≤ X < a, and
- I = 1 when X ≥ a.
In statistical parlance, I is called an indicator variable.
We are considering the case where X ≥ a. Multiplying both sides by I:
XI ≥ aI …(1)
As per the definition of I, when X ≥ a, I = 1. So in this case, XI = X.
Substituting XI = X in (1):
X ≥ aI when I = 1 …(2)
Let’s remember this result.
Now let’s consider the case where 0 ≤ X < a.
Scenario 2: 0 ≤ X < a
By definition of I, when X < a, I = 0.
Therefore, aI = a × 0 = 0
Since X is assumed to be non-negative i.e. X ≥ 0 and aI = 0, we can conclude that in this case also, X ≥ aI.
Thus, X ≥ aI when I = 0 …(3)
Combining results (2) and (3), we see that for all values of X, and both values of I:
aI ≤ X …(4)
Let’s apply the expectation operator E(.) on both sides of (4):
E(aI) ≤ E(X)
Taking the constant ‘a’ out:
aE(I) ≤ E(X)
Now let’s work upon E(I). The random variable I can take only two values: 0 and 1 corresponding to when X < a and X ≥ a. The probability associated with each event is P(X < a) and P(X ≥ a) respectively. So,
E(I) = 0 × P(X < a) + 1 × P(X ≥ a) = P(X ≥ a)
Substituting this result back in aE(I) ≤ E(X), we have:
aP(X ≥ a) ≤ E(X)
And thus:
P(X ≥ a) ≤ E(X)/a
Thus, Markov’s inequality is proved.
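Here is a small numerical companion to the proof. The gamma distribution and the threshold are arbitrary choices of mine, meant only to illustrate the argument: for every simulated value of a non-negative X, the quantity aI never exceeds X, so taking expectations reproduces Markov’s bound.

```python
import numpy as np

# Illustrating the indicator-variable argument: aI <= X holds pointwise,
# so a*E(I) = a*P(X >= a) can never exceed E(X).
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=3.0, size=100_000)  # any non-negative X will do
a = 10.0

I = (X >= a).astype(float)         # indicator: 1 when X >= a, else 0
assert np.all(a * I <= X)          # the pointwise inequality aI <= X

print("a * E(I) =", a * I.mean())  # equals a * P(X >= a)
print("E(X)     =", X.mean())      # Markov: a * P(X >= a) <= E(X)
```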
What is Chebyshev’s Inequality (a.k.a. the Bienaymé–Chebyshev inequality)?
Notation-wise, the Bienaymé–Chebyshev inequality is expressed as follows:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²
In the above equation:
- X is any random variable of interest.
- E(X) is the mean of X.
- The value ‘a’ can be considered as some non-negative value of our interest. We can think of ‘a’ as the value of X that we wish to observe or monitor.
- Var(X) is the variance of X, a measure of how dispersed X is around its mean.
Chebyshev’s inequality states that the probability of observing a value of a random variable X that is more than ‘a’ units apart from its mean is bounded at the top. This upper bound on the probability is inversely proportional to a² and directly proportional to how dispersed X is around its mean, in other words, to the variance of X.
In essence, Chebyshev’s inequality says the following two things about our universe:
- For any random variable, you are much less likely to observe values that lie far away from its mean than ones that lie close to its mean.
- However, for any given separation of interest, the more dispersed the random variable’s values are around its mean, the more likely you are to observe a value that lies beyond the separation of interest from its mean.
If the random variable X represents the mean of a randomly selected sample from a population of values, then applying Chebyshev’s inequality to the sample mean X yields the following intuitively appealing result:
When you collect a random sample of values from a population, it’s much more likely that your sample will be representative of the population, provided the population is not highly dispersed.
Just as with Markov’s inequality, the magnificence of the Bienaymé–Chebyshev inequality lies in its making no assumption whatsoever about the probability distribution of X. For example, X can be normally distributed, exponentially distributed or gamma distributed. X can be distributed in the shape of a cow’s shadow for all it cares. The Bienaymé–Chebyshev probability bound stays rock solid.
Additionally, and unlike Markov’s inequality, the Bienaymé–Chebyshev inequality works for both non-negative and negative-valued random variables.
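To see both properties at once, here is a short Python sketch that checks the Bienaymé–Chebyshev bound on a random variable that takes plenty of negative values. The normal distribution, its parameters and the thresholds are my own illustrative choices:

```python
import numpy as np

# Empirically check P(|X - E(X)| >= a) <= Var(X)/a^2 on a random variable
# that takes both negative and positive values. The bound is distribution-free,
# so the choice of a normal distribution here is purely illustrative.
rng = np.random.default_rng(7)
X = rng.normal(loc=-5.0, scale=4.0, size=1_000_000)

mu, var = X.mean(), X.var()
for a in [2, 4, 8, 12]:
    empirical = np.mean(np.abs(X - mu) >= a)
    bound = min(var / a**2, 1.0)   # a probability bound above 1 is vacuous
    print(f"a={a:>2}  P(|X-E(X)|>=a)={empirical:.4f}  Chebyshev bound={bound:.4f}")
```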
A brief detour into the history of Chebyshev’s inequality
There is quite a lot of compelling history that feeds into the discovery of the Bienaymé–Chebyshev inequality. For starters, there is a reason Jules Bienaymé’s name ought to, and does, precede Chebyshev’s name in this inequality.
In 1853, the French mathematician Irénée-Jules Bienaymé published what came to be one of the most important papers in the proceedings of the French Academy of Sciences. Bienaymé’s paper was ostensibly about his treatment of Laplace’s least squares method. However, as part of that work, he ended up stating and proving the Bienaymé–Chebyshev inequality (which at the time could only have been the Bienaymé inequality, as Chebyshev was nowhere in the picture).
But Bienaymé, true to his modest nature, and on account of his attention being wholly devoted to Laplacian least squares, failed to adequately call out the significance of his discovery, and it went basically unnoticed. And so might have languished one of the most important results in probability, had Pafnuty Lvovich Chebyshev not been born with an atrophied leg.
On an early summer’s day in 1821, when the 25-year-old Bienaymé was still finding his feet as a civil servant in France’s Ministry of Finances, Pafnuty Lvovich Chebyshev was born in a village 100 miles south of St. Petersburg in imperial Russia. Chebyshev was one of nine children and from an early age, he showed exceptional aptitude in both mechanics and mathematics.
Chebyshev’s father was an army officer who had fought off Napoleon in the latter’s deeply disastrous (for Napoleon obviously) advance on Russia in 1812. In one of those curiously amusing ironies of history, just two years later in the messy aftermath of Napoleon’s retreat, Jules Bienaymé would help Napoleon fight off Russian, Austrian and Prussian forces advancing into Paris. Napoleon of course thoroughly failed to protect Paris and instead got himself exiled to Elba.

All of this history was to play out well before Pafnuty Lvovich’s birth. But given his military lineage and family tradition, had it not been for his congenitally atrophied leg, P. L. Chebyshev may very well have followed some of his siblings into the Tsar’s military, and the history of probability would have taken an altogether different turn. But Chebyshev’s initiation into mathematics and later into Russian academia weren’t the only catalysts for his introduction to Bienaymé. And for that matter, also for his championing of the latter’s contributions to the Bienaymé–Chebyshev inequality.
As a child, Chebyshev was home-schooled in French. Early in his career, he appears to have realized that if he wanted his work to be read outside his native land, he would have to become known in the 19th century’s global capital of mathematical research, which was Paris.
At every opportunity, Chebyshev traveled to France and other Western European capitals and he published almost half of his 80 papers in Western European journals. Many of them appeared in the Journal des Mathématiques Pures et Appliquées (Journal of Pure and Applied Mathematics) edited by the French mathematician Joseph Liouville. It was during his 1852 European tour that Chebyshev was introduced to Bienaymé, a mutually beneficial friendship that yielded Chebyshev access to many European scientists and publishers, and which later gave Bienaymé’s own work in mathematics the publicity it deserved in major French and Russian journals.
A pivotal work, of course, was Bienaymé’s discovery in 1853 of the inequality which bears his name. Which brings us back to the study of this inequality.
Bienaymé’s 1853 discovery
What Bienaymé actually proved in his 1853 paper was the following.
Suppose you were to draw a random sample of size N from a population of values whose mean and variance are respectively μ and σ². Let X_bar be the mean of your random sample. Incidentally, it can be shown that the sample mean X_bar is itself a random variable with its own mean and variance being respectively μ and σ²/N. What Bienaymé showed was the following:

P((X_bar − μ)² ≥ k²σ²) ≤ 1/(Nk²)
Notice how astonishingly similar the structure of Bienaymé’s result is to that of Chebyshev’s inequality, especially given that Bienaymé proved it a decade and a half before Chebyshev. Specifically, notice the following similarities with Chebyshev’s inequality:
- As with Chebyshev’s inequality, Bienaymé’s result concerns itself with how far the observed value of a random variable (in Bienaymé’s result, this random variable is the sample mean X_bar) lies from the mean (in Bienaymé’s result, this mean is the population mean μ).
- Similar to Chebyshev’s inequality, Bienaymé’s result places an upper bound on the probability that the squared distance between the two means is greater than a certain threshold (in Bienaymé’s result, this threshold is k²σ²).
- Finally, Bienaymé’s result states that this upper bound is inversely proportional to a certain quantity that is a function of the sample size N.
In its essence, Bienaymé’s result is Chebyshev’s inequality applied to a special random variable. The special random variable being the sample mean!
You may be wondering exactly when Chebyshev enters the merry orbit of Bienaymé’s discoveries in a way that leads to Chebyshev’s name being attached to this inequality.
It so happened that fourteen years after Bienaymé published his above result, Chebyshev, completely oblivious of Bienaymé’s discovery, published a different version of this inequality in the 1867 issue of Joseph Liouville’s journal. Bear in mind, this was the pre-JSTOR, pre-CiteSeer, pre-Google Scholar, pre-telephone era. To say that scientists of the time weren’t fully aware of “previous work” would barely hint at the scale of the problem.
At any rate, Chebyshev’s version of the inequality applies to the mean X_bar of N independent but not necessarily identically distributed random variables X1, X2, X3, …, XN. Joseph Liouville, who seemed to be fully aware of Bienaymé’s work, astutely republished Bienaymé’s 1853 paper in the 1867 issue of his journal, positioning it just before Chebyshev’s paper (see image below)!

To his credit, in a paper he published in 1874, Chebyshev graciously attributed the discovery of this inequality entirely to Bienaymé in the following words:
The simple and rigorous demonstration of Bernoulli’s law to be found in my note entitled: Des valeurs moyennes, is only one of the results easily deduced from the method of M. Bienaymé, which led him, himself, to demonstrate a theorem on probabilities, from which Bernoulli’s law follows immediately
Pafnuty Lvovich Chebyshev
In the years that followed, what became known as Chebyshev’s inequality (or more accurately, the Bienaymé–Chebyshev inequality) is the version that applies to any random variable X having an expected value E(X) and a finite variance Var(X).
As we saw earlier, Chebyshev’s inequality, or more appropriately, the Bienaymé–Chebyshev inequality states that for any positive ‘a’, the probability P(|X − E(X)| ≥ a) is bounded as follows:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²
Proof of Chebyshev’s inequality
In a curious reversal, mathematicians often use Markov’s inequality, which he proved in the 1913 edition of his book “Ischislenie Veroiatnostei” (Calculus of Probabilities), to prove his mentor Chebyshev’s inequality, which was discovered several decades earlier. In fact, using Markov’s inequality as a starting point, it’s a walk in the park to prove Chebyshev’s inequality.
Recall Markov’s inequality for the random variable X with mean E(X):

P(X ≥ a) ≤ E(X)/a …(1)
Now, let’s define another random variable Z = (X − E(X))². The square term ensures that Z is non-negative, allowing us to apply Markov’s inequality to Z. Let’s assume a threshold value of Z which we’ll call a². The probability of an observed value of Z meeting or exceeding a² is P(Z ≥ a²). Applying Markov’s inequality to Z and a² yields the following:

P(Z ≥ a²) ≤ E(Z)/a² …(2)
Starting with the above expression, we can derive Chebyshev’s inequality as follows. Substituting Z = (X − E(X))² into (2):

P((X − E(X))² ≥ a²) ≤ E[(X − E(X))²]/a²

The event (X − E(X))² ≥ a² occurs exactly when |X − E(X)| ≥ a, and E[(X − E(X))²] is, by definition, Var(X). Therefore:

P(|X − E(X)| ≥ a) ≤ Var(X)/a² …(3)
Equation (3) is the Bienaymé–Chebyshev inequality (or simply Chebyshev’s inequality).
Instead of working with an arbitrary threshold ‘a’, it’s useful to express ‘a’ in terms of the standard deviation σ of X. Substituting a = kσ in (3):

P(|X − E(X)| ≥ kσ) ≤ Var(X)/(k²σ²) = σ²/(k²σ²) = 1/k² …(4)
Equation (4) is one of the most important results of statistical science. It says the following:
The probability of observing a value of a random variable that is more than k standard deviations greater (or smaller) than its mean is bounded at the top and this bound is inversely proportional to the square of k.
The above proof also opens a direct path to the proof of Bienaymé’s original result shown in his 1853 publication, namely the following:

P((X_bar − μ)² ≥ k²σ²) ≤ 1/(Nk²)
Starting with equation (2), and substituting X with the sample mean X_bar and a² with k²σ², we arrive at Bienaymé’s circa 1853 result as follows:

P((X_bar − μ)² ≥ k²σ²) ≤ E[(X_bar − μ)²]/(k²σ²) = Var(X_bar)/(k²σ²) = (σ²/N)/(k²σ²) = 1/(Nk²)

Equivalently, P(|X_bar − μ| ≥ kσ) ≤ 1/(Nk²) …(4a)
Equation (4a) is the result that Bienaymé proved in 1853 and its implication is the same as that of Eq. (4), namely:
One is strikingly unlikely to run into values that are several standard deviations away from the mean value.
When expressed this way, the Bienaymé–Chebyshev inequality puts a mathematical face on aphorisms such as “If it sounds too good to be true, it probably is”, or, on the all-time favorite of scientists: “Extraordinary claims require extraordinary evidence”.
To illustrate the operation of this inequality, consider the following data set of average daily temperature recorded in the Chicago area on January 1 of each year. There are 100 observations spanning from 1924 to 2023:

The black dotted horizontal line toward the center of the plot depicts the sample mean of 24.98 F. The colored horizontal lines depict temperature values at plus/minus 1.25, 1.5, 1.75 and 2 times the standard deviation of the data sample. These standard deviation lines give us a feel for the bounds within which most of the temperatures are likely to lie.
In the above data, let X represent the observed average temperature on January 1 in a randomly selected year.
The mean of X, i.e. E(X), is 24.98 F.
The standard deviation σ = 10.67682 F (technically, we are dealing with a sample of values, so we should really be calling this the sample standard deviation ‘s’).
Applying the Equation (4) form of the Bienaymé–Chebyshev inequality to this temperature data, we can determine the upper bound on the probability P(|X − E(X)| ≥ kσ) for different values of k = 1, 1.25, 1.5, 1.75 and 2.0.
The following table mentions these probability bounds in the 1/k² column:

The last column of the table shows the actual probabilities of observing such deviations in the data sample. These observed probabilities are comfortably within the probability bounds generated by the Bienaymé–Chebyshev inequality.
You may have noticed that the probability bounds generated by the Bienaymé–Chebyshev inequality are quite wide. For example, when k = 1 (which corresponds to an observation lying at least one standard deviation away from the mean), the inequality calculates the upper bound on the probability as 1/1² = 1.0, or 100%. This renders this specific bound practically useless.
Nevertheless, for all values of k > 1, the inequality is quite useful. Its usefulness also lies in its not assuming any particular shape for the distribution of the random variable. In fact, it goes even further in its applicability: while Markov’s inequality requires the random phenomenon to produce strictly non-negative observations, the Bienaymé–Chebyshev inequality makes no such assumption about X.
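If you want to reproduce the comparison in the table above, here is a sketch of how it can be done. The actual NOAA temperature values aren’t reproduced here, so the code below generates a synthetic stand-in sample with roughly the same mean and standard deviation; with the real data you would simply load the 100 observations into temps.

```python
import numpy as np

# Compare the observed frequency of k-sigma deviations against the 1/k^2 bound.
# `temps` is a hypothetical stand-in for the 100 January 1 temperatures (deg F),
# generated with roughly the sample mean (24.98) and std (10.68) quoted above.
rng = np.random.default_rng(1924)
temps = rng.normal(loc=24.98, scale=10.68, size=100)

mean, s = temps.mean(), temps.std(ddof=1)   # sample mean and sample std 's'
for k in [1.0, 1.25, 1.5, 1.75, 2.0]:
    observed = np.mean(np.abs(temps - mean) >= k * s)
    bound = 1.0 / k**2
    print(f"k={k:<4}  observed={observed:.3f}  bound 1/k^2={bound:.3f}")
```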
The Bienaymé–Chebyshev inequality also gives us a drop-dead simple proof for the Weak Law of Large Numbers. In fact, in 1913, Markov used this inequality to demonstrate the proof of the WLLN, and it is essentially the same proof used by many textbooks today. In the next section, we’ll go over the WLLN and its proof. A much more expansive treatment of the WLLN and the riveting history behind its discovery is covered in a series of chapters later on in this eBook.
The Weak Law of Large Numbers (and its proof)
Suppose you gather a random sample from a theoretically infinitely large population. Let the sample size be N. This random sample has a sample mean X_bar. Since you are dealing with just a sample and not the entire population, your sample mean is likely to be some distance away from the true population mean μ. This is the error in your sample mean. Suppose you express the absolute value of this error as |X_bar − μ|.
The Weak Law of Large Numbers says that for any positive tolerance ϵ of your choosing, the probability of the error in your sample mean being greater than ϵ will shrink to zero as the sample size N grows to infinity. It doesn’t matter how small a tolerance ϵ you choose.
P(|X_bar − μ| ≥ ϵ) will approach zero as the sample size N approaches infinity. Stated formally:

P(|X_bar − μ| ≥ ϵ) → 0 as N → ∞, for every ϵ > 0
The WLLN has a rich history of discovery that runs back more than three centuries, with the who’s who of mathematics — starting with Jacob Bernoulli in 1713 and including such giants as De Moivre, Laplace, Lacroix, Poisson, and our friends Bienaymé and Chebyshev — contributing to its development. And thanks to the Bienaymé–Chebyshev inequality, the proof of the WLLN flows with the easiness of water running down a hillside.
Proof of the Weak Law of Large Numbers
As with so many things in statistics, we begin the proof by drawing a random sample of size N from a population. Let’s denote this sample as X1, X2, X3, …, XN. It’s useful to think of X1, X2, X3, …, XN as a set of N variables, like a set of N slots, each of which gets filled with a randomly selected value from the population whenever a sample is drawn. Thus X1, X2, X3, …, XN are themselves random variables. Furthermore, since each of X1, X2, X3, …, XN acquires a random value independently of the others, and all values come from the same population, they are independent, identically distributed (i.i.d.) random variables.
For any given randomly selected sample, the sample mean X_bar can be calculated as follows:

X_bar = (X1 + X2 + X3 + … + XN)/N
Since drawing another random sample will yield a different sample mean, and drawing a third sample will yield yet another sample mean, and so on, the sample mean X_bar is itself a random variable with its own mean and variance. What will this mean and variance be?
Let’s calculate the mean of X_bar:

E(X_bar) = E[(X1 + X2 + … + XN)/N] = (1/N)[E(X1) + E(X2) + … + E(XN)] = (1/N)(Nμ) = μ
Thus the mean of the sample mean is the population mean μ.
Now let’s calculate the variance of the sample mean. Since X1, X2, …, XN are independent, the variance of their sum is the sum of their variances:

Var(X_bar) = Var[(X1 + X2 + … + XN)/N] = (1/N²)[Var(X1) + Var(X2) + … + Var(XN)] = (1/N²)(Nσ²) = σ²/N
Thus the variance of the sample mean is the population variance divided by the sample size: σ²/N.
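Before we move on, here is a quick simulation that confirms the two facts we just derived about the sample mean. The uniform population, the sample size and the number of repetitions are my own arbitrary choices:

```python
import numpy as np

# Verify E(X_bar) = mu and Var(X_bar) = sigma^2/N by drawing many samples
# of size N from a Uniform(0, 10) population (mean 5, variance 100/12).
rng = np.random.default_rng(3)
mu, sigma2 = 5.0, 100.0 / 12.0
N, num_samples = 50, 200_000

samples = rng.uniform(0.0, 10.0, size=(num_samples, N))
x_bar = samples.mean(axis=1)          # one sample mean per drawn sample

print("E(X_bar):   simulated", round(x_bar.mean(), 4), " theory", mu)
print("Var(X_bar): simulated", round(x_bar.var(), 4), " theory", round(sigma2 / N, 4))
```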
To proceed further in our proof of the WLLN, we will apply Chebyshev’s inequality to the sample mean X_bar. Recall what Chebyshev’s inequality says about a random variable X:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²
In the above inequality, we make the following substitutions:
- We substitute X with the sample mean X_bar
- Thus E(X) is the expected value of the sample mean which we know is μ.
- Var(X) is the variance of the sample mean which we know is σ²/N.
- We also substitute ‘a’ with the error ‘ϵ’.
Making these substitutions and working out the resulting inequalities leads us to the statement of the Weak Law of Large Numbers:

P(|X_bar − μ| ≥ ϵ) ≤ Var(X_bar)/ϵ² = σ²/(Nϵ²)

For any fixed tolerance ϵ, the right-hand side σ²/(Nϵ²) shrinks to zero as the sample size N grows to infinity. Since a probability can never be negative, it follows that:

P(|X_bar − μ| ≥ ϵ) → 0 as N → ∞ …(5)
Equation (5) is of course the statement of the WLLN.
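And here is the WLLN doing its thing in a simulation (the exponential population, the tolerance ϵ = 0.1 and the sample sizes are my own illustrative choices): for a fixed tolerance, the fraction of samples whose mean misses the population mean by more than ϵ melts away as N grows.

```python
import numpy as np

# Estimate P(|X_bar - mu| >= epsilon) for growing sample sizes N.
# The population is exponential with mean mu = 2.0 (an arbitrary choice).
rng = np.random.default_rng(1713)
mu, epsilon = 2.0, 0.1
num_trials = 1_000

for N in [10, 100, 1_000, 10_000]:
    x_bars = rng.exponential(scale=mu, size=(num_trials, N)).mean(axis=1)
    prob = np.mean(np.abs(x_bars - mu) >= epsilon)
    print(f"N={N:>6}  estimated P(|X_bar - mu| >= {epsilon}) = {prob:.4f}")
```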
How something so profound, so incredibly far-reaching, and so central to the field of statistical science as the WLLN, can have such a disarmingly simple proof is one of those absurdities of nature that one can only marvel at. At any rate, there you have it.
Summary
For a non-negative random variable X, Markov’s inequality states that the probability of observing a value of X greater than or equal to some threshold ‘a’ is bounded at the top, and the upper bound is inversely proportional to ‘a’ and directly proportional to the mean of X.
Chebyshev’s inequality states that the probability of observing a value of a random variable X that is more than ‘a’ units apart from its mean is bounded at the top. This upper bound on the probability is inversely proportional to a² and directly proportional to how dispersed X is around its mean, in other words, to the variance of X.
An important implication of Chebyshev’s inequality (a.k.a. the Bienaymé–Chebyshev inequality), is that the probability of observing a value of a random variable that is more than k standard deviations greater (or smaller) than its mean is bounded at the top and this bound is inversely proportional to the square of k.
Another important consequence of Chebyshev’s inequality is that, when you collect a random sample of values from a population it’s much more likely that your sample will be representative of the population than not, provided the population is not highly dispersed.
Chebyshev’s inequality can be proved using Markov’s inequality.
The Weak Law of Large Numbers says that for any positive tolerance ϵ of your choosing, the probability of the error in your sample mean being greater than ϵ will shrink to zero as the sample size N grows to infinity. It doesn’t matter how small a tolerance ϵ you choose.
The Weak Law of Large Numbers can be proved using Chebyshev’s inequality.
Taken together, Markov’s inequality, Chebyshev’s inequality, and the Weak Law of Large Numbers form the bedrock on which a large part of statistical science securely rests. The entire field of regression analysis, which includes both linear and highly nonlinear models (such as neural net models), draws its validity from the Weak Law of Large Numbers.
When you train a statistical model (or a neural net model), the estimation technique had better obey the WLLN. If it doesn’t, the estimator isn’t consistent and the coefficient estimates aren’t guaranteed to converge to the true population values. And that makes your training technique at best biased and at worst, well, essentially useless.
The WLLN also finds gainful employment in the proof of another epic result — the Central Limit Theorem. And that, deservedly, will be the substance of a subsequent chapter.
References and Copyrights
Papers
Bienaymé, I.J. (1853) “Considérations à l’appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés” (“Considerations in support of Laplace’s discovery on the law of probability in the method of least squares”), C.R. Acad. Sci., Paris 37 309–324. Also published in “Journal de Mathématiques Pures et Appliquées” (“Journal of Pure and Applied Mathematics”), Liouville, (2) 12 158–176. (1867)
Gely P. Basharin, Amy N. Langville, Valeriy A. Naumov, “The life and work of A.A. Markov”, Linear Algebra and its Applications, Volume 386, 2004, Pages 3–26, ISSN 0024–3795, https://doi.org/10.1016/j.laa.2003.12.041.
Bru, Bernard, François Jongmans, and Eugene Seneta. “I.J. Bienaymé: Family Information and Proof of the Criticality Theorem.” International Statistical Review / Revue Internationale de Statistique 60, no. 2 (1992): 177–83. https://doi.org/10.2307/1403648.
Eugene Seneta, “A Tricentenary history of the Law of Large Numbers,” Bernoulli 19(4), 1088–1121, (September 2013)
Data sets
U.S. Bureau of Economic Analysis “Personal Income by County, Metro, and Other Areas” under public domain license.
National Weather Service “NOAA Online Weather Data” for Chicago Area under public domain license.
Images
All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.
NEXT: The Weak Law of Large Numbers

