The Origin of Statistics

Christiaan Cakici
29 April 2019

A column by Christiaan Cakici

Statistics is a branch of mathematics dealing with the collection, analysis and presentation of data. This short column aims to provide the reader with background information about the origin of statistics.

Probability theory deals with predicting the probability of future events, while statistics involves the analysis of the frequency of past events. For example, a probabilist assumes that every face of a die has a probability of 1/6 (about 0.1667) of landing face up. He or she then uses this information to predict upcoming results. A statistician does not assume the die is fair but instead uses past results to draw conclusions about the probability of a face landing face up. It is generally accepted that Kolmogorov laid the foundation of modern probability theory. In his Grundbegriffe der Wahrscheinlichkeitsrechnung (1933) he introduced the axioms of probability theory. These axioms are agreed upon by both frequentists and Bayesians.

Statistics, however, is not considered to have been invented by one person or entity; it evolved over the years. It consists of two main branches, descriptive statistics and inferential statistics. Descriptive statistics is concerned with summarizing and describing data, while inferential statistics is the process of drawing conclusions from data. The word statistics is derived from the Latin word “status”, which means “political state” or “government”.
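The difference between the two viewpoints in the die example can be sketched in a few lines of Python. This is only an illustration: the loaded-die weights, the random seed and the function names are assumptions made for the sketch and do not come from any historical source.

```python
import random
from collections import Counter

def assumed_probabilities(sides=6):
    """Probabilist's view: assume a fair die, so every face has probability 1/sides."""
    return {face: 1 / sides for face in range(1, sides + 1)}

def estimated_probabilities(rolls, sides=6):
    """Statistician's view: estimate each face's probability from observed rolls."""
    counts = Counter(rolls)
    n = len(rolls)
    return {face: counts[face] / n for face in range(1, sides + 1)}

# Hypothetical past results: a die on which a six is twice as likely as any other face.
rng = random.Random(42)
rolls = rng.choices(range(1, 7), weights=[1, 1, 1, 1, 1, 2], k=10_000)

assumed = assumed_probabilities()
estimated = estimated_probabilities(rolls)

for face in range(1, 7):
    print(f"face {face}: assumed {assumed[face]:.4f}, estimated {estimated[face]:.4f}")
```

The assumed probabilities are identical for every face, whereas the estimated relative frequencies reveal that the simulated die favours the six.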

Centuries ago, the word statistics referred to the information that kings needed about their land, agriculture, population and military. However, the interpretation of the word has changed many times throughout history. In the 16th century, Girolamo Cardano calculated the probabilities of different dice rolls. In the 18th century, De Moivre noticed that as the number of coin flips increased, the shape of the binomial distribution approached a very smooth curve. De Moivre reasoned that if he could find a mathematical expression for this curve, he would be able to easily solve problems such as finding the probability of 60 or more heads out of 100 coin flips. This is exactly what he did, and the curve he discovered is now called the “normal” curve. Later, in 1778, Laplace formulated the central limit theorem. The mathematicians Adrain (in 1808) and Gauss (in 1809) independently developed the formula for the normal distribution and showed that many natural phenomena follow the normal curve.
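De Moivre's coin-flip problem can be checked with a short Python sketch that compares the exact binomial probability of 60 or more heads in 100 fair flips with the value obtained from the normal curve. The function names are illustrative, and the continuity correction (evaluating the curve at 59.5 rather than 60) is a modern convention assumed here, not something taken from the account above.

```python
import math

def binomial_tail(n, k_min, p=0.5):
    """Exact P(X >= k_min) for X ~ Binomial(n, p), summing the individual terms."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

def normal_tail(n, k_min, p=0.5):
    """Normal-curve approximation of P(X >= k_min), with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k_min - 0.5 - mean) / sd
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))

exact = binomial_tail(100, 60)
approx = normal_tail(100, 60)
print(f"exact binomial tail:  {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

Both values come out at roughly 0.028, which shows why a formula for the smooth curve was such a useful shortcut when summing 41 binomial terms by hand was the alternative.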

Timeline of statistical concepts
1654 – Pascal and Fermat create the mathematical theory of probability
1657 – Huygens writes the first book on mathematical probability
1662 – Graunt creates mortality tables
1666 – Graunt calculates life expectancy
1693 – Halley prepares the first mortality tables statistically relating death rate to age
1713 – First mention of the law of large numbers
1724 – De Moivre studies mortality statistics and invents life annuities
1733 – De Moivre introduces the normal distribution

[Figure: Abraham de Moivre]

1761 – Bayes proves Bayes' theorem
1786 – Playfair's Commercial and Political Atlas introduces graphs and bar charts of data
1814 – Laplace defends a definition of probabilities in terms of equally possible cases, introduces generating functions and Laplace transforms, and discusses limiting distributions and the importance of the asymptotic Fisher information matrix
1866 – Venn defends the frequency interpretation of probability
1880 – Thiele introduces the likelihood function and invents cumulants
1886 – Galton invents regression
1888 – Galton introduces the concept of correlation
1900 – Bachelier analyzes stock price movements as a stochastic process
1908 – Gosset, writing as “Student”, introduces the t-distribution
1920 – Fisher invents maximum likelihood
1921 – Wright invents R-squared
1930 – Cross invents the method of moments
1933 – Kolmogorov introduces the axioms of probability
1937 – Neyman introduces the concept of the confidence interval in statistical testing

[Figure: Confidence interval for a normal distribution]

1946 – Cox's theorem derives the axioms of probability from simple logical assumptions
1964 – Box & Cox invent the Box-Cox transformation
1982 – Engle invents ARCH models
1982 – Hansen generalizes the method of moments and introduces GMM
1986 – Bollerslev generalizes ARCH models and introduces the GARCH model