The Geometry of Statistical Data and Information: A Large Deviation Perspective
For researchers in information geometry and probability theory, this work provides a new geometric perspective on statistical data and connects the information projection of information geometry with that of Kolmogorov's probability theory, though it is largely theoretical and incremental.
This paper uses large deviation theory to study the geometry of data spaces (spaces of empirical mean values) rather than spaces of probability distributions. It shows that the Fisher-Rao metric makes the space of empirical singleton frequencies spherical under an i.i.d. assumption, but that this spherical geometry breaks down for pairwise statistics. It also identifies the information projection of information geometry with that of Kolmogorov's probability theory.
The manifold of empirical mean values of statistical data ad infinitum has a geometric shape that depends on the probability measure governing the generating model. Large deviation theory produces entropy functions that depend on both the probability measure and the statistical data; we use these entropy functions to study the geometry of the data space rather than that of the space of probability distributions. It has been well known since Rao's work that the Fisher-Rao metric makes the probability simplex into a sphere. From our perspective, that result translates to the space of empirical singleton counting frequencies under an i.i.d. assumption. Extending these ideas beyond the i.i.d. setting, we find that the choice of measure curves the space. When we study pairwise statistics, the spherical geometry breaks down entirely. We show that the information projection, defined in information geometry as divergence minimization, coincides with the information projection in Kolmogorov's probability theory. This identification holds under both i.i.d. and Markovian assumptions and connects information geometry to the foundations of probability theory.
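The classical spherical picture mentioned above can be checked numerically. The following is a minimal sketch, not the paper's construction. It assumes the standard square-root embedding p ↦ 2√p and takes the relative entropy D(q‖p) as the i.i.d. large deviation rate function (Sanov's theorem). All function names and the example vectors are hypothetical, chosen only for illustration.

```python
import math

def sqrt_embedding(p):
    """Map a frequency vector p on the simplex to the point 2*sqrt(p)."""
    return [2.0 * math.sqrt(pi) for pi in p]

def fisher_rao_distance(p, q):
    """Great-circle distance on the radius-2 sphere between the images of p and q."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, max(-1.0, bc)))

def kl_divergence(q, p):
    """Relative entropy D(q || p): assumed here as the i.i.d. rate function."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical generating distribution over three symbols.
p = [0.5, 0.3, 0.2]

# The embedded point lies on a sphere of radius 2: sum (2*sqrt(p_i))^2 = 4.
x = sqrt_embedding(p)
radius_sq = sum(xi * xi for xi in x)

# Small perturbation tangent to the simplex (components sum to zero).
v = [1.0, -2.0, 1.0]
eps = 1e-4
q = [pi + eps * vi for pi, vi in zip(p, v)]

# Fisher quadratic form sum_i (dp_i)^2 / p_i at p.
fisher_quadratic = sum((eps * vi) ** 2 / pi for pi, vi in zip(p, v))

kl = kl_divergence(q, p)        # approximately fisher_quadratic / 2
d = fisher_rao_distance(p, q)   # d**2 is approximately fisher_quadratic

print(radius_sq, 2.0 * kl / fisher_quadratic, d * d / fisher_quadratic)
```

To second order in the perturbation, twice the relative entropy and the squared spherical distance both agree with the Fisher quadratic form, so the two printed ratios should be close to 1. This covers only the singleton-frequency, i.i.d. case; it does not illustrate the pairwise or Markovian settings.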