Generalizing Information to the Evolution of Rational Belief
This work provides a unified framework for information theory with broad relevance to machine learning, enabling the quantification and analysis of how belief evolves in Bayesian and machine learning contexts.
The authors derived a general theory of information from first principles that measures change in belief, recovering existing measures like entropy and Kullback-Leibler divergence, and applied it to machine learning tasks such as quantifying information in predictive models and feature selection.
Information theory provides a mathematical foundation to measure uncertainty in belief. Belief is represented by a probability distribution that captures our understanding of an outcome's plausibility. Information measures based on Shannon's concept of entropy include realization information, Kullback-Leibler divergence, Lindley's information in experiment, cross entropy, and mutual information. We derive a general theory of information from first principles that accounts for evolving belief and recovers all of these measures. Rather than simply gauging uncertainty, information is understood in this theory to measure change in belief. We may then regard entropy as the information we expect to gain upon realization of a discrete latent random variable. This theory of information is compatible with the Bayesian paradigm in which rational belief is updated as evidence becomes available. Furthermore, this theory admits novel measures of information with well-defined properties, which we explore in both analysis and experiment. This view of information illuminates the study of machine learning by allowing us to quantify information captured by a predictive model and distinguish it from residual information contained in training data. We gain related insights regarding feature selection, anomaly detection, and novel Bayesian approaches.
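The claim that entropy is the information we expect to gain upon realization can be checked numerically for a small discrete case. The sketch below is illustrative and is not the authors' code. It assumes that the information gained on observing an outcome is the negative log of the prior probability assigned to it (equivalently, the Kullback-Leibler divergence from the prior to a point mass at that outcome), and that the information gained from a Bayesian update is the Kullback-Leibler divergence from prior to posterior. The function names, the three-outcome prior, and the likelihood values are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits for discrete distributions given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def realization_information(prior, outcome):
    """Assumed form: information gained when belief collapses onto `outcome`."""
    return -math.log2(prior[outcome])


def expected_realization_information(prior):
    """Average realization information, weighting each outcome by the prior."""
    return sum(
        p * realization_information(prior, k)
        for k, p in enumerate(prior)
        if p > 0
    )


def bayes_update(prior, likelihood):
    """Posterior over outcomes given a likelihood value for each outcome."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


# Hypothetical belief over three outcomes and a hypothetical likelihood.
prior = [0.5, 0.25, 0.25]
likelihood = [0.9, 0.3, 0.1]

posterior = bayes_update(prior, likelihood)
info_gain = kl_divergence(posterior, prior)  # change in belief from the update

print("entropy of prior (bits):", entropy(prior))
print("expected realization information (bits):",
      expected_realization_information(prior))
print("information gained by the update (bits):", info_gain)
```

For this prior, the entropy and the expected realization information are both 1.5 bits, which is the identity the abstract describes. The update term is nonnegative and vanishes only when the evidence leaves belief unchanged, which matches the reading of information as change in belief rather than as uncertainty alone.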