Information Theory and Statistical Learning
For students and researchers in information theory and machine learning, this chapter serves as an accessible educational resource, but it is not novel research.
This chapter from the forthcoming third edition of Cover and Thomas's Elements of Information Theory provides a concise treatment of the intersection of information theory and statistical learning, focusing on divergence measures in model training. It covers examples from linear regression to generative diffusion models, offering a more systematic derivation of the latter.
This manuscript contains preprint of a chapter under consideration for inclusion in the forthcoming third edition of {\em Cover and Thomas's Elements of Information Theory}, posted with permission from Wiley. The table of contents EIT-3 ToC of the new edition can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$\!-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.