LG ML Mar 20, 2018

Monte Carlo Information Geometry: The dually flat case

arXiv:1803.07225v1, 21 citations
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck for researchers and practitioners in machine learning and statistics who rely on information-geometric methods for probability distribution families.

The paper tackles the problem of applying information-geometric algorithms to exponential and mixture families whose Bregman generators are intractable, either because the defining integrals are too costly to compute or because they have no closed form. It introduces Monte Carlo estimation to approximate these generators. The result is a series of dually flat geometries, called Monte Carlo Information Geometries, that make Bregman algorithms usable in practice, as demonstrated on a clustering task.

Exponential families and mixture families are parametric probability models that can be studied geometrically as smooth statistical manifolds with respect to any statistical divergence, such as the Kullback-Leibler (KL) divergence or the Hellinger divergence. When a statistical manifold is equipped with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between two distributions equals a Bregman divergence between their parameters. In practice, the Bregman generators of mixture and exponential families require evaluating definite integrals that can be too time-consuming to compute (for discrete supports of exponentially large size) or that admit no closed-form formula (for continuous supports). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider stochastic Monte Carlo (MC) estimation of these integral-based Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIG), that approximate the intractable geometry increasingly finely. The advantage of MCIG is that it allows practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold.
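To make the idea concrete, here is a minimal Python sketch of a Monte Carlo Bregman generator for a one-parameter mixture family. It is an illustration under stated assumptions, not the paper's implementation: the two Gaussian components, the proposal density, the sample size, and all function names (`mc_generator`, `mc_bregman`, and so on) are hypothetical choices. The sketch assumes the standard setup in which the generator of a mixture family m(x; θ) = (1 − θ) p0(x) + θ p1(x) is the negative Shannon entropy F(θ) = ∫ m log m dx, and estimates it by importance sampling on a sample that is drawn once and then kept fixed, so that the estimate is a deterministic convex function of θ.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Two fixed component densities (illustrative choice, not taken from the paper).
def p0(x):
    return normal_pdf(x, -1.0, 1.0)

def p1(x):
    return normal_pdf(x, 1.0, 1.0)

def mixture(x, theta):
    """Mixture family density m(x; theta), with theta in (0, 1)."""
    return (1.0 - theta) * p0(x) + theta * p1(x)

def proposal(x):
    """Proposal density q(x): the balanced mixture (an assumption of this sketch)."""
    return 0.5 * p0(x) + 0.5 * p1(x)

def draw_fixed_sample(n, seed=0):
    """Draw n i.i.d. points from the proposal; the sample is then reused for every theta."""
    rng = np.random.default_rng(seed)
    from_p1 = rng.random(n) < 0.5
    return np.where(from_p1, rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n))

def mc_generator(theta, xs):
    """MC estimate of F(theta) = integral of m log m, via importance weights 1/q."""
    m = mixture(xs, theta)
    return np.mean(m * np.log(m) / proposal(xs))

def mc_gradient(theta, xs):
    """Exact derivative of the MC generator with respect to theta."""
    m = mixture(xs, theta)
    return np.mean((p1(xs) - p0(xs)) * (1.0 + np.log(m)) / proposal(xs))

def mc_bregman(theta1, theta2, xs):
    """Bregman divergence induced by the MC generator; approximates KL(m_theta1 : m_theta2)."""
    return (mc_generator(theta1, xs) - mc_generator(theta2, xs)
            - (theta1 - theta2) * mc_gradient(theta2, xs))

xs = draw_fixed_sample(20000)
print("MC Bregman divergence:", mc_bregman(0.2, 0.7, xs))
```

Because the sample `xs` is fixed, the second derivative of `mc_generator` is an average of non-negative terms (p1 − p0)² / (m q), which is why the estimate remains a valid Bregman generator whenever at least one sample point separates the two components. Any Bregman algorithm, such as Bregman k-means for clustering, could then be run with `mc_bregman` in place of the intractable KL divergence.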

