DS IT OC ML

Nov 6, 2017

Maximum Entropy Distributions: Bit Complexity and Stability

arXiv:1711.02036v2 5.9 10 citations

Originality: Incremental advance

AI Analysis

The paper establishes maximum entropy distributions as robust and computationally feasible models for data in machine learning, statistics, and related fields, addressing foundational questions with incremental theoretical advances.

The paper asks whether maximum entropy distributions over large discrete supports have succinct descriptions and whether they are stable under perturbations of the marginal vector. It shows that ε-optimal dual solutions have poly(m, log 1/ε) bit complexity, and that changing the marginal vector by δ changes the distribution by roughly poly(m, log 1/δ)√δ in total variation distance.

Maximum entropy distributions with discrete support in $m$ dimensions arise in machine learning, statistics, information theory, and theoretical computer science. While structural and computational properties of max-entropy distributions have been extensively studied, basic questions such as: Do max-entropy distributions over a large support (e.g., $2^m$) with a specified marginal vector have succinct descriptions (polynomial-size in the input description)? and: Are entropy maximizing distributions "stable" under the perturbation of the marginal vector? have resisted a rigorous resolution. Here we show that these questions are related and resolve both of them. Our main result shows a ${\rm poly}(m, \log 1/\varepsilon)$ bound on the bit complexity of $\varepsilon$-optimal dual solutions to the maximum entropy convex program -- for very general support sets and with no restriction on the marginal vector. Applications of this result include polynomial time algorithms to compute max-entropy distributions over several new and old polytopes for any marginal vector in a unified manner, and a polynomial time algorithm to compute the Brascamp-Lieb constant in the rank-1 case. The proof of this result allows us to show that changing the marginal vector by $\delta$ changes the max-entropy distribution in the total variation distance roughly by a factor of ${\rm poly}(m, \log 1/\delta)\sqrt{\delta}$ -- even when the size of the support set is exponential. Together, our results put max-entropy distributions on a mathematically sound footing -- these distributions are robust and computationally feasible models for data.
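
To make the dual program and the stability claim concrete, here is a minimal sketch on a toy instance. It assumes the support is the hypercube $\{0,1\}^3$ and uses a standard form of the max-entropy dual, $\min_\lambda \log \sum_x e^{\langle \lambda, x \rangle} - \langle \lambda, \theta \rangle$, whose optimum gives $p(x) \propto e^{\langle \lambda, x \rangle}$. Plain gradient descent is swapped in for illustration only; it is not the paper's algorithm, and the function names, marginal vector, step size, and iteration count are all hypothetical choices.

```python
import itertools
import math


def gibbs(support, lam):
    """Return the distribution p(x) proportional to exp(<lam, x>) over the support."""
    scores = [sum(l * xi for l, xi in zip(lam, x)) for x in support]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def max_entropy_dual(support, theta, steps=2000, lr=0.5):
    """Minimize f(lam) = log sum_x exp(<lam, x>) - <lam, theta> by gradient descent.

    Assumption: theta lies in the interior of the convex hull of the support,
    so a finite minimizer exists. Returns (lam, p).
    """
    m = len(theta)
    lam = [0.0] * m
    for _ in range(steps):
        p = gibbs(support, lam)
        # The gradient of f is E_p[x] - theta.
        mean = [sum(px * x[i] for px, x in zip(p, support)) for i in range(m)]
        lam = [l - lr * (mu - t) for l, mu, t in zip(lam, mean, theta)]
    return lam, gibbs(support, lam)


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy instance (illustrative values, not from the paper).
support = list(itertools.product([0, 1], repeat=3))
theta = [0.2, 0.5, 0.7]
lam, p = max_entropy_dual(support, theta)

# Perturb every marginal by delta and compare the two max-entropy distributions.
delta = 1e-3
_, q = max_entropy_dual(support, [t + delta for t in theta])
tv = total_variation(p, q)
print("dual solution:", lam)
print("TV distance after perturbation:", tv)
```

On the full hypercube the max-entropy distribution is a product distribution, so the dual optimum has the closed form $\lambda_i = \log(\theta_i / (1 - \theta_i))$, which gives a quick sanity check on the output. This toy case is far easier than the general support sets the abstract covers, where the support cannot be enumerated and the size of the dual solution is exactly what the bit complexity bound controls.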
