ML, LG · Mar 28, 2016

Estimating Mixture Models via Mixtures of Polynomials

Sida I. Wang, Arun Tejasvi Chaganty, Percy Liang

arXiv:1603.08482v1 · 1.3 · Has Code

Originality Highly original

AI Analysis

This work addresses the lack of global convergence guarantees in EM for mixture models. It offers a novel framework that could benefit researchers and practitioners in statistics and machine learning, though the contribution is incremental in that it extends existing method of moments techniques to more models.

The authors tackle the problem of estimating mixture models with global convergence guarantees by introducing Polymom, a unifying framework based on method of moments. It applies when the moments of a single mixture component are polynomial in the parameters, and it yields estimation procedures that are easy to derive for a wide range of models.

Mixture modeling is a general technique for making any simple model more expressive through weighted combination. This generality and simplicity in part explains the success of the Expectation Maximization (EM) algorithm, in which updates are easy to derive for a wide class of mixture models. However, the likelihood of a mixture model is non-convex, so EM has no known global convergence guarantees. Recent method of moments approaches offer global guarantees for some mixture models, but they do not extend easily to the range of mixture models that exist. In this work, we present Polymom, a unifying framework based on method of moments in which estimation procedures are easily derivable, just as in EM. Polymom is applicable when the moments of a single mixture component are polynomials of the parameters. Our key observation is that the moments of the mixture model are a mixture of these polynomials, which allows us to cast estimation as a Generalized Moment Problem. We solve its relaxations using semidefinite optimization, and then extract parameters using ideas from computer algebra. This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation.
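The key observation in the abstract can be illustrated with a toy case. The sketch below is not the authors' implementation and does not solve the semidefinite relaxation. It assumes a two-component 1-D Gaussian mixture with a known shared variance and exact population moments, in which case the moments of the parameters can be read off directly and the parameters recovered from a small moment matrix via an eigenvalue problem. The function name `recover_two_gaussians` and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np


def recover_two_gaussians(m1, m2, m3, sigma2=1.0):
    """Recover means and weights of a 2-component 1-D Gaussian mixture
    with known shared variance sigma2 from m_j = E[x^j], j = 1, 2, 3.
    Illustrative sketch only; assumes exact moments and distinct means."""
    # Component moments are polynomials in the mean xi:
    #   E[x]   = xi
    #   E[x^2] = xi^2 + sigma2
    #   E[x^3] = xi^3 + 3 * sigma2 * xi
    # so the mixture moments are mixtures of these polynomials, and the
    # parameter moments y_j = sum_k pi_k * xi_k^j follow linearly.
    y0 = 1.0
    y1 = m1
    y2 = m2 - sigma2
    y3 = m3 - 3.0 * sigma2 * m1

    # Moment matrix of the mixing measure and its shifted version.
    H0 = np.array([[y0, y1], [y1, y2]])
    H1 = np.array([[y1, y2], [y2, y3]])

    # The eigenvalues of H0^{-1} H1 are the component means.
    means = np.sort(np.linalg.eigvals(np.linalg.solve(H0, H1)).real)

    # Weights solve the Vandermonde system sum_k pi_k * xi_k^j = y_j, j = 0, 1.
    V = np.vander(means, 2, increasing=True).T
    weights = np.linalg.solve(V, np.array([y0, y1]))
    return means, weights


# Hypothetical ground truth used to generate exact moments.
true_means = np.array([-1.0, 2.0])
true_weights = np.array([0.4, 0.6])
sigma2 = 1.0

m1 = np.sum(true_weights * true_means)
m2 = np.sum(true_weights * (true_means**2 + sigma2))
m3 = np.sum(true_weights * (true_means**3 + 3.0 * sigma2 * true_means))

means, weights = recover_two_gaussians(m1, m2, m3, sigma2)
print("means:", means)
print("weights:", weights)
```

With sample moments in place of exact ones, or with models whose parameter moments are not all directly observable, this shortcut no longer applies; that is the setting in which the abstract describes solving relaxations of the Generalized Moment Problem by semidefinite optimization.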
