Global analysis of Expectation Maximization for mixtures of two Gaussians
The paper addresses the problem of EM's convergence guarantees, which is relevant to statisticians and machine learning practitioners, but the contribution is incremental because it focuses on a specific model.
The paper tackles the disconnect between EM's statistical principles and algorithmic properties by providing a global analysis of EM for mixtures of two Gaussians, characterizing limit points in the infinite sample limit and establishing statistical consistency.
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians. This is achieved by (i) studying the sequence of parameters from idealized execution of EM in the infinite sample limit, and fully characterizing the limit points of the sequence in terms of the initial parameters; and then (ii) based on this convergence analysis, establishing statistical consistency (or lack thereof) for the actual sequence of parameters produced by EM.
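To make the dependence of EM's limit point on its initial parameters concrete, here is a minimal sketch in Python. It rests on assumptions that are not stated above: a one-dimensional, balanced, symmetric mixture 0.5 N(theta, sigma^2) + 0.5 N(-theta, sigma^2) with known sigma, and a large finite sample standing in for the infinite sample limit. The names `em_step`, `run_em`, and `theta_star` are hypothetical and chosen only for illustration.

```python
import numpy as np

def em_step(theta, x, sigma=1.0):
    """One EM iteration for the ASSUMED model 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2)."""
    # E-step: the posterior weight w of the +theta component is
    # sigmoid(2*theta*x/sigma^2), so 2*w - 1 = tanh(theta*x/sigma^2).
    signed_resp = np.tanh(theta * x / sigma**2)
    # M-step: the maximizer of the expected complete-data log-likelihood
    # is the average of (2*w - 1) * x over the sample.
    return float(np.mean(signed_resp * x))

def run_em(theta0, x, n_iter=200, sigma=1.0):
    """Iterate EM from the initial value theta0 and return the final estimate."""
    theta = theta0
    for _ in range(n_iter):
        theta = em_step(theta, x, sigma)
    return theta

# Synthetic data from the assumed model (theta_star is an illustrative choice).
rng = np.random.default_rng(0)
theta_star = 2.0
n = 20000
signs = rng.choice([-1.0, 1.0], size=n)
x = signs * theta_star + rng.normal(size=n)

theta_pos = run_em(0.5, x)    # positive initialization
theta_neg = run_em(-0.5, x)   # negative initialization
theta_zero = run_em(0.0, x)   # initialization at the symmetric point

print(theta_pos, theta_neg, theta_zero)
```

In this sketch, initializations of opposite sign end at estimates of opposite sign, while the zero initialization never moves, because zero is a fixed point of this particular update. It illustrates only this simplified one-dimensional case and is not a substitute for the paper's analysis.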