Statistical Guarantees for Estimating the Centers of a Two-component Gaussian Mixture by EM
This work provides theoretical guarantees for EM algorithm convergence in mixture models, which is incremental as it builds on existing analysis but extends applicability to broader initialization conditions.
The paper tackles the problem of analyzing the statistical accuracy of the EM algorithm for estimating centers in a two-component Gaussian mixture, extending prior results by significantly expanding the basin of attraction to a half-space and ball intersection and showing feasibility with random initialization under a signal-to-noise ratio condition of at least a constant times sqrt(d log d).
Recently, a general method for analyzing the statistical accuracy of the EM algorithm has been developed and applied to some simple latent variable models [Balakrishnan et al. 2016]. In that method, the basin of attraction for valid initialization is required to be a ball around the truth. Using Stein's Lemma, we extend these results in the case of estimating the centers of a two-component Gaussian mixture in $d$ dimensions. In particular, we significantly expand the basin of attraction to be the intersection of a half space and a ball around the origin. If the signal-to-noise ratio is at least a constant multiple of $ \sqrt{d\log d} $, we show that a random initialization strategy is feasible.