LG ST · Jan 3, 2021

Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM

arXiv:2101.00575v2 · 9.916 citations

Originality: Highly original

AI Analysis

This work provides improved theoretical convergence guarantees and sample complexity bounds for the EM and gradient EM algorithms. These results matter to researchers and practitioners who fit Gaussian Mixture Models, especially when the initialization may be far from the true parameters or when data are limited.

This paper addresses the problem of estimating the parameters of a Gaussian Mixture Model with K components, known weights, and identity covariance matrices. It gives a sharper analysis of the local convergence of EM and gradient EM, showing that under a separation of Ω(√log K) both methods converge to the global optima from a larger initialization region: the initial guess for each component may be off by up to (almost) half the distance between that component and its nearest neighboring Gaussian. The study also shows that the required sample size and the resulting error estimate depend only logarithmically on the maximal separation between components, rather than quadratically (sample size) and linearly (error) as in previous work.
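The two iterations being analyzed can be sketched in NumPy for this setting (K spherical Gaussians with identity covariance and known weights). These are the standard sample-based EM and gradient EM updates for the model; the function names, the step size `step`, and the synthetic data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def responsibilities(X, mus, weights):
    """E-step: w[i, k] = posterior probability that sample i came from component k.

    Assumes identity covariances and known mixing weights.
    """
    # Squared distances ||x_i - mu_k||^2, shape (n, K)
    sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    logits = np.log(weights)[None, :] - 0.5 * sq
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def em_step(X, mus, weights):
    """One EM iteration: each mean becomes its responsibility-weighted average."""
    w = responsibilities(X, mus, weights)
    return (w.T @ X) / w.sum(axis=0)[:, None]

def gradient_em_step(X, mus, weights, step=1.0):
    """One gradient EM iteration: a single gradient-ascent step on the EM surrogate.

    grad_k = (1/n) * sum_i w[i, k] * (x_i - mu_k); `step` is a hypothetical
    user-chosen step size, not a value taken from the paper.
    """
    w = responsibilities(X, mus, weights)
    n = X.shape[0]
    grad = (w.T @ X - w.sum(axis=0)[:, None] * mus) / n
    return mus + step * grad

# Synthetic data: K = 3 well-separated components in 2-D (illustrative values).
rng = np.random.default_rng(0)
true_mus = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
weights = np.array([0.5, 0.25, 0.25])
labels = rng.choice(3, size=3000, p=weights)
X = true_mus[labels] + rng.standard_normal((3000, 2))

# Start each mean less than half the gap to its nearest neighbor away from the truth.
init = true_mus + np.array([[1.5, -1.0], [-2.0, 1.0], [1.0, 2.0]])
mus_em, mus_gem = init.copy(), init.copy()
for _ in range(300):
    mus_em = em_step(X, mus_em, weights)
    mus_gem = gradient_em_step(X, mus_gem, weights, step=1.0)
print(np.round(mus_em, 2))
print(np.round(mus_gem, 2))
```

Both runs should settle near `true_mus`, up to sampling error. With `step=1.0`, gradient EM moves each mean only part of the way toward its weighted average per iteration (a fraction equal to the component's average responsibility, close to its weight), so it needs more iterations than EM to reach the same fixed point.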

We consider the problem of estimating the parameters of a Gaussian Mixture Model with K components of known weights, all with an identity covariance matrix. We make two contributions. First, at the population level, we present a sharper analysis of the local convergence of EM and gradient EM than previous works. Assuming a separation of $\Omega(\sqrt{\log K})$, we prove convergence of both methods to the global optima from an initialization region larger than those of previous works. Specifically, the initial guess of each component can be as far as (almost) half its distance to the nearest Gaussian. This is essentially the largest possible contraction region. Our second contribution consists of improved sample size requirements for accurate estimation by EM and gradient EM. In previous works, the required number of samples had a quadratic dependence on the maximal separation between the K components, and the resulting error estimate increased linearly with this maximal separation. In this manuscript we show that both quantities depend only logarithmically on the maximal separation.
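The "(almost) half" initialization region can be made concrete with a small check. This sketch assumes the region means ‖μ_k⁰ − μ_k*‖ ≤ λ · min_{j≠k} ‖μ_k* − μ_j*‖ for every component k, with λ just below 1/2; the helper names and `lam=0.49` are hypothetical and do not reproduce the paper's exact statement or constants. It also illustrates why 1/2 is a natural ceiling: a guess placed at the midpoint between two true centers is equally close to both, so it no longer favors its own component.

```python
import numpy as np

def nearest_center_distances(true_mus):
    """R_k = min_{j != k} ||mu_k* - mu_j*|| for each true center."""
    diff = true_mus[:, None, :] - true_mus[None, :, :]
    d = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(d, np.inf)  # ignore each center's distance to itself
    return d.min(axis=1)

def in_init_region(init_mus, true_mus, lam=0.49):
    """True if every initial mean lies within lam * R_k of its own true center.

    `lam` is a hypothetical stand-in for the paper's "(almost) half".
    """
    if not 0 < lam < 0.5:
        raise ValueError("lam must lie in (0, 1/2)")
    err = np.linalg.norm(init_mus - true_mus, axis=1)
    return bool(np.all(err <= lam * nearest_center_distances(true_mus)))

true_mus = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
R = nearest_center_distances(true_mus)  # [6., 6., 6.]
good = true_mus + np.array([[2.5, 0.0], [0.0, 2.5], [-2.5, 0.0]])
midpoint = true_mus.copy()
midpoint[0] = [3.0, 0.0]  # equidistant from centers 0 and 1
print(in_init_region(good, true_mus), in_init_region(midpoint, true_mus))  # prints: True False
```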
