MLLGSTSep 28, 2020

Local Minima Structures in Gaussian Mixture Models

arXiv:2009.13040v39 citations
AI Analysis

This work addresses the non-convex optimization challenges in Gaussian Mixture Models for researchers in statistics and machine learning, offering theoretical insights into local minima structures, but it is incremental as it builds on existing landscape analyses.

The paper investigates the local minima of the negative log-likelihood function in Gaussian Mixture Models, revealing that all local minima share a common structure that partially identifies true cluster centers, even with over- or under-specified components, and provides sharper error bounds for one-dimensional cases with three components.

We investigate the landscape of the negative log-likelihood function of Gaussian Mixture Models (GMMs) with a general number of components in the population limit. As the objective function is non-convex, there can be multiple local minima that are not globally optimal, even for well-separated mixture models. Our study reveals that all local minima share a common structure that partially identifies the cluster centers (i.e., means of the Gaussian components) of the true location mixture. Specifically, each local minimum can be represented as a non-overlapping combination of two types of sub-configurations: fitting a single mean estimate to multiple Gaussian components or fitting multiple estimates to a single true component. These results apply to settings where the true mixture components satisfy a certain separation condition, and are valid even when the number of components is over- or under-specified. We also present a more fine-grained analysis for the setting of one-dimensional GMMs with three components, which provide sharper approximation error bounds with improved dependence on the separation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes