DS · LG · Nov 26, 2019

Robustly Clustering a Mixture of Gaussians

arXiv:1911.11838v66 citations
Originality: Highly original
AI Analysis

This paper addresses a central problem in robust estimation for machine learning and statistics, offering an algorithm with theoretical guarantees for clustering mixtures under separation conditions.

The paper tackles the problem of robustly clustering a mixture of two arbitrary Gaussians, an open challenge in robust estimation, by providing an efficient algorithm that works under well-separated means or covariances, with separation requirements close to the minimal possible.

We give an efficient algorithm for robustly clustering a mixture of two arbitrary Gaussians, a central open problem in the theory of computationally efficient robust estimation, assuming only that the means of the component Gaussians are well-separated or their covariances are well-separated. Our algorithm and analysis extend naturally to robustly clustering mixtures of well-separated strongly logconcave distributions. The mean separation required is close to the smallest possible to guarantee that most of the measure of each component can be separated by some hyperplane (for covariances, it is the same condition in the second-degree polynomial kernel). We also show that for Gaussian mixtures, separation in total variation distance suffices to achieve robust clustering. Our main tools are a new identifiability criterion based on isotropic position and the Fisher discriminant, and a corresponding Sum-of-Squares convex programming relaxation of fixed degree.
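The two quantities named in the abstract, isotropic position and the Fisher discriminant, can be illustrated numerically. The sketch below is not the paper's Sum-of-Squares algorithm: it uses no outliers, assumes the true cluster labels are known, and only evaluates the ratio of within-cluster variance to total variance along a given direction. The function names `isotropic_transform` and `fisher_ratio`, the synthetic data, and the separation of 8 are all assumptions made for illustration.

```python
import numpy as np

def isotropic_transform(X):
    """Affinely map the sample to mean zero and identity covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    # Inverse square root of the covariance via its eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (X - mu) @ W, mu, W

def fisher_ratio(Y, labels, v):
    """Weighted within-cluster variance along v divided by total variance along v."""
    v = v / np.linalg.norm(v)
    proj = Y @ v
    total = proj.var()
    within = sum(
        (labels == k).mean() * proj[labels == k].var()
        for k in np.unique(labels)
    )
    return within / total

# Hypothetical data: two spherical Gaussians whose means differ along axis 0.
rng = np.random.default_rng(0)
n, d = 2000, 5
shift = np.zeros(d)
shift[0] = 8.0
X = np.vstack([rng.normal(size=(n, d)), rng.normal(size=(n, d)) + shift])
labels = np.repeat([0, 1], n)

Y, mu, W = isotropic_transform(X)

# Direction joining the two cluster means, in isotropic coordinates.
v_sep = Y[labels == 1].mean(axis=0) - Y[labels == 0].mean(axis=0)
# A direction along which the clusters are not separated.
v_other = np.zeros(d)
v_other[1] = 1.0

ratio_sep = fisher_ratio(Y, labels, v_sep)
ratio_other = fisher_ratio(Y, labels, v_other)
print(f"ratio along mean direction: {ratio_sep:.3f}")
print(f"ratio along another axis:   {ratio_other:.3f}")
```

In isotropic position the total variance is 1 in every direction, so a small ratio along some direction means the within-cluster variance there is small and the components are well separated along it. In this synthetic setup the ratio along the mean direction is far below the ratio along an unrelated axis, which stays close to 1.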

