DSCCITLGSTNov 20, 2017

List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

arXiv:1711.07211v1158 citations
Originality Incremental advance
AI Analysis

This work addresses robust statistical estimation problems for machine learning and data analysis, offering incremental improvements in algorithmic guarantees.

The paper tackles list-decodable Gaussian mean estimation and learning mixtures of spherical Gaussians, achieving improved error bounds of O(α^{-1/(2d)}) and requiring weaker separation assumptions of Ω(k^ε + √log(1/δ)), respectively, with polynomial-time algorithms.

We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. {\bf List-Decodable Mean Estimation.} Fix any $d \in \mathbb{Z}_+$ and $0< α<1/2$. We design an algorithm with runtime $O (\mathrm{poly}(n/α)^{d})$ that outputs a list of $O(1/α)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O(α^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde O(α^{-1/2})$ under second moment conditions. For $d = O(1/ε)$, our algorithm runs in polynomial time and achieves error $O(α^ε)$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. {\bf Learning Mixtures of Spherical Gaussians.} We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $ε>0$, if the pairwise separation between the means is at least $Ω(k^ε+\sqrt{\log(1/δ)})$, our algorithm learns the unknown parameters within accuracy $δ$ with sample complexity and running time $\mathrm{poly} (n, 1/δ, (k/ε)^{1/ε})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/δ)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes