LG · ST · Oct 14, 2017

Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes

arXiv:1710.05209v5 · 8 citations
Originality: Highly original
AI Analysis

This paper establishes fundamental sample complexity bounds for a core problem in machine learning, with implications for statistical learning theory and for applications such as clustering and density estimation.

The paper tackles the problem of learning mixtures of Gaussians with near-optimal sample complexity. It proves that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for general mixtures and that $\tilde{O}(k d / \varepsilon^2)$ samples suffice for axis-aligned mixtures, and it extends both results to the robust setting.

We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of 'compression'. Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
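To get a feel for how the two bounds in the abstract scale, here is a minimal Python sketch that evaluates only their leading terms, $k d^2 / \varepsilon^2$ and $k d / \varepsilon^2$. It is an illustration under stated assumptions, not a sample-size calculator: the tilde notation hides constants and polylogarithmic factors, which are dropped here, and the function names are hypothetical rather than taken from the paper.

```python
def general_mixture_scaling(k: int, d: int, eps: float) -> float:
    """Leading term k * d^2 / eps^2 of the bound for general Gaussian mixtures.

    Assumption: constants and polylogarithmic factors hidden by the
    tilde notation are ignored, so this is a scaling, not a sample count.
    """
    if k <= 0 or d <= 0 or eps <= 0:
        raise ValueError("k, d and eps must all be positive")
    return k * d**2 / eps**2


def axis_aligned_scaling(k: int, d: int, eps: float) -> float:
    """Leading term k * d / eps^2 of the bound for axis-aligned mixtures.

    Same assumption as above: hidden constants and log factors are dropped.
    """
    if k <= 0 or d <= 0 or eps <= 0:
        raise ValueError("k, d and eps must all be positive")
    return k * d / eps**2


if __name__ == "__main__":
    k, d, eps = 10, 100, 0.1  # hypothetical problem sizes
    general = general_mixture_scaling(k, d, eps)
    axis = axis_aligned_scaling(k, d, eps)
    print(f"general mixtures:      ~{general:.3g}")
    print(f"axis-aligned mixtures: ~{axis:.3g}")
    print(f"ratio (general / axis-aligned): ~{general / axis:.3g}")
```

The ratio of the two leading terms is exactly $d$. This reflects the fact that a general Gaussian in $\mathbb{R}^d$ has on the order of $d^2$ covariance parameters, while an axis-aligned one has on the order of $d$.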

