A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k
This addresses a fundamental challenge in machine learning for statisticians and theorists, offering an agnostic algorithm that works even when the true distribution is not a Gaussian mixture, which is incremental progress on an open problem.
The paper tackles the problem of properly learning a mixture of k univariate Gaussians, achieving a nearly optimal sample complexity of Õ(k/ε²) and significantly improving the time complexity from exponential to polynomial in 1/ε, with a running time of (k·log(1/ε))^{O(k⁴)} + Õ(k/ε²).
Learning a Gaussian mixture model (GMM) is a fundamental problem in machine learning, learning theory, and statistics. One notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the density $f$ of the unknown distribution from which we draw samples. The distance between $\mathcal{M}$ and $f$ is typically measured in the total variation or $L_1$-norm. We give an algorithm for learning a mixture of $k$ univariate Gaussians that is nearly optimal for any fixed $k$. The sample complexity of our algorithm is $\tilde{O}(\frac{k}{ε^2})$ and the running time is $(k \cdot \log\frac{1}ε)^{O(k^4)} + \tilde{O}(\frac{k}{ε^2})$. It is well-known that this sample complexity is optimal (up to logarithmic factors), and it was already achieved by prior work. However, the best known time complexity for proper learning a $k$-GMM was $\tilde{O}(\frac{1}{ε^{3k-1}})$. In particular, the dependence between $\frac{1}ε$ and $k$ was exponential. We significantly improve this dependence by replacing the $\frac{1}ε$ term with a $\log \frac{1}ε$ while only increasing the exponent moderately. Hence, for any fixed $k$, the $\tilde{O} (\frac{k}{ε^2})$ term dominates our running time, and thus our algorithm runs in time which is nearly-linear in the number of samples drawn. Achieving a running time of $\textrm{poly}(k, \frac{1}ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Moreover, our approach offers an agnostic learning guarantee: our algorithm returns a good GMM even if the distribution we are sampling from is not a mixture of Gaussians. To the best of our knowledge, our algorithm is the first agnostic proper learning algorithm for GMMs.