Learning Mixtures of Gaussians with Censored Data
This work addresses a classical statistical learning problem with practical applications, providing finite-sample guarantees for a latent variable model where such guarantees were previously missing.
The paper tackles the problem of learning mixtures of Gaussians from censored data, where samples are only observed if they lie within a specific set, and proposes an algorithm that requires only 1/ε^O(k) samples to estimate the weights and means within ε error.
We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$ \sum_{i=1}^k w_i \mathcal{N}(μ_i,σ^2), $$ where a sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $μ_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $μ_i$ within $\varepsilon$ error.
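To make the observation model concrete, the following is a minimal sketch of how censored samples from such a mixture could be simulated. It illustrates only the data-generating process, not the proposed learning algorithm. The function name `sample_censored_mixture`, the membership test `in_S`, and the example values (two components, $S = [0, 3]$) are assumptions chosen for illustration and do not come from the paper.

```python
import random


def sample_censored_mixture(weights, means, sigma, in_S, n, seed=0):
    """Draw n observed samples from a censored univariate Gaussian mixture.

    Uses rejection sampling: draw from the uncensored mixture
    sum_i w_i N(mu_i, sigma^2) and keep a draw only if in_S(x) is True.
    """
    rng = random.Random(seed)
    observed = []
    while len(observed) < n:
        # Pick component i with probability w_i, then draw x ~ N(mu_i, sigma^2).
        mu = rng.choices(means, weights=weights, k=1)[0]
        x = rng.gauss(mu, sigma)
        # Censoring: draws outside S are never seen by the learner.
        if in_S(x):
            observed.append(x)
    return observed


# Hypothetical example: k = 2 components, S = [0, 3] (illustrative values only).
samples = sample_censored_mixture(
    weights=[0.4, 0.6],
    means=[-1.0, 2.0],
    sigma=1.0,
    in_S=lambda x: 0.0 <= x <= 3.0,
    n=1000,
)
```

Because draws outside $S$ are discarded, the empirical distribution of `samples` differs from the uncensored mixture, which is why naive estimators of $w_i$ and $μ_i$ computed from the observed data are generally biased.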