MLCRDSITLGSep 7, 2023

Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

arXiv:2309.03847v37 citationsh-index: 13
Originality Highly original
AI Analysis

This provides a foundational advance in private machine learning by enabling efficient learning of complex distributions, which is crucial for applications in data privacy and security.

The paper tackles the problem of privately learning mixtures of Gaussians under differential privacy constraints, showing that a polynomial number of samples is sufficient to estimate such mixtures up to a specified total variation distance without structural assumptions.

We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/α,1/\varepsilon,\log(1/δ))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $α$ while satisfying $(\varepsilon, δ)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes