Privately Learning Mixtures of Axis-Aligned Gaussians
This addresses the challenge of privately learning complex distributions like Gaussian mixtures, which is crucial for data analysis in sensitive domains, representing a foundational advance rather than an incremental step.
The paper tackles the problem of learning mixtures of axis-aligned Gaussians under differential privacy constraints, proving sample complexity bounds for achieving total variation distance with privacy guarantees, and introduces a new technique based on private list-decodability to overcome limitations of prior methods.
We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that $\widetilde{O}(k^2 d \log^{3/2}(1/δ) / α^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ to within total variation distance $α$ while satisfying $(\varepsilon, δ)$-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrices of each of the Gaussians is the identity matrix, we show that $\widetilde{O}(kd/α^2 + kd \log(1/δ) / α\varepsilon)$ samples are sufficient. Recently, the "local covering" technique of Bun, Kamath, Steinke, and Wu has been successfully used for privately learning high-dimensional Gaussians with a known covariance matrix and extended to privately learning general high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath. Given these positive results, this approach has been proposed as a promising direction for privately learning mixtures of Gaussians. Unfortunately, we show that this is not possible. We design a new technique for privately learning mixture distributions. A class of distributions $\mathcal{F}$ is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from $f\in \mathcal{F}$, outputs a list of distributions, $\widehat{\mathcal{F}}$, such that one of the distributions in $\widehat{\mathcal{F}}$ approximates $f$. We show that if $\mathcal{F}$ is privately list-decodable, then we can privately learn mixtures of distributions in $\mathcal{F}$. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.