Sample-Efficient Private Learning of Mixtures of Gaussians
This work addresses the challenge of private data analysis for machine learning practitioners, offering incremental improvements in sample efficiency for a specific domain.
The paper tackles the problem of learning mixtures of Gaussians with differential privacy, achieving a sample complexity of roughly kd^2 + k^{1.5} d^{1.75} + k^2 d, which improves over prior results and is provably optimal in certain regimes, with linear sample complexity for univariate mixtures.
We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$. Moreover, we give the first optimal bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components $k$, whereas the previous best sample complexity [AAL21] was quadratic in $k$. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.