LGCRMLAug 11, 2023

Private Distribution Learning with Public Data: The View from Sample Compression

arXiv:2308.06239v218 citationsh-index: 37
AI Analysis

This addresses privacy-preserving machine learning for distribution estimation, offering incremental theoretical advances in sample complexity and learnability.

The paper tackles the problem of private distribution learning with access to public data, showing that learnability is connected to sample compression schemes and list learning, leading to new results such as sample complexity bounds for Gaussian mixtures and closure properties, with a lower bound of at least d public samples for Gaussians in R^d.

We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection: (1) approximately recovers previous results on Gaussians over $\mathbb R^d$; and (2) leads to new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes