The Sample Complexity of Membership Inference and Privacy Auditing
This addresses privacy auditing for machine learning practitioners by revealing potential underestimation in existing attacks, though it is incremental as it builds on prior work in a specific setting.
The paper tackles the problem of membership inference attacks by analyzing the sample complexity required for an attacker to succeed, specifically in Gaussian mean estimation. It shows that Ω(n + n^2 ρ^2) reference samples can be necessary, which is sometimes more than the training data size, indicating current attacks may underestimate risks.
A membership-inference attack gets the output of a learning algorithm, and a target individual, and tries to determine whether this individual is a member of the training data or an independent sample from the same distribution. A successful membership-inference attack typically requires the attacker to have some knowledge about the distribution that the training data was sampled from, and this knowledge is often captured through a set of independent reference samples from that distribution. In this work we study how much information the attacker needs for membership inference by investigating the sample complexity-the minimum number of reference samples required-for a successful attack. We study this question in the fundamental setting of Gaussian mean estimation where the learning algorithm is given $n$ samples from a Gaussian distribution $\mathcal{N}(μ,Σ)$ in $d$ dimensions, and tries to estimate $\hatμ$ up to some error $\mathbb{E}[\|\hat μ- μ\|^2_Σ]\leq ρ^2 d$. Our result shows that for membership inference in this setting, $Ω(n + n^2 ρ^2)$ samples can be necessary to carry out any attack that competes with a fully informed attacker. Our result is the first to show that the attacker sometimes needs many more samples than the training algorithm uses to train the model. This result has significant implications for practice, as all attacks used in practice have a restricted form that uses $O(n)$ samples and cannot benefit from $ω(n)$ samples. Thus, these attacks may be underestimating the possibility of membership inference, and better attacks may be possible when information about the distribution is easy to obtain.