CR LG · Mar 4, 2021

Quantifying identifiability to choose and audit $ε$ in differentially private deep learning

Daniel Bernau, Günther Eibl, Philip W. Grassal, Hannah Keller, Florian Kerschbaum

arXiv:2103.02913v3 · 3.8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of choosing and auditing privacy parameters, a problem faced by data scientists and practitioners in machine learning. It offers a more interpretable and empirically grounded approach, but the advance is incremental because it builds on existing differential privacy and identifiability frameworks.

The paper tackles the challenge of selecting meaningful privacy parameters (ε,δ) in differentially private deep learning. It transforms these parameters into a bound on an adversary's Bayesian posterior belief about the presence of a record in the training dataset, shows that the bound can be tight in practice, and relates it to membership inference adversaries.

Differential privacy allows bounding the influence that training data records have on a machine learning model. To use differential privacy in machine learning, data scientists must choose privacy parameters $(ε,δ)$. Choosing meaningful privacy parameters is key, since models trained with weak privacy parameters might result in excessive privacy leakage, while strong privacy parameters might overly degrade model utility. However, privacy parameter values are difficult to choose for two main reasons. First, the theoretical upper bound on privacy loss $(ε,δ)$ might be loose, depending on the chosen sensitivity and data distribution of practical datasets. Second, legal requirements and societal norms for anonymization often refer to individual identifiability, to which $(ε,δ)$ are only indirectly related. We transform $(ε,δ)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset. The bound holds for multidimensional queries under composition, and we show that it can be tight in practice. Furthermore, we derive an identifiability bound, which relates the adversary assumed in differential privacy to previous work on membership inference adversaries. We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(ε,δ)$.
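The idea of turning $ε$ into a posterior belief can be illustrated with a minimal sketch. It assumes pure $ε$-differential privacy (δ is ignored) and an adversary who weighs two neighboring datasets with a given prior, so that Bayes' rule together with the likelihood-ratio limit $e^{ε}$ caps the posterior. The function names `posterior_belief_bound` and `epsilon_for_belief` are hypothetical, and the paper's own bound, which also covers δ and composition, may differ in form.

```python
import math


def posterior_belief_bound(epsilon: float, prior: float = 0.5) -> float:
    """Upper bound on the adversary's posterior belief that a record is present.

    Assumption: pure epsilon-DP, so the likelihood ratio between neighboring
    datasets is at most exp(epsilon). Bayes' rule then gives
        posterior <= e^eps * prior / (e^eps * prior + (1 - prior)).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Algebraically equivalent form that avoids overflow for large epsilon.
    return 1.0 / (1.0 + (1.0 - prior) / prior * math.exp(-epsilon))


def epsilon_for_belief(rho: float, prior: float = 0.5) -> float:
    """Invert the bound: the epsilon whose belief bound equals rho."""
    if not prior <= rho < 1.0:
        raise ValueError("rho must satisfy prior <= rho < 1")
    # Difference of log-odds between the target posterior and the prior.
    return math.log(rho / (1.0 - rho)) - math.log(prior / (1.0 - prior))


if __name__ == "__main__":
    for eps in (0.1, 1.0, 3.0):
        print(f"epsilon={eps}: belief bound={posterior_belief_bound(eps):.4f}")
```

With a uniform prior of 0.5 the bound reduces to $1/(1+e^{-ε})$, so $ε=1$ corresponds to a belief of roughly 0.73. Going the other way, `epsilon_for_belief` lets a practitioner start from a tolerable identifiability level and read off the matching $ε$ under the same assumptions.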
