Within-Dataset Disclosure Risk for Differential Privacy
For data controllers deploying differential privacy, this work provides a practical tool to interpret and set the privacy parameter based on disclosure risk, though the approach is incremental.
The paper addresses the challenge of choosing the privacy parameter ε in differential privacy by introducing a relative disclosure risk indicator (RDR) that helps controllers understand how a given ε affects the disclosure risk of the individuals within the dataset. It proposes algorithms that find ε based on controllers' preferences and that bound the total privacy leakage across multiple queries, and it validates them through a user study and experiments.
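The RDR formula is not given here, so this sketch does not reproduce it. It illustrates only the general idea of turning a preference over per-individual disclosure risk into a value of ε. As a stand-in risk measure it uses a standard Bayesian reading of ε-DP: an adversary with prior belief p about one individual ends with a posterior of at most p·e^ε / (p·e^ε + 1 − p). The per-individual priors, the function names `posterior_bound` and `choose_epsilon`, and the preference format ("at most a given fraction of individuals may exceed a posterior threshold") are hypothetical assumptions, not the paper's algorithm.

```python
import math
from typing import Sequence


def posterior_bound(prior: float, eps: float) -> float:
    """Upper bound on an adversary's posterior belief about one individual
    under eps-DP, given prior belief `prior`.

    Stand-in risk measure for illustration only; this is NOT the paper's RDR.
    """
    odds = prior * math.exp(eps)
    return odds / (odds + (1.0 - prior))


def choose_epsilon(priors: Sequence[float], max_posterior: float,
                   max_violating_fraction: float = 0.0,
                   eps_hi: float = 20.0, tol: float = 1e-6) -> float:
    """Largest eps in [0, eps_hi] such that at most `max_violating_fraction`
    of individuals have a posterior bound above `max_posterior`.

    Hypothetical preference format, assumed for this sketch.
    """
    n = len(priors)

    def acceptable(eps: float) -> bool:
        violating = sum(posterior_bound(p, eps) > max_posterior for p in priors)
        return violating <= max_violating_fraction * n

    if not acceptable(0.0):
        raise ValueError("preference cannot be met even at eps = 0")
    if acceptable(eps_hi):
        return eps_hi
    # Each individual's bound is non-decreasing in eps, so the set of
    # violators only grows; bisection therefore finds the boundary.
    lo, hi = 0.0, eps_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if acceptable(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `choose_epsilon([0.01, 0.05, 0.1], max_posterior=0.5)` returns approximately ln 9 ≈ 2.197, because the individual with prior 0.1 is the first to cross the threshold. For a single individual the boundary also has a closed form, ε = ln(q(1 − p) / (p(1 − q))) for threshold q. Bisection is used here only so that the same code handles fraction-based preferences.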
Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by choosing the privacy parameter $ε$, which controls the degree of privacy for all individuals in all possible datasets. However, it is challenging for controllers to choose $ε$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $ε$ on the within-dataset individuals' disclosure risk. We then design an algorithm to find $ε$ based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $ε$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study showing that the RDR helps controllers choose $ε$, and through experiments showing that our algorithms are efficient and scalable.
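How $ε$ controls the degree of privacy, and how leakage accumulates over multiple queries, can be illustrated with the textbook Laplace mechanism, which adds noise of scale Δ/$ε$ to a numeric query with sensitivity Δ, combined with basic sequential composition, under which the $ε$ values of successive queries add up. This is a minimal sketch that assumes numeric queries with known sensitivity. The class name `LaplaceAccountant` is hypothetical, and its running sum is the standard composition bound, not the paper's solution, which is designed to bound total leakage without the controller fixing a total budget in advance.

```python
import random
from typing import Optional


class LaplaceAccountant:
    """Answers numeric queries with the Laplace mechanism and tracks the
    total privacy loss under basic sequential composition (sum of eps).

    Illustrative sketch only; not the paper's leakage-bounding method.
    """

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.total_eps = 0.0

    def laplace(self, scale: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        rate = 1.0 / scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def answer(self, true_value: float, sensitivity: float, eps: float) -> float:
        if eps <= 0:
            raise ValueError("eps must be positive")
        # Smaller eps means larger noise scale, i.e. stronger privacy.
        noisy = true_value + self.laplace(sensitivity / eps)
        # Basic composition: the losses of successive queries add up.
        self.total_eps += eps
        return noisy
```

For instance, answering three counting queries (sensitivity 1) at $ε$ = 0.5 each leaves `total_eps` at 1.5. This additive accounting is simple but can be loose, which is one reason approaches that avoid fixing a total budget up front are of interest.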