LG CR IT · Oct 24, 2022

Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano

Chuan Guo, Alexandre Sablayrolles, Maziar Sanjabi

arXiv:2210.13662v215.120 citations · h-index: 27

Originality: Incremental advance

AI Analysis

The paper provides theoretical evidence that differential privacy is effective against data reconstruction attacks, a practical concern for privacy in machine learning applications.

The paper asks how small the differential privacy parameter ε needs to be to protect against data reconstruction attacks on discrete data. It shows that ε can be as large as O(log M) before an adversary gains significant inferential power, where M is the size of the set of possible data values.

Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $ε$ needs to be to protect against certain privacy risks in practice is still not well understood. In this work, we study data reconstruction attacks for discrete data and analyze them under the framework of multiple hypothesis testing. We utilize different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained differentially privately. Importantly, we show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $ε$ can be $O(\log M)$ before the adversary gains significant inferential power. Our analysis offers theoretical evidence for the empirical effectiveness of DP against data reconstruction attacks even at relatively large values of $ε$.
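To give a feel for where the $O(\log M)$ scaling comes from, here is a minimal sketch based on the textbook form of Fano's inequality, not on the paper's own (tighter) variants. It assumes the private value is uniform over $M$ candidates and that the mutual information between that value and the trained model is at most $ε$ nats, which follows from pure $ε$-DP with respect to the target record. Under those assumptions the adversary's success probability is at most $(ε + \log 2)/\log M$. The function name `fano_success_upper_bound` is hypothetical and chosen only for illustration.

```python
import math


def fano_success_upper_bound(epsilon: float, num_values: int) -> float:
    """Textbook Fano upper bound on reconstruction success probability.

    Assumptions (not taken from the paper): the private value is uniform
    over `num_values` candidates, and the mutual information between the
    value and the released model is at most `epsilon` nats.
    """
    if num_values < 2:
        raise ValueError("num_values must be at least 2")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    # Fano: P(error) >= 1 - (I + log 2) / log M, with I <= epsilon,
    # so P(success) <= (epsilon + log 2) / log M.
    bound = (epsilon + math.log(2)) / math.log(num_values)
    # A probability bound above 1 carries no information, so clamp it.
    return min(1.0, bound)
```

For instance, with $M = 1024$ and $ε = 1$ this bound is roughly 0.24, and it only becomes vacuous once $ε$ is on the order of $\log M$. For small $M$ the textbook bound is loose, which is one reason sharper variants of Fano's inequality are of interest.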

