Privacy Amplification by Missing Data
This work addresses privacy concerns in high-stakes domains such as medicine and finance, and offers a new perspective in which missing data benefits privacy rather than acting only as a limitation.
The paper studies privacy preservation in datasets with missing values. It analyzes missing data as a privacy amplification mechanism within differential privacy and shows that incomplete data can amplify the privacy guarantees of differentially private algorithms.
Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.