LG · CR · Oct 30, 2024

Calibrating Practical Privacy Risks for Differentially Private Machine Learning

arXiv:2410.22673v1 · h-index: 5 · BigData
Originality: Incremental advance
AI Analysis

This work addresses practical privacy risk calibration for differentially private machine learning and offers a method to improve the privacy-utility trade-off. It is an incremental advance, since it builds on existing ASR-based evaluation and feature-masking techniques.

The paper tackles the difficulty of interpreting differential privacy's theoretical privacy budget ε in practice. It shows that the attack success rate (ASR) of the likelihood-ratio-based membership inference attack (LiRA) varies with the dataset and the model, and it proposes feature-masking strategies based on SHAP and LIME that lower the ASR without compromising utility, so that a larger ε can be used for equivalent practical privacy protection.
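
As a rough illustration of the likelihood-ratio idea behind the ASR, here is a minimal Python sketch. It is not the paper's implementation. The function names (`gaussian_logpdf`, `lira_score`, `attack_success_rate`), the Gaussian fit to shadow-model confidences, the fixed threshold of 0, and the reading of ASR as plain guessing accuracy are all assumptions made for illustration; the paper's exact ASR definition may differ.

```python
import math
import statistics


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    std = max(std, 1e-6)  # guard against zero variance
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def lira_score(target_conf, in_confs, out_confs):
    """Log likelihood ratio: positive values favour 'record was in the training set'."""
    log_p_in = gaussian_logpdf(
        target_conf, statistics.mean(in_confs), statistics.pstdev(in_confs))
    log_p_out = gaussian_logpdf(
        target_conf, statistics.mean(out_confs), statistics.pstdev(out_confs))
    return log_p_in - log_p_out


def attack_success_rate(scores, is_member, threshold=0.0):
    """Assumed ASR: fraction of records whose membership is guessed correctly."""
    correct = sum((s > threshold) == m for s, m in zip(scores, is_member))
    return correct / len(scores)


# Hypothetical shadow-model confidences for one record (illustrative numbers only).
in_confs = [3.0, 3.2, 2.8]   # shadow models trained WITH the record
out_confs = [1.0, 1.2, 0.8]  # shadow models trained WITHOUT the record

scores = [
    lira_score(2.9, in_confs, out_confs),  # target model behaves like the "in" models
    lira_score(1.1, in_confs, out_confs),  # target model behaves like the "out" models
]
print(attack_success_rate(scores, [True, False]))
```

The further apart the "in" and "out" confidence distributions are for a dataset and model, the easier the guess becomes. This is one way to see why the same ε can correspond to different ASR values.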

Differential privacy quantifies privacy through the privacy budget $ε$, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has highlighted that, under the same theoretical $ε$ setting, the attack success rate (ASR) of the likelihood-ratio-based membership inference attack (LiRA) may vary with the specific dataset and model, so the ASR might be a better indicator for evaluating real-world privacy risks. Inspired by this practical privacy measure, we study approaches that can lower the attack success rate to allow for more flexible privacy budget settings in model training. We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility. We use the SHAP and LIME model explainers to evaluate feature sensitivities and develop feature-masking strategies. Our findings demonstrate that the LiRA $ASR^M$ on model $M$ can properly indicate the inherent privacy risk of a dataset for modeling, and that it is possible to modify datasets to enable the use of larger theoretical $ε$ settings while achieving equivalent practical privacy protection. We have conducted extensive experiments to show the inherent link between ASR and the dataset's privacy risk. By carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection and relaxed $ε$ settings. The implementation details are shared online at https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/.
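
The feature-masking step described in the abstract could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code. The per-record attribution scores are assumed to come from an explainer such as SHAP or LIME and to reflect privacy sensitivity; ranking by mean absolute attribution, masking by column mean, and the names `rank_features` and `mask_features` are all hypothetical choices.

```python
import statistics


def rank_features(attributions):
    """Rank feature indices by mean absolute attribution, highest first."""
    n_features = len(attributions[0])
    importance = [
        statistics.mean(abs(row[j]) for row in attributions)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: importance[j], reverse=True)


def mask_features(rows, feature_ids):
    """Return a copy of rows with the chosen columns replaced by their column mean."""
    masked = [list(r) for r in rows]
    for j in feature_ids:
        col_mean = statistics.mean(r[j] for r in rows)
        for r in masked:
            r[j] = col_mean  # assumption: mean imputation stands in for "masking"
    return masked


# Hypothetical per-record attributions (e.g. exported from SHAP or LIME).
attributions = [[0.1, 0.9, 0.0],
                [0.2, -0.7, 0.1]]
rows = [[1.0, 10.0, 5.0],
        [3.0, 20.0, 7.0]]

top_k = rank_features(attributions)[:1]  # mask only the most sensitive feature
print(top_k, mask_features(rows, top_k))
```

In a full pipeline, one would retrain on the masked data and re-measure both utility and ASR, then choose the number of masked features that keeps utility acceptable.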

