Utility-Optimized Local Differential Privacy Mechanisms for Distribution Estimation
This work addresses utility degradation in privacy-preserving data analysis for applications like distribution estimation, offering a practical improvement over standard LDP.
The paper tackles excessive utility loss in Local Differential Privacy (LDP) by introducing Utility-optimized LDP (ULDP), which provides an LDP-equivalent privacy guarantee only for sensitive data. The proposed mechanisms achieve much higher utility than existing LDP mechanisms when much of the data is non-sensitive, and almost the same utility as non-private mechanisms in the low privacy regime when most of the data is non-sensitive.
Local Differential Privacy (LDP) has been widely studied as a way to estimate statistics of personal data (e.g., the distribution underlying the data) while protecting users' privacy. Although LDP does not require a trusted third party, it regards all personal data as equally sensitive, which causes excessive obfuscation and hence a loss of utility. In this paper, we introduce the notion of Utility-optimized LDP (ULDP), which provides a privacy guarantee equivalent to LDP only for sensitive data. We first consider the setting where all users use the same obfuscation mechanism, and propose two mechanisms providing ULDP: utility-optimized randomized response and utility-optimized RAPPOR. We then consider the setting where the distinction between sensitive and non-sensitive data can differ from user to user. For this setting, we propose a personalized ULDP mechanism with semantic tags to estimate the distribution of personal data with high utility while keeping secret what is sensitive for each user. We show theoretically and experimentally that our mechanisms provide much higher utility than the existing LDP mechanisms when much of the data is non-sensitive. We also show that when most of the data is non-sensitive, our mechanisms provide almost the same utility as non-private mechanisms in the low privacy regime.
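To make the idea of protecting only sensitive data concrete, the following is a minimal Python sketch of a randomized-response-style mechanism in the spirit of utility-optimized randomized response. The abstract does not state the mechanism's probabilities, so the ones used here are an assumption, as are the function names `urr_probs`, `urr_obfuscate`, and `urr_estimate` and the simple inversion-based estimator. In this sketch, a sensitive input is always reported as some sensitive value, with the usual e^ε ratio between the true value and any other sensitive value, while a non-sensitive input is either reported truthfully or replaced by a uniformly chosen sensitive value.

```python
import math
import random
from collections import Counter


def urr_probs(eps, n_sensitive):
    """Probabilities used by this sketch (assumed, not taken from the abstract)."""
    denom = n_sensitive + math.exp(eps) - 1.0
    p_keep_s = math.exp(eps) / denom          # sensitive input reported truthfully
    p_flip = 1.0 / denom                      # one specific other sensitive output
    p_keep_n = (math.exp(eps) - 1.0) / denom  # non-sensitive input reported truthfully
    return p_keep_s, p_flip, p_keep_n


def urr_obfuscate(x, sensitive, eps, rng=random):
    """Obfuscate one value; outputs for sensitive inputs stay inside `sensitive`."""
    sensitive = list(sensitive)
    p_keep_s, _, p_keep_n = urr_probs(eps, len(sensitive))
    if x in sensitive:
        others = [s for s in sensitive if s != x]
        if not others or rng.random() < p_keep_s:
            return x
        return rng.choice(others)
    # Non-sensitive input: report it as-is, or hide it among sensitive values.
    if rng.random() < p_keep_n:
        return x
    return rng.choice(sensitive)


def urr_estimate(reports, domain, sensitive, eps):
    """Estimate the input distribution by inverting the expected output frequencies."""
    sensitive = set(sensitive)
    _, p_flip, p_keep_n = urr_probs(eps, len(sensitive))
    counts = Counter(reports)
    n = len(reports)
    est = {}
    for v in domain:
        m = counts[v] / n
        # Sensitive outputs carry a constant noise floor p_flip; non-sensitive do not.
        est[v] = (m - p_flip) / p_keep_n if v in sensitive else m / p_keep_n
    return est
```

A caller would apply `urr_obfuscate` on each user's device and pass the collected reports to `urr_estimate`. Individual estimates can be negative for rare values, so a real deployment would likely post-process them (for example by clipping and renormalizing); that step is omitted here.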