Subset Privacy: Draw from an Obfuscated Urn
This work addresses privacy concerns for individuals whose categorical data is collected by untrusted entities and represents an incremental advance in local privacy mechanisms.
The authors tackle the problem of protecting categorical data collected by untrusted entities by proposing subset privacy, a new local privacy notion that replaces each original value with a random subset containing it. They develop methods for distribution estimation and independence testing with theoretical guarantees, and report encouraging experimental results on simulated and real-world datasets.
With the rapidly increasing ability to collect and analyze personal data, data privacy has become a growing concern. In this work, we develop a new statistical notion of local privacy to protect each categorical data value collected by untrusted entities. The proposed solution, named subset privacy, privatizes the original data value by replacing it with a random subset containing that value. We develop methods for estimating distribution functions and testing independence from subset-private data, with theoretical guarantees. We also study different mechanisms for realizing subset privacy and evaluation metrics for quantifying the amount of privacy in practice. Experimental results on both simulated and real-world datasets demonstrate the encouraging performance of the developed concepts and methods.
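The abstract does not specify a concrete mechanism, so the following is only a minimal sketch under a hypothetical one: the true category is always kept in the reported subset, and every other category is added independently with probability `p`. Under that assumption, category j appears in a report with probability pi_j + (1 - pi_j) * p, which can be inverted to estimate the distribution from the reports alone. The functions `privatize` and `estimate_distribution` and the parameter `p` are illustrative assumptions, not the authors' mechanisms or estimators.

```python
import random
from collections import Counter


def privatize(value, k, p, rng):
    """Hypothetical subset mechanism: return a random subset of {0, ..., k-1}
    that always contains `value`; each other category is included
    independently with probability p."""
    subset = {value}
    for j in range(k):
        if j != value and rng.random() < p:
            subset.add(j)
    return frozenset(subset)


def estimate_distribution(subsets, k, p):
    """Unbiased frequency estimate for the hypothetical mechanism above.

    P(j in S) = pi_j + (1 - pi_j) * p, so pi_j = (P(j in S) - p) / (1 - p).
    The empirical inclusion rate of j replaces P(j in S). Requires p < 1.
    """
    n = len(subsets)
    counts = Counter(j for s in subsets for j in s)
    return [(counts[j] / n - p) / (1 - p) for j in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, p = 4, 0.3
    true_pi = [0.4, 0.3, 0.2, 0.1]
    data = rng.choices(range(k), weights=true_pi, k=50_000)
    reports = [privatize(x, k, p, rng) for x in data]
    print([round(e, 2) for e in estimate_distribution(reports, k, p)])
```

In this toy setup, a larger `p` produces larger subsets, so each individual report reveals less about the true value, but the estimator's variance grows as `p` approaches 1 because of the division by `1 - p`.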