From Closed-world Enforcement to Open-world Assessment of Privacy
The work addresses privacy risks for users in online communities, but its contribution is incremental because it builds on existing privacy formalizations.
The paper tackles the problem of assessing personal information exposure in open settings by developing a user-centric privacy framework, and validates it on 15 million Reddit comments to measure entity linkability.
In this paper, we develop a user-centric privacy framework for quantitatively assessing the exposure of personal information in open settings. Our formalization addresses key challenges posed by such open settings, such as the unstructured dissemination of heterogeneous information and the necessity of user- and context-dependent privacy requirements. We propose a new definition of information sensitivity derived from our formalization of privacy requirements and, as a sanity check, show that hard non-disclosure guarantees are impossible to achieve in open settings. We then instantiate our framework to address the identity disclosure problem, leading to the novel notion of d-convergence. The notion of d-convergence is based on the indistinguishability of entities, and it bounds the likelihood with which an adversary successfully links two profiles of the same user across online communities. Finally, we provide a large-scale evaluation of our framework on a collection of 15 million comments collected from the online social network Reddit. Our evaluation validates the notion of d-convergence for assessing the linkability of entities in our data set and provides deeper insights into the data set's structure.
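The abstract does not state the formal definition of d-convergence, so the following is only a minimal sketch of the underlying idea of indistinguishability by distance. It assumes, purely for illustration, that each entity is modeled as a word-frequency distribution over its comments and that the distance between two entities is the Jensen-Shannon divergence (base 2). The function names `unigram_model`, `js_divergence`, and `convergent_set` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(text):
    """Assumed entity model: relative word frequencies of the entity's text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mean = 0.5 * (pw + qw)
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mean)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mean)
    return divergence


def convergent_set(target, others, d):
    """Names of entities whose assumed distance to `target` is at most d."""
    return [name for name, model in others.items()
            if js_divergence(target, model) <= d]
```

Under these assumptions, the more entities that fall within distance d of a given profile, the harder it should be for an adversary to single out the one matching profile; a profile with an empty convergent set would be comparatively easy to link.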