CR, Nov 25, 2016

On the Evaluation of the Privacy Breach in Disassociated Set-Valued Datasets

arXiv:1611.08417v1, 6 citations
Originality: Synthesis-oriented
AI Analysis

This work identifies a specific vulnerability in a data anonymization method. The contribution is incremental but important for practitioners in privacy-preserving data publishing.

The paper demonstrates that disassociation, a bucketization technique for anonymizing set-valued datasets, can lead to privacy breaches when subjected to a cover problem, and it evaluates this breach using a detection algorithm on real datasets.

Data anonymization is attracting considerable attention because it provides the fundamental requirements for safely outsourcing datasets that contain identifying information. While some techniques add noise to protect privacy, others use generalization to hide the link between sensitive and non-sensitive information, or separate the dataset into clusters to gain more utility. In the latter approach, often referred to as bucketization, data values are kept intact and only the link is hidden, which maximizes utility. In this paper, we showcase the limits of disassociation, a bucketization technique that divides a set-valued dataset into $k^m$-anonymous clusters. We demonstrate that a privacy breach might occur if the disassociated dataset is subject to a cover problem. Finally, we evaluate the privacy breach using the quantitative privacy breach detection algorithm on real disassociated datasets.
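
To make the $k^m$-anonymity guarantee mentioned in the abstract concrete, here is a minimal Python sketch of the usual definition: every combination of at most m items that occurs in some record must occur in at least k records. The function name `is_km_anonymous` and the toy records are illustrative assumptions. This is not the paper's breach detection algorithm, and it does not model the cover problem.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(records, k, m):
    """Check k^m-anonymity on a list of item sets (hypothetical helper).

    Returns True if every itemset of size <= m that appears in at least
    one record appears in at least k records.
    """
    support = Counter()
    for record in records:
        items = sorted(set(record))  # sort so each combination has one canonical form
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Toy data (assumed for illustration only).
cluster = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
]

# Every single item occurs in at least 2 records.
print(is_km_anonymous(cluster, k=2, m=1))
# The pair (b, c) occurs in only one record, so the check fails for m = 2.
print(is_km_anonymous(cluster, k=2, m=2))
```

The check enumerates all itemsets up to size m, which is only practical for small m and short records. It is meant to show what the guarantee says, not how a disassociation tool would enforce it.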

