AI | Dec 24, 2015

Measuring pattern retention in anonymized data -- where one measure is not enough

arXiv:1512.07721v12.9

Originality: Incremental advance

AI Analysis

This work addresses how researchers and practitioners can evaluate data quality in privacy-preserving data analysis. It is an incremental advance that introduces new measures to complement existing ones.

The paper asks whether data modified for privacy still retains the patterns present in the original. It demonstrates that prediction accuracy alone is insufficient evidence of this and proposes a new methodology, with three complementary measures, for assessing pattern retention.

In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem -- how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking if an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier made from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures -- they are complementary to them, and support our argument that one measure is almost never enough.
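As a toy illustration of the abstract's central claim, the sketch below shows how a classifier built from anonymized data can match the accuracy of one built from the original data even though half of the original patterns are gone. It is not the paper's methodology or any of its three measures: the one-feature rule learner, the `rule_retention` score, the feature names, and the four-row dataset are all hypothetical, chosen only to make the gap between accuracy and pattern retention visible.

```python
from collections import Counter, defaultdict


def rules_for(rows, feature, label="label"):
    """Map each value of `feature` to its majority label (a one-feature rule set)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(rules, rows, feature, label="label"):
    """Fraction of rows whose label matches the rule for their feature value."""
    hits = sum(rules.get(row[feature]) == row[label] for row in rows)
    return hits / len(rows)


def best_one_rule(rows, features, label="label"):
    """Return (feature, rules, accuracy) for the most accurate single feature.

    Ties go to the feature listed first.
    """
    scored = []
    for f in features:
        r = rules_for(rows, f, label)
        scored.append((accuracy(r, rows, f, label), f, r))
    acc, f, r = max(scored, key=lambda t: t[0])
    return f, r, acc


def rule_retention(original, modified, features, label="label"):
    """Hypothetical score: share of original value -> label rules still present."""
    kept = total = 0
    for f in features:
        orig_rules = rules_for(original, f, label)
        mod_rules = rules_for(modified, f, label)
        for value, lab in orig_rules.items():
            total += 1
            kept += mod_rules.get(value) == lab
    return kept / total


# Made-up data: both features predict the label perfectly in the original.
features = ["age", "job"]
original = [
    {"age": "young", "job": "student", "label": "no"},
    {"age": "young", "job": "student", "label": "no"},
    {"age": "old", "job": "manager", "label": "yes"},
    {"age": "old", "job": "manager", "label": "yes"},
]
# Assumed anonymization step: suppress every age value.
anonymized = [dict(row, age="*") for row in original]

feat_o, _, acc_o = best_one_rule(original, features)
feat_a, _, acc_a = best_one_rule(anonymized, features)
retention = rule_retention(original, anonymized, features)

print(f"original:   best feature={feat_o}, accuracy={acc_o}")
print(f"anonymized: best feature={feat_a}, accuracy={acc_a}")
print(f"rule retention={retention}")
```

In this example both classifiers reach the same accuracy, but the one built from the anonymized data relies on `job` because every rule involving `age` has been destroyed. Accuracy alone would report no loss, which is the kind of blind spot the abstract argues a single measure cannot cover.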
