CL · Mar 9

The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

arXiv:2603.08207v1 · 94.7
Predicted impact top 5% in CL · last 90 days
Originality Incremental advance
AI Analysis

This paper highlights a critical methodological flaw in the evaluation of PII removal techniques, impacting the trustworthiness of privacy research for data protection practitioners and regulators.

This paper investigates the reported success of reconstruction attacks on PII removal techniques, finding that existing evaluations suffer from data leakage and contamination. The authors conclude that only truly private data can objectively assess these vulnerabilities, but such data is inaccessible to the public research community.

Removing personally identifiable information (PII) from texts is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent work shows that documents sanitized by PII removal techniques are vulnerable to reconstruction attacks. We suspect that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, which leaves unaddressed the question of whether PII removal techniques truly protect privacy in real-world scenarios. We investigate possible data sources and attack setups that avoid data leakage and conclude that only truly private data allows an objective evaluation of vulnerabilities in PII removal techniques. However, access to private data is heavily restricted, and for good reasons, which also means that the public research community cannot address this problem in a transparent, reproducible, and trustworthy manner.

