CL, Oct 10, 2025

Stronger Re-identification Attacks through Reasoning and Aggregation

arXiv:2510.09184v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of evaluating privacy protection in text de-identification for data security applications, representing an incremental improvement over prior attack methods.

The paper assesses the robustness of text de-identification methods by developing stronger re-identification attacks. It shows that aggregating predictions across multiple re-identification orderings, and using reasoning models, both improve the adversary's ability to uncover masked personally identifiable information (PII).

Text de-identification techniques are often used to mask personally identifiable information (PII) from documents. Their ability to conceal the identity of the individuals mentioned in a text is, however, hard to measure. Recent work has shown how the robustness of de-identification methods could be assessed by attempting the reverse process of _re-identification_, based on an automated adversary using its background knowledge to uncover the PIIs that have been masked. This paper presents two complementary strategies to build stronger re-identification attacks. We first show that (1) the _order_ in which the PII spans are re-identified matters, and that aggregating predictions across multiple orderings leads to improved results. We also find that (2) reasoning models can boost the re-identification performance, especially when the adversary is assumed to have access to extensive background knowledge.
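To make strategy (1) concrete, the following is a minimal sketch of aggregating predictions over several span orderings. The abstract does not say how the predictions are combined, so majority voting is an assumption here, as are all names (`Adversary`, `reidentify_in_order`, `aggregate_over_orderings`, `toy_adversary`). The adversary is a hypothetical callable standing in for whatever model performs the attack; it is not the paper's implementation.

```python
from collections import Counter
from itertools import islice, permutations
from typing import Callable, Dict, Sequence

# Hypothetical adversary interface (an assumption, not taken from the paper):
# given the masked text, the span to guess, and the guesses made so far,
# return a predicted value for that span.
Adversary = Callable[[str, str, Dict[str, str]], str]


def reidentify_in_order(
    text: str, order: Sequence[str], adversary: Adversary
) -> Dict[str, str]:
    """Guess spans one at a time; each guess may condition on earlier guesses."""
    guesses: Dict[str, str] = {}
    for span_id in order:
        guesses[span_id] = adversary(text, span_id, dict(guesses))
    return guesses


def aggregate_over_orderings(
    text: str,
    span_ids: Sequence[str],
    adversary: Adversary,
    max_orderings: int = 6,
) -> Dict[str, str]:
    """Run the attack under several orderings and majority-vote per span.

    Majority voting is an assumed aggregation rule chosen for illustration.
    """
    votes = {span_id: Counter() for span_id in span_ids}
    for order in islice(permutations(span_ids), max_orderings):
        for span_id, guess in reidentify_in_order(text, order, adversary).items():
            votes[span_id][guess] += 1
    return {span_id: counts.most_common(1)[0][0] for span_id, counts in votes.items()}


def toy_adversary(text: str, span_id: str, known: Dict[str, str]) -> str:
    """Toy stand-in: the name is only guessed correctly once other spans are known."""
    if span_id == "NAME":
        return "Alice" if known else "Bob"
    return {"CITY": "Paris", "JOB": "judge"}[span_id]


result = aggregate_over_orderings(
    "[NAME] is a [JOB] in [CITY].", ["NAME", "CITY", "JOB"], toy_adversary
)
```

In this toy setup, a single ordering that starts with `NAME` yields the wrong name, while voting over all six orderings recovers it, which illustrates why the order of re-identification can matter. A real attack would likely sample a subset of orderings rather than enumerate them, since the number of permutations grows factorially with the number of spans.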
