CL · Nov 30, 2025

How do we measure privacy in text? A survey of text anonymization metrics

arXiv:2512.01109v1 · 2 citations · h-index: 14 · IJCNLP-AACL
Originality: Synthesis-oriented
AI Analysis

This survey addresses how NLP researchers and practitioners measure privacy in text. It clarifies the existing metrics, but its contribution is incremental: it synthesizes prior literature without introducing new methods.

The authors systematically survey 47 papers that evaluate privacy protection in text anonymization. They identify and compare six distinct privacy notions, then analyze how well those notions align with legal standards and user expectations, with the goal of providing practical guidance for more robust evaluations.

In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP research and model development in domains with sensitive data, evaluating whether anonymization methods sufficiently protect privacy remains an open challenge. In manually reviewing 47 papers that report privacy metrics, we identify and compare six distinct privacy notions, and analyze how the associated metrics capture different aspects of privacy risk. We then assess how well these notions align with legal privacy standards (HIPAA and GDPR), as well as user-centered expectations grounded in HCI studies. Our analysis offers practical guidance on navigating the landscape of privacy evaluation approaches and highlights gaps in current practices. Ultimately, we aim to facilitate more robust, comparable, and legally aware privacy evaluations in text anonymization.

