CL

Nov 30, 2025

How do we measure privacy in text? A survey of text anonymization metrics

Yaxuan Ren, Krithika Ramesh, Yaxing Yao, Anjalie Field

arXiv:2512.01109v16.72 citations

h-index: 14

IJCNLP-AACL

Originality: Synthesis-oriented

AI Analysis

This survey addresses the challenge of measuring privacy in text for NLP researchers and practitioners. It clarifies existing metrics, but its contribution is incremental: it synthesizes the literature rather than introducing new methods.

The authors evaluate privacy protection in text anonymization through a systematic survey of 47 papers. They identify and compare six distinct privacy notions, then analyze how well these notions align with legal standards and user expectations, with the goal of providing practical guidance for more robust evaluations.

In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP research and model development in domains with sensitive data, evaluating whether anonymization methods sufficiently protect privacy remains an open challenge. In manually reviewing 47 papers that report privacy metrics, we identify and compare six distinct privacy notions, and analyze how the associated metrics capture different aspects of privacy risk. We then assess how well these notions align with legal privacy standards (HIPAA and GDPR), as well as user-centered expectations grounded in HCI studies. Our analysis offers practical guidance on navigating the landscape of privacy evaluation approaches and highlights gaps in current practices. Ultimately, we aim to facilitate more robust, comparable, and legally aware privacy evaluations in text anonymization.
