CL | Jan 31, 2025

On the Impact of Noise in Differentially Private Text Rewriting

Stephen Meisenbacher, Maulik Chevli, Florian Matthes

arXiv:2501.19022v120.416 citations | h-index: 7 | Has Code | NAACL

Originality: Incremental advance

AI Analysis

This work addresses the privacy-utility trade-off in NLP applications that require text anonymization, though its exploration of noise effects is an incremental contribution.

The paper tackles utility loss in differentially private text rewriting by introducing a new sentence infilling privatization technique and comparing it to non-DP methods. It finds that non-DP techniques preserve utility better, while DP methods offer stronger empirical privacy protection.

The field of text privatization often leverages the notion of $\textit{Differential Privacy}$ (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise to vector representations of text, either at the data- or model-level, which is governed by the privacy parameter $\varepsilon$. However, noise addition almost undoubtedly leads to considerable utility loss, thereby highlighting one major drawback of DP in NLP. In this work, we introduce a new sentence infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting. We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protections. Our results highlight the significant impact of noise in current DP rewriting mechanisms, leading to a discussion of the merits and challenges of DP in NLP, as well as the opportunities that non-DP methods present.
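To make the role of $\varepsilon$ concrete, the following is a minimal sketch of calibrated noise addition to a text vector using the standard Laplace mechanism. It is not the paper's sentence infilling method. The function names, the clipping bound, and the toy three-dimensional "embedding" are all assumptions chosen for illustration. The sketch clips each vector to a fixed L1 norm so that any two inputs differ by at most twice that bound, then adds per-coordinate Laplace noise with scale equal to that sensitivity divided by $\varepsilon$.

```python
import random


def clip_l1(vec, bound):
    """Rescale vec so that its L1 norm is at most `bound`."""
    norm = sum(abs(x) for x in vec)
    if norm <= bound or norm == 0.0:
        return list(vec)
    return [x * bound / norm for x in vec]


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(vec, epsilon, bound=1.0, rng=None):
    """Clip a vector and add Laplace noise calibrated to epsilon.

    After clipping, any two vectors are at most 2 * bound apart in L1
    distance, so a noise scale of 2 * bound / epsilon is the usual
    Laplace-mechanism calibration (illustrative, hypothetical helper).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = clip_l1(vec, bound)
    scale = 2.0 * bound / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]


def mean_abs_noise(epsilon, trials=2000, seed=0):
    """Average per-coordinate distortion over repeated privatization."""
    rng = random.Random(seed)
    vec = [0.5, -0.3, 0.2]  # toy vector, already within the L1 bound of 1.0
    total = 0.0
    for _ in range(trials):
        noisy = privatize(vec, epsilon, bound=1.0, rng=rng)
        total += sum(abs(n - v) for n, v in zip(noisy, vec)) / len(vec)
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean |noise| per coordinate ~ {mean_abs_noise(eps):.3f}")
```

Under these assumptions the expected per-coordinate distortion is $2/\varepsilon$, so strict privacy budgets swamp a unit-norm vector with noise. This is one simple way to see the utility loss the abstract describes.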
