CL · May 1

A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation

arXiv:2605.01065 · 92.7 · h-index: 8
Predicted impact: top 20% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers and practitioners in privacy-preserving NLP, this work highlights the importance of careful design choices in DP text obfuscation, though it is incremental as it systematically evaluates existing methods rather than proposing novel ones.

This paper systematically evaluates text decomposition and budget distribution techniques for differentially private text obfuscation, showing that design choices significantly impact utility even with comparable privacy budgets, and provides evidence for optimizing empirical trade-offs.

The goal of differentially private text obfuscation is to obfuscate, or "perturb", input texts with Differential Privacy (DP) guarantees, such that the private output texts are quantifiably indistinguishable from the originals. While perturbation at the word level is intuitive, meaningful text privatization happens on complete documents. Recent research has laid the groundwork for reasoning about privacy budget distribution, namely, how an overall $\varepsilon$ budget can be sensibly distributed among the component pieces of a text. We perform a systematic evaluation of multiple text decomposition and budget distribution techniques in the context of DP text obfuscation, testing how different methods for chunking texts can be combined with techniques for allocating $\varepsilon$ to these chunks. Our experiments reveal that such design choices matter: even with comparable privacy budgets, results differ significantly depending on which methods are chosen. With these results, we provide credible evidence that empirical trade-offs can be maximized by optimizing DP obfuscation procedures.
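The abstract does not say which decomposition or allocation methods are evaluated, so the following is only an illustrative sketch of the general idea: split a text into chunks, then divide a total $\varepsilon$ among them. All function names here are hypothetical. The two chunkers (naive sentence splitting, fixed token windows) and the two allocators (uniform, proportional to token count) are assumptions chosen for illustration, not the paper's methods. The sketch also assumes basic sequential composition, under which the per-chunk budgets sum to the overall budget.

```python
import re


def split_sentences(text):
    """Naive sentence chunking: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_fixed_windows(text, size=5):
    """Chunk a text into windows of `size` whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def allocate_uniform(chunks, total_epsilon):
    """Give every chunk the same share of the total budget."""
    return [total_epsilon / len(chunks)] * len(chunks)


def allocate_by_length(chunks, total_epsilon):
    """Give each chunk a share proportional to its token count."""
    lengths = [len(c.split()) for c in chunks]
    total = sum(lengths)
    return [total_epsilon * n / total for n in lengths]


# Hypothetical example text and budget, for illustration only.
text = "Alice visited Berlin in May. She met Bob. They discussed the merger at length."
chunks = split_sentences(text)
windows = split_fixed_windows(text, size=5)

uniform = allocate_uniform(chunks, total_epsilon=9.0)
by_length = allocate_by_length(chunks, total_epsilon=9.0)

for chunk, eps_u, eps_l in zip(chunks, uniform, by_length):
    print(f"{eps_u:.2f}  {eps_l:.2f}  {chunk}")
```

Both allocations spend the same total budget, yet the longest sentence receives more $\varepsilon$ (less noise) under the length-proportional scheme than under the uniform one. This is the kind of difference at comparable budgets that the abstract reports as affecting results. Each chunk would then be passed to a DP obfuscation mechanism with its own budget, a step omitted here.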

