CL, LG | Jun 2, 2023

Driving Context into Text-to-Text Privatization

Stefan Arnold, Dilara Yesilbas, Sven Weinzierl

arXiv:2306.01457v126.7230 citations | h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses privacy in text processing for users who need to protect sensitive information. It is an incremental advance because it builds on existing metric differential privacy methods.

The paper tackles the problem of ambiguous words in text-to-text privatization by adding a sense disambiguation step before noise injection, which yields a 6.05% increase in classification accuracy on the Words in Context dataset.

\textit{Metric Differential Privacy} enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as \textit{'bank'}. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We accompany our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the \textit{Words in Context} dataset, we demonstrate a substantial increase in classification accuracy of $6.05\%$.
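The pipeline described in the abstract (disambiguate the sense, add calibrated noise to its vector, project back with a nearest neighbor search) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 2-d vectors in `SENSE_VECTORS`, the `word#sense` key format, the cosine-to-context-mean rule in `disambiguate`, and the noise sampler (uniform direction with a Gamma-distributed radius, a common choice for metric differential privacy in Euclidean space).

```python
import math
import random

# Hypothetical toy sense embeddings; names and values are invented.
SENSE_VECTORS = {
    "bank#finance": [1.0, 0.0],
    "bank#river": [0.0, 1.0],
    "loan#1": [0.9, 0.1],
    "money#1": [0.8, 0.2],
    "shore#1": [0.1, 0.9],
    "water#1": [0.2, 0.8],
}


def senses_of(word):
    """Return all sense keys registered for a surface word."""
    return sorted(k for k in SENSE_VECTORS if k.split("#")[0] == word)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(word, context):
    """Assumed rule: pick the sense closest to the mean context vector."""
    candidates = senses_of(word)
    if not candidates:
        raise KeyError(word)
    context_vecs = [SENSE_VECTORS[s] for w in context for s in senses_of(w)]
    if not context_vecs:
        return candidates[0]  # no usable context: fall back to the first sense
    dim = len(context_vecs[0])
    mean = [sum(v[i] for v in context_vecs) / len(context_vecs) for i in range(dim)]
    return max(candidates, key=lambda s: cosine(SENSE_VECTORS[s], mean))


def sample_noise(dim, epsilon, rng):
    """Sample noise with density proportional to exp(-epsilon * ||z||)."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)  # shape=dim, scale=1/epsilon
    return [radius * x / norm for x in direction]


def privatize(word, context, epsilon, rng):
    """Disambiguate, perturb the sense vector, return the nearest sense key."""
    sense = disambiguate(word, context)
    vector = SENSE_VECTORS[sense]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = [a + b for a, b in zip(vector, noise)]
    return min(SENSE_VECTORS, key=lambda s: math.dist(SENSE_VECTORS[s], noisy))
```

Calling `privatize("bank", ["loan", "money"], epsilon=5.0, rng=random.Random(0))` returns one of the sense keys. A smaller `epsilon` adds more noise, so the substitute is more likely to differ from the original sense, while a very large `epsilon` almost always returns the disambiguated sense itself.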
