CL, LG · Jun 2, 2023

Driving Context into Text-to-Text Privatization

arXiv:2306.01457v1230 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses privacy in text processing for users who need to protect sensitive information. The advance is incremental: it builds on existing metric differential privacy methods.

The paper tackled the problem of ambiguous words in text-to-text privatization by incorporating sense disambiguation before noise injection, resulting in a 6.05% increase in classification accuracy on the Words in Context dataset.

Metric Differential Privacy enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as 'bank'. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We accompany our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the Words in Context dataset, we demonstrate a substantial increase in classification accuracy of 6.05%.
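The mechanism described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The toy 2-d sense vectors, the `word%sense` key format, and the helper names `disambiguate`, `sample_noise`, and `privatize` are all hypothetical. Disambiguation here uses a simple nearest-to-context-centroid heuristic, which is an assumption; the abstract does not say how the sense is chosen. The noise follows the sampling commonly used for metric differential privacy in embedding spaces (a uniform direction with a Gamma-distributed magnitude, giving a density proportional to exp(-epsilon * ||z||)).

```python
import math
import random

# Toy 2-d sense embedding; keys and vectors are invented for illustration only.
SENSE_VECTORS = {
    "bank%finance": (0.9, 0.1),
    "bank%river": (0.1, 0.9),
    "money%1": (1.0, 0.0),
    "loan%1": (0.8, 0.2),
    "water%1": (0.0, 1.0),
    "shore%1": (0.2, 0.8),
}


def disambiguate(word, context, senses):
    """Pick the sense of `word` nearest to the mean context vector.

    Assumed heuristic: `context` is a list of keys already present in `senses`.
    """
    candidates = [k for k in senses if k.split("%")[0] == word]
    ctx = [senses[c] for c in context]
    centroid = [sum(col) / len(ctx) for col in zip(*ctx)]
    return min(candidates, key=lambda k: math.dist(senses[k], centroid))


def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction: normalise a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma with shape `dim` and scale 1/epsilon.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * x / norm for x in direction]


def privatize(sense_key, senses, epsilon, rng):
    """Add calibrated noise to a sense vector, then project back by nearest neighbor."""
    vec = senses[sense_key]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [a + b for a, b in zip(vec, noise)]
    return min(senses, key=lambda k: math.dist(senses[k], noisy))


rng = random.Random(0)  # fixed seed only so the demo is repeatable
sense = disambiguate("bank", ["money%1", "loan%1"], SENSE_VECTORS)
print(sense, "->", privatize(sense, SENSE_VECTORS, epsilon=5.0, rng=rng))
```

In this sketch a smaller `epsilon` means larger expected noise, so the substitute is more likely to land far from the original sense. A very large `epsilon` almost always returns the original sense unchanged. Disambiguating first means the noise is added around the vector for the intended meaning of 'bank' rather than around a single vector that blends all its meanings.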

