LG · CL · May 10, 2022

Sentence-level Privacy for Document Embeddings

arXiv:2205.04605v1641 citations · h-index: 44
AI Analysis

This work addresses user privacy in natural language processing applications, offering a stronger and more interpretable guarantee than existing methods.

The paper addresses the protection of sensitive personal content in user language data. It proposes SentDP, a sentence-level differential privacy guarantee for document embeddings, together with DeepCandidate, a technique for producing embeddings that satisfy it. On downstream tasks such as sentiment analysis and topic classification, these embeddings outperform baseline methods that offer weaker guarantees.

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose $ε$-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding $ε$-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
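
The guarantee described in the abstract can be sketched in symbols. The notation here is an assumption for illustration and does not appear in the abstract: $\mathcal{S}$ is the set of all sentences, a document is a tuple $x = (s_1, \dots, s_k) \in \mathcal{S}^k$, $\mathcal{M}$ is the randomized mechanism that maps a document to an embedding, and $O$ is any measurable set of output embeddings. Under these assumptions, $\mathcal{M}$ satisfies $ε$-SentDP if

$$
\Pr[\mathcal{M}(x) \in O] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(x') \in O]
\quad \text{for all } x, x' \in \mathcal{S}^k \text{ that differ in at most one sentence, and all } O .
$$

Because the condition is symmetric in $x$ and $x'$, the output distributions for the two documents stay within a factor of $e^{\varepsilon}$ of each other in both directions. By the standard group-privacy argument for pure differential privacy, applying the inequality once per substituted sentence gives a bound of $e^{m\varepsilon}$ for two documents that differ in $m$ sentences.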
