CR · AI · CL · LG · Jul 3, 2024

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

arXiv:2407.02956v2 · 22 citations · h-index: 9
Originality: Highly original
AI Analysis

This work addresses privacy protection in text data for users and applications, presenting a novel method with strong empirical gains.

The authors tackle text anonymization: preventing adversaries from inferring an author's private attributes while preserving the text's utility. They report a reduction of more than 90% in private attribute leakage across 8 private attributes. They also show real-world applicability by distilling the anonymization capability into LoRA parameters for an on-device model, which reduces privacy leakage by more than half with limited impact on utility.

In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while keeping the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90% across 8 different private attributes. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model. Our results show the possibility of reducing privacy leakage by more than half with limited impact on utility.
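The idea of steering an adversary toward a wrong attribute value, and of reporting leakage as a reduction, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the age-bracket labels, and the toy predictions are all hypothetical, leakage is assumed here to mean the fraction of texts whose private attribute the adversary guesses correctly, and the LLM-based rewriting step itself is not shown.

```python
import random


def pick_decoy_value(true_value, candidates, rng):
    """Pick a target attribute value that differs from the author's true value.

    Hypothetical helper: the anonymizer would then rewrite the text so that
    it suggests this decoy value instead of the true one.
    """
    decoys = [value for value in candidates if value != true_value]
    if not decoys:
        raise ValueError("need at least one candidate other than the true value")
    return rng.choice(decoys)


def leakage_rate(predictions, true_values):
    """Fraction of texts for which the adversary's guess matches the truth."""
    if len(predictions) != len(true_values) or not true_values:
        raise ValueError("predictions and true_values must be non-empty and aligned")
    correct = sum(p == t for p, t in zip(predictions, true_values))
    return correct / len(true_values)


def relative_reduction(before, after):
    """Relative drop in leakage, e.g. 0.9 means a 90% reduction."""
    if before <= 0:
        raise ValueError("leakage before anonymization must be positive")
    return (before - after) / before


# Toy data, invented for illustration only (not results from the paper).
age_brackets = ["20s", "30s", "40s", "50s"]
true_ages = ["30s", "20s", "40s", "30s"]
guesses_on_original = ["30s", "20s", "40s", "30s"]    # adversary is right every time
guesses_on_anonymized = ["40s", "20s", "20s", "50s"]  # adversary is right once

decoy = pick_decoy_value("30s", age_brackets, random.Random(0))
before = leakage_rate(guesses_on_original, true_ages)
after = leakage_rate(guesses_on_anonymized, true_ages)
reduction = relative_reduction(before, after)
```

In this toy setup the adversary's accuracy drops from 4 of 4 texts to 1 of 4, so `reduction` is 0.75. A utility check on the rewritten text (meaning and semantics preserved) would be needed alongside this and is left out of the sketch.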

