CL · CR · LG · Apr 26, 2022

You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

Meta AI
arXiv:2205.10228v1633 citations · h-index: 52
Originality: Incremental advance
AI Analysis

This paper addresses privacy concerns for users of social chatbots by preventing persona leakage from hidden states, an incremental improvement in privacy protection methods.

The paper tackles the problem of privacy leakage in social chatbots, where hidden states can reveal speakers' personas, and shows that a simple neural network can infer personas with high accuracy. It proposes defense objectives that reduce attack accuracy from 37.6% to 0.5% while preserving generation ability.

Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have arisen recently: training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate the privacy leakage of the hidden states of chatbots trained by language modeling, which has not been well studied yet. We show that speakers' personas can be inferred through a simple neural network with high accuracy. To address this, we propose effective defense objectives to prevent persona leakage from hidden states. We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%. Meanwhile, the proposed objectives preserve language models' powerful generation ability.
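
To make the threat model concrete, here is a minimal sketch of a persona inference attack on hidden states. It is not the paper's setup: the "hidden states" are synthetic Gaussian clusters (one per persona label) standing in for real chatbot representations, and the attacker is a linear softmax probe where the abstract describes a simple neural network. All names (`make_hidden_states`, `train_attacker`, `attack_acc`) and all sizes are hypothetical, chosen only for illustration.

```python
import math
import random


def make_hidden_states(n_per_class, n_classes, dim, noise, rng):
    """Synthetic stand-in for hidden states: one Gaussian cluster per persona label."""
    centers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
    data = []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            vec = [c + rng.gauss(0, noise) for c in center]
            data.append((vec, label))
    rng.shuffle(data)
    return data


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights, biases, vec):
    """Class probabilities of the linear probe for one hidden-state vector."""
    logits = [sum(w * x for w, x in zip(row, vec)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train_attacker(data, n_classes, dim, epochs=30, lr=0.1):
    """Fit the probe with plain SGD on the cross-entropy loss."""
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for vec, label in data:
            probs = predict_probs(weights, biases, vec)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == label]
                grad = probs[k] - (1.0 if k == label else 0.0)
                biases[k] -= lr * grad
                row = weights[k]
                for j in range(dim):
                    row[j] -= lr * grad * vec[j]
    return weights, biases


def accuracy(weights, biases, data):
    """Fraction of examples whose most probable class matches the true label."""
    correct = 0
    for vec, label in data:
        probs = predict_probs(weights, biases, vec)
        if probs.index(max(probs)) == label:
            correct += 1
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is reproducible
N_CLASSES, DIM = 4, 8   # hypothetical sizes, far smaller than a real model
data = make_hidden_states(50, N_CLASSES, DIM, noise=0.3, rng=rng)
train, test = data[:150], data[150:]
weights, biases = train_attacker(train, N_CLASSES, DIM)
attack_acc = accuracy(weights, biases, test)
print(f"attack accuracy: {attack_acc:.2f} (chance: {1 / N_CLASSES:.2f})")
```

If the representations separate by persona, as these synthetic clusters do by construction, the probe's held-out accuracy sits well above chance. A defense of the kind the abstract describes would aim to push that accuracy down while leaving generation quality intact.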

Code Implementations: 1 repo