CL Sep 21, 2022

Text Revealer: Private Text Reconstruction via Model Inversion Attacks against Transformers

Ruisi Zhang, Seira Hidano, Farinaz Koushanfar

arXiv:2209.10505v15.240 citations h-index: 68

Originality: Highly original

AI Analysis

This work addresses privacy risks for users of text classification systems, such as those used for sentiment analysis, by revealing vulnerabilities in widely used transformer models. It raises a novel security concern rather than offering an incremental improvement.

The paper tackles the problem of private information leakage from transformer-based text classification models by introducing Text Revealer, a model inversion attack that reconstructs private training texts with access to the target model, achieving effective reconstruction across datasets with varying text lengths.

Text classification has become widely used in various natural language processing applications like sentiment analysis. Current applications often use large transformer-based language models to classify input texts. However, there is a lack of systematic study on how much private information can be inverted when publishing models. In this paper, we formulate Text Revealer, the first model inversion attack for text reconstruction against text classification with transformers. Our attacks faithfully reconstruct private texts included in training data with access to the target model. We leverage an external dataset and GPT-2 to generate fluent text resembling the target domain, and then perturb its hidden state optimally with feedback from the target model. Our extensive experiments demonstrate that our attacks are effective for datasets with different text lengths and can reconstruct private texts accurately.
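The hidden-state perturbation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a tiny linear "language model" (`EMBED`) stands in for GPT-2, a sigmoid over per-token weights (`CLASS_WEIGHTS`) stands in for the target classifier, and every name and value below is hypothetical. The sketch only shows the general idea of nudging a hidden state by gradient descent so that the generated token distribution draws higher confidence from the target model.

```python
import math

# Hypothetical toy setup: 4 tokens, 2-dimensional hidden state.
EMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # stand-in for the LM output layer
CLASS_WEIGHTS = [2.0, -1.0, 0.5, -2.0]  # stand-in target model: per-token evidence for the target label
HIDDEN = [0.0, 0.0]                     # stand-in for a language-model hidden state


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_probs(hidden, embed):
    # logits_i = embed[i] . hidden, then softmax over the vocabulary
    logits = [sum(e * h for e, h in zip(row, hidden)) for row in embed]
    return softmax(logits)


def target_confidence(probs, class_weights):
    # Toy target classifier: sigmoid of the expected per-token weight.
    score = sum(p * w for p, w in zip(probs, class_weights))
    return 1.0 / (1.0 + math.exp(-score)), score


def perturb_hidden(hidden, embed, class_weights, steps=50, lr=0.5):
    """Gradient descent on loss = -log(confidence) with respect to a hidden-state offset."""
    delta = [0.0] * len(hidden)
    for _ in range(steps):
        shifted = [h + d for h, d in zip(hidden, delta)]
        probs = token_probs(shifted, embed)
        conf, score = target_confidence(probs, class_weights)
        # d(loss)/d(logit_i) = -(1 - conf) * p_i * (w_i - score)
        grad_logits = [-(1.0 - conf) * p * (w - score)
                       for p, w in zip(probs, class_weights)]
        # Chain rule through logits = EMBED @ hidden
        grad_hidden = [sum(g * row[k] for g, row in zip(grad_logits, embed))
                       for k in range(len(hidden))]
        delta = [d - lr * g for d, g in zip(delta, grad_hidden)]
    return [h + d for h, d in zip(hidden, delta)]


conf_before, _ = target_confidence(token_probs(HIDDEN, EMBED), CLASS_WEIGHTS)
perturbed = perturb_hidden(HIDDEN, EMBED, CLASS_WEIGHTS)
conf_after, _ = target_confidence(token_probs(perturbed, EMBED), CLASS_WEIGHTS)
print(f"target confidence before: {conf_before:.3f}, after: {conf_after:.3f}")
```

In a realistic setting the gradient would come from backpropagating through the actual target transformer rather than from a hand-derived formula, and the perturbed hidden state would be decoded into text token by token.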
