CL CR DS LG ML

Oct 12, 2020

TextHide: Tackling Data Privacy in Language Understanding Tasks

Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora

arXiv:2010.06053v131.41014 citations

Has Code

Originality: Incremental advance

AI Analysis

The paper addresses privacy risks for participants in federated learning systems, though the advance is incremental because it builds on existing fine-tuning frameworks.

The paper tackles data privacy in distributed or federated learning for natural language understanding by proposing TextHide, an encryption method that prevents eavesdropping attackers from recovering private text data, resulting in only a 1.9% average accuracy reduction on the GLUE benchmark.

An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide, which aims to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. This encryption step is efficient and only slightly affects task performance. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only $1.9\%$. We also present an analysis of the security of TextHide using a conjecture about the computational intractability of a mathematical problem. Our code is available at https://github.com/Hazelsuko07/TextHide
