CL Apr 15, 2021

Natural Language Understanding with Privacy-Preserving BERT

Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, Marc Najork

arXiv:2104.07504v27.291 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns for users of pretrained language models in data mining and NLU applications. The advance is incremental because it builds on existing privacy techniques.

The paper tackles privacy leakage in Natural Language Understanding by applying dx-privacy to BERT fine-tuning and by proposing privacy-adaptive LM pretraining, which boosts utility dramatically at the same level of privacy protection. It also quantifies the level of privacy preservation and gives guidance on privacy configuration.

Privacy preservation remains a key challenge in data mining and Natural Language Understanding (NLU). Previous research shows that the input text or even text embeddings can leak private information. This concern motivates our research on effective privacy preservation approaches for pretrained Language Models (LMs). We investigate the privacy and utility implications of applying dx-privacy, a variant of Local Differential Privacy, to BERT fine-tuning in NLU applications. More importantly, we further propose privacy-adaptive LM pretraining methods and show that our approach can boost the utility of BERT dramatically while retaining the same level of privacy protection. We also quantify the level of privacy preservation and provide guidance on privacy configuration. Our experiments and findings lay the groundwork for future explorations of privacy-preserving NLU with pretrained LMs.
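The abstract does not spell out the perturbation mechanism, so the following is only a minimal sketch of how dx-privacy is commonly applied to a token embedding under a Euclidean metric: noise with density proportional to exp(-eta * ||z||) is added to the embedding, and the result may be mapped back to the nearest vocabulary token. The function names, the toy random vocabulary, and the privacy parameter name `eta` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def sample_dx_noise(dim, eta, rng):
    """Draw noise z with density proportional to exp(-eta * ||z||_2).

    The direction is uniform on the unit sphere and the magnitude follows
    Gamma(shape=dim, scale=1/eta). A smaller eta gives larger noise and
    therefore stronger privacy.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / eta)
    return magnitude * direction


def privatize_embedding(embedding, eta, rng):
    """Perturb a single embedding vector on the user side."""
    return embedding + sample_dx_noise(embedding.shape[-1], eta, rng)


def nearest_token(vector, vocab_embeddings):
    """Optional post-processing: index of the closest vocabulary embedding."""
    distances = np.linalg.norm(vocab_embeddings - vector, axis=1)
    return int(np.argmin(distances))


# Toy setup (hypothetical): 50 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_embeddings = rng.normal(size=(50, 16))
token_id = 7

noisy = privatize_embedding(vocab_embeddings[token_id], eta=50.0, rng=rng)
replacement_id = nearest_token(noisy, vocab_embeddings)
```

In this sketch the expected noise magnitude is `dim / eta`, so choosing `eta` trades privacy against how often a perturbed embedding still maps to its original token.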
