LG CL CR Sep 13, 2020

Differentially Private Language Models Benefit from Public Pre-training

arXiv:2009.05886v21013 citations
AI Analysis

This work addresses privacy concerns in natural language processing for applications handling sensitive data, though it appears incremental as it builds on existing public pre-training and DP methods.

The authors study how to train a high-quality language model on sensitive data while preserving privacy. They find that differentially private fine-tuning of a publicly pre-trained base model on a private corpus improves performance in the private domain.

Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which our private data is protected. However, training algorithms which enforce differential privacy often lead to degradation in model quality. We study the feasibility of learning a language model which is simultaneously high-quality and privacy preserving by tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
