LG
Oct 18, 2017

Learning Differentially Private Recurrent Language Models

arXiv:1710.06963v3
505 citations
Originality: Incremental advance
AI Analysis

This work addresses user privacy in language modeling applications. It is incremental in that it builds on the existing federated averaging algorithm and on privacy accounting methods for stochastic gradient descent.

The authors tackle the problem of training large recurrent language models with user-level differential privacy. They show that this can be achieved at only a negligible cost in predictive accuracy: when trained on a large dataset, the private models are quantitatively and qualitatively similar to un-noised models.

We demonstrate that it is possible to train large recurrent language models with user-level differential privacy guarantees with only a negligible cost in predictive accuracy. Our work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent. In particular, we add user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data. Our work demonstrates that given a dataset with a sufficiently large number of users (a requirement easily met by even small internet-scale datasets), achieving differential privacy comes at the cost of increased computation, rather than in decreased utility as in most prior work. We find that our private LSTM language models are quantitatively and qualitatively similar to un-noised models when trained on a large dataset.
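The idea of adding user-level privacy protection to federated averaging can be illustrated with a minimal sketch of one server round: each user's model update is clipped to a bounded L2 norm, the clipped updates are averaged, and Gaussian noise scaled to that bound is added. This is a simplified illustration, not the authors' implementation. The function names, the parameters `clip_norm` and `noise_multiplier`, and the use of a fixed set of participating users (no random user sampling and no privacy accounting) are all assumptions made for the example.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale one user's update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(weights, user_updates, clip_norm, noise_multiplier, rng):
    """One hypothetical server round: clip, average, add Gaussian noise.

    Assumption: the set of participating users is fixed, so replacing or
    removing one user changes the average by at most clip_norm / n.
    """
    n = len(user_updates)
    clipped = [clip_update(u, clip_norm) for u in user_updates]
    # Noise standard deviation is proportional to the per-user sensitivity.
    sigma = noise_multiplier * clip_norm / n
    new_weights = []
    for i, w in enumerate(weights):
        mean_i = sum(u[i] for u in clipped) / n
        new_weights.append(w + mean_i + rng.gauss(0.0, sigma))
    return new_weights


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0, 0.0]
    updates = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.5]]
    print(dp_fedavg_round(weights, updates, clip_norm=1.0,
                          noise_multiplier=1.0, rng=rng))
```

Because the noise standard deviation in this sketch shrinks as `1 / n`, the noise becomes small relative to the averaged update as more users participate in a round, which is consistent with the abstract's point that a sufficiently large number of users shifts the cost of privacy toward computation rather than utility.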

Code Implementations: 1 repo
