LG | Oct 18, 2017

Learning Differentially Private Recurrent Language Models

H. Brendan McMahan, Daniel Ramage, Kunal Talwar, Li Zhang

arXiv:1710.06963v3 | 44.1505 citations

Originality: Incremental advance

AI Analysis

This work addresses user privacy in language modeling applications. It is rated incremental because it builds on existing federated averaging and privacy accounting methods.

The authors train large recurrent language models with user-level differential privacy and show that this costs only a negligible amount of predictive accuracy: when trained on large datasets, the private models are quantitatively and qualitatively similar to un-noised models.

We demonstrate that it is possible to train large recurrent language models with user-level differential privacy guarantees with only a negligible cost in predictive accuracy. Our work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent. In particular, we add user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data. Our work demonstrates that given a dataset with a sufficiently large number of users (a requirement easily met by even small internet-scale datasets), achieving differential privacy comes at the cost of increased computation, rather than in decreased utility as in most prior work. We find that our private LSTM language models are quantitatively and qualitatively similar to un-noised models when trained on a large dataset.
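The idea of adding user-level privacy protection to federated averaging can be illustrated with a minimal sketch of a single training round. This is an illustration under stated assumptions, not the authors' implementation: the function names (`clip_by_l2_norm`, `private_round`), the uniform per-user weighting, the fixed-denominator average, and the noise scale `noise_multiplier * clip_norm / expected_count` are all choices made here for clarity. Each user's update is assumed to be precomputed, and no privacy accounting is performed, so the sketch by itself yields no specific privacy guarantee.

```python
import numpy as np


def clip_by_l2_norm(delta, clip_norm):
    """Scale delta down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(delta)
    if norm > clip_norm:
        return delta * (clip_norm / norm)
    return delta


def private_round(theta, user_deltas, sample_prob, clip_norm,
                  noise_multiplier, rng):
    """One noised averaging round over per-user model updates (a sketch).

    theta            -- current model parameters (1-D array)
    user_deltas      -- one precomputed update vector per user (assumption)
    sample_prob      -- probability with which each user is selected
    clip_norm        -- bound on each user's contribution
    noise_multiplier -- Gaussian noise scale relative to the sensitivity
    rng              -- numpy.random.Generator
    """
    num_users = len(user_deltas)
    # Select each user independently with probability sample_prob.
    selected = rng.random(num_users) < sample_prob

    # Sum the clipped updates of the selected users.
    total = np.zeros_like(theta)
    for delta, chosen in zip(user_deltas, selected):
        if chosen:
            total += clip_by_l2_norm(delta, clip_norm)

    # Divide by the expected (not the realized) number of selected users,
    # so one user changes the average by at most clip_norm / expected_count.
    expected_count = sample_prob * num_users
    average = total / expected_count

    # Gaussian noise calibrated to that per-user sensitivity.
    sigma = noise_multiplier * clip_norm / expected_count
    noise = rng.normal(0.0, sigma, size=theta.shape)
    return theta + average + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(4)
    # Hypothetical per-user updates, standing in for local training results.
    deltas = [rng.normal(size=4) for _ in range(1000)]
    theta = private_round(theta, deltas, sample_prob=0.1, clip_norm=1.0,
                          noise_multiplier=1.0, rng=rng)
    print(theta)
```

In this sketch the noise standard deviation shrinks as the expected number of selected users grows, which is consistent with the abstract's point that a sufficiently large user population lets privacy be paid for with computation rather than utility.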
