Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation
This work addresses privacy concerns in mobile keyboard applications, though it appears incremental, since it adapts existing methods to a specific use case.
The authors train a privacy-preserving language model for mobile keyboards by applying differential privacy to a transformer architecture. They report small but consistent gains in next-word prediction accuracy over the existing GRU model, at a modest cost in memory and run-time speed.
In this paper we train a transformer with differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-off between model size, run-time speed, and accuracy. We show small and consistent gains in next-word prediction accuracy, with a graceful increase in memory use and run-time cost, compared to the production GRU. These gains come from scaling down a GPT2 architecture to fit the required size and from a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, which offers both flexibility and efficiency.
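The abstract does not say which DP training algorithm is used for the fine-tuning stage. As an illustration only, the sketch below assumes the common DP-SGD recipe: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, then average. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and are not taken from the paper.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (assumed recipe, not confirmed by the paper).

    params:            1-D array of model parameters.
    per_example_grads: 2-D array, one gradient row per training example.
    """
    # L2 norm of each example's gradient.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink gradients whose norm exceeds clip_norm; leave the others unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    # Noisy average of the clipped gradients over the batch.
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean

# Toy usage: two examples, two parameters (values are made up).
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
new_params = dp_sgd_step(np.zeros(2), grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
```

In the two-stage setup described above, a step of this kind would apply only to the second stage on typing data; the seed model on general data could be trained without clipping or noise. Tracking the resulting privacy budget requires a separate accountant, which this sketch omits.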