Differentially Private Learning Needs Better Model Initialization and Self-Distillation
This work addresses the challenge of deploying privacy-preserving language models without sacrificing performance, though the contribution is incremental because it builds on existing DPSGD techniques.
The paper tackles the degradation of utility, diversity, and linguistic quality in differentially private language model training. It introduces DPRefine, a method that improves over vanilla DPSGD: AlpacaEval prefers its generations in 78.4% of cases, and linguistic errors in generated text drop by 84.0%.
Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP finetuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating the grammar and spelling errors commonly associated with DPSGD. It also reduces inconsistencies of non-private models, such as hallucinated details and misattributed quotes. We find that small models like GPT-2 can be effective for initialization and distillation, highlighting their potential in enabling scalable and efficient deployment of privacy-preserving language models.
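For readers unfamiliar with the baseline, the following is a minimal sketch of one generic DPSGD update (per-example gradient clipping followed by Gaussian noise), written in plain Python on toy gradient vectors. It is not DPRefine's implementation. The function names `clip_gradient` and `dpsgd_step`, and the parameters `max_norm` and `noise_multiplier`, are hypothetical names chosen for illustration, and privacy accounting is omitted entirely.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DPSGD update (illustrative only, no privacy accounting).

    1. Clip each example's gradient to L2 norm max_norm.
    2. Sum the clipped gradients coordinate-wise.
    3. Add Gaussian noise with std = noise_multiplier * max_norm.
    4. Average over the batch and take a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Toy usage: two parameters, a batch of two per-example gradients.
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]
new_params = dpsgd_step(params, grads, lr=0.1, max_norm=1.0,
                        noise_multiplier=1.0, rng=rng)
```

The clipping step bounds each example's influence on the update, and the added noise is what distorts the learning signal; this is one plausible reading of why the abstract associates vanilla DPSGD with lower utility and more linguistic errors.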