DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models
This work addresses efficient and robust training of large models under differential privacy in federated settings, which matters for privacy-sensitive applications. The contribution may appear incremental, since it builds on the existing AdamW optimizer.
The paper tackles the challenge of balancing convergence efficiency and robustness in Differentially Private Federated Learning (DPFL) by proposing DP-FedAdamW, an optimizer that addresses variance amplification and bias in the second-moment estimator under DP. On Tiny-ImageNet with Swin-Base at ε=1, it reports a 5.83% improvement over state-of-the-art methods.
Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning in large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83\%. The code is available in the Appendix.
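Issue (ii) can be illustrated with a small numerical sketch. It is not the DP-FedAdamW estimator, which the abstract does not specify. It only shows the generic effect: if zero-mean Gaussian noise with standard deviation $\sigma$ is added to a clipped gradient $g$, then $\mathbb{E}[(g+n)^2] = g^2 + \sigma^2$, so the squared noisy gradient that feeds an Adam-style second moment is inflated by $\sigma^2$. The function names, the toy data, and the noise calibration `noise_multiplier * clip_norm / batch_size` (a common DP-SGD convention) are assumptions made for illustration.

```python
import numpy as np


def privatize(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to L2 norm `clip_norm`, average, add Gaussian noise.

    Returns (noisy_mean, clipped_mean, noise_std). A real DP pipeline would release
    only noisy_mean; the clean clipped mean is returned here for comparison.
    The noise scale below is an assumed DP-SGD-style calibration.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_mean = (per_sample_grads * scale).mean(axis=0)
    noise_std = noise_multiplier * clip_norm / per_sample_grads.shape[0]
    noisy_mean = clipped_mean + rng.normal(0.0, noise_std, size=clipped_mean.shape)
    return noisy_mean, clipped_mean, noise_std


def second_moment_terms(noisy_grad, noise_std):
    """Return the naive squared gradient and a version with the noise variance removed."""
    naive = noisy_grad ** 2
    debiased = naive - noise_std ** 2  # may be negative for a single draw
    return naive, debiased


def average_over_noise(per_sample_grads, clip_norm, noise_multiplier, trials, seed=0):
    """Monte Carlo average of both second-moment terms over independent noise draws."""
    rng = np.random.default_rng(seed)
    _, clipped_mean, noise_std = privatize(
        per_sample_grads, clip_norm, noise_multiplier, rng
    )
    noise = rng.normal(0.0, noise_std, size=(trials, clipped_mean.size))
    naive, debiased = second_moment_terms(clipped_mean + noise, noise_std)
    return naive.mean(axis=0), debiased.mean(axis=0), clipped_mean ** 2, noise_std


if __name__ == "__main__":
    toy_grads = np.random.default_rng(1).normal(size=(8, 4))  # 8 samples, 4 parameters
    naive_mean, debiased_mean, true_sq, sigma = average_over_noise(
        toy_grads, clip_norm=1.0, noise_multiplier=1.0, trials=200_000
    )
    print("noise variance      :", sigma ** 2)
    print("naive - true g^2    :", naive_mean - true_sq)
    print("debiased - true g^2 :", debiased_mean - true_sq)
```

In this toy setting the naive term overshoots the true squared gradient by roughly the noise variance in every coordinate, while the corrected term is centred on it. A single-draw corrected value can be negative, so a practical optimizer would need some safeguard before taking a square root; how DP-FedAdamW handles this is not stated in the abstract.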