Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods
This work strengthens the theoretical foundations of differentially private optimization by addressing a gap in existing analyses, which makes it relevant to researchers and practitioners in privacy-preserving machine learning. The contribution is incremental in that it builds on standard convergence theory.
The paper studies the long-run behavior of differentially private stochastic gradient descent (DP-SGD). It proves that DP-SGD converges almost surely under standard smoothness assumptions, in both nonconvex and strongly convex settings, when the step sizes satisfy standard decay conditions. It then extends the analysis to momentum variants such as DP-SHB and DP-NAG.
Differentially private stochastic gradient descent (DP-SGD) has become the standard algorithm for training machine learning models with rigorous privacy guarantees. Despite its widespread use, the theoretical understanding of its long-run behavior remains limited: existing analyses typically establish convergence in expectation or with high probability, but do not address the almost sure convergence of single trajectories. In this work, we prove that DP-SGD converges almost surely under standard smoothness assumptions, in both nonconvex and strongly convex settings, provided the step sizes satisfy standard decay conditions. Our analysis extends to momentum variants such as the stochastic heavy ball (DP-SHB) and Nesterov's accelerated gradient (DP-NAG), where we show that careful energy constructions yield similar guarantees. These results provide stronger theoretical foundations for differentially private optimization and suggest that, despite privacy-induced distortions, the algorithm remains pathwise stable in both convex and nonconvex regimes.
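To make the object of study concrete, the following is a minimal sketch of a single DP-SGD trajectory. It is not the paper's exact formulation. It assumes the usual construction of per-sample gradient clipping followed by Gaussian noise, and it assumes a polynomially decaying step size eta_t = eta0 / (t + 1)^alpha with alpha in (1/2, 1], which is one common way to meet decay conditions of the kind the abstract mentions. The function name `dp_sgd` and all of its parameters (`clip`, `sigma`, `eta0`, `alpha`, `batch`) are hypothetical, and the sketch performs no privacy accounting.

```python
import numpy as np


def dp_sgd(grad_fn, data, x0, steps, clip=1.0, sigma=1.0,
           eta0=0.5, alpha=0.75, batch=8, seed=0):
    """Run one DP-SGD trajectory and return the final iterate.

    grad_fn(x, sample) returns the gradient of the per-sample loss at x.
    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = len(data)
    for t in range(steps):
        # Sample a mini-batch without replacement.
        idx = rng.choice(n, size=batch, replace=False)
        g_sum = np.zeros_like(x)
        for i in idx:
            g = grad_fn(x, data[i])
            # Clip each per-sample gradient to Euclidean norm at most `clip`.
            g_sum += g / max(1.0, np.linalg.norm(g) / clip)
        # Add Gaussian noise scaled to the clipping threshold, then average.
        noise = rng.normal(0.0, sigma * clip, size=x.shape)
        g_priv = (g_sum + noise) / batch
        # Assumed decaying step size: eta_t = eta0 / (t + 1)^alpha.
        eta = eta0 / (t + 1) ** alpha
        x = x - eta * g_priv
    return x
```

As a usage example under the same assumptions, take the strongly convex per-sample loss 0.5 * ||x - a||^2, whose gradient is `x - a`. Calling `dp_sgd(lambda x, a: x - a, data, np.zeros(2), steps=2000)` on a tightly clustered `data` array should end near the sample mean of `data`. Almost sure convergence is a statement about individual trajectories of this kind, whereas the in-expectation results the abstract contrasts it with only describe the average over many runs.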