LG CR

Jun 4, 2024

Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches

Christopher A. Choquette-Choo, Arun Ganesh, Abhradeep Thakurta

arXiv:2406.02716v2

2.6

Originality: Highly original

AI Analysis

This work addresses efficiency in privacy-preserving optimization, particularly for applications like federated learning, by reducing the number of batch gradient steps: prior methods required Ω(n) steps, multiple epochs, or convexity for privacy.

The paper tackles the problem of reducing batch gradient complexity in differentially private stochastic convex optimization (DP-SCO) for smooth convex losses. It achieves the optimal rate (up to polylog factors) with only √n batch gradient steps of batch size √n in a single epoch, and improves this to n^{1/4} steps of batch size n^{3/4} when the global minimizer lies in the constraint set.
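The single-epoch claim can be checked with simple arithmetic: the abstract pairs √n steps with batch size √n, and n^{1/4} steps with batch size n^{3/4}, so in both cases steps times batch size is about n and each example is used at most once. The Python sketch below only illustrates that bookkeeping; the function name `single_epoch_schedule` and the rounding choices are assumptions for illustration, not part of the paper.

```python
def single_epoch_schedule(n: int, exponent: float = 0.5) -> tuple[int, int]:
    """Return (steps, batch_size) with steps ~ n**exponent.

    Hypothetical helper: the batch size is chosen as n // steps so that
    steps * batch_size never exceeds n, i.e. one pass over the data.
    """
    steps = max(1, round(n ** exponent))
    batch_size = n // steps
    return steps, batch_size


if __name__ == "__main__":
    n = 10_000
    for exponent in (0.5, 0.25):
        steps, batch_size = single_epoch_schedule(n, exponent)
        # Total examples touched stays within one epoch.
        print(exponent, steps, batch_size, steps * batch_size <= n)
```

With `exponent=0.5` this corresponds to the general setting; `exponent=0.25` corresponds to the case where the global minimizer is in the constraint set.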

In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP, and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the batch gradient complexity (i.e., number of adaptive optimization steps), which is important in applications like federated learning, is less well-understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors), in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients: a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
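To give a feel for "correlated noise generation from DP continual counting", here is a minimal Python sketch of the classic binary tree mechanism, one well-known way to add noise to running sums. It is a swapped-in illustration: the abstract does not say which continual counting mechanism the paper uses, and the names `dyadic_blocks` and `tree_prefix_noise` and the parameter `sigma` are hypothetical. Calibrating `sigma` to a given privacy budget is out of scope here, so the sketch makes no privacy claim by itself.

```python
import random


def dyadic_blocks(t: int) -> list[tuple[int, int]]:
    """Split the range [0, t) into dyadic blocks (level, index), largest first.

    A block (level, index) covers [index * 2**level, (index + 1) * 2**level).
    The number of blocks equals the number of 1-bits in t.
    """
    blocks, start = [], 0
    for level in reversed(range(t.bit_length())):
        size = 1 << level
        if t & size:
            blocks.append((level, start // size))
            start += size
    return blocks


def tree_prefix_noise(num_steps: int, dim: int, sigma: float, seed: int = 0):
    """Noise vectors for the prefix sums at steps 1..num_steps.

    Each tree node gets one independent Gaussian vector, drawn once and
    reused, so the noise at different steps is correlated and each prefix
    sum carries only O(log num_steps) noise terms.
    """
    rng = random.Random(seed)
    node_noise = {}  # (level, index) -> noise vector for that tree node
    out = []
    for t in range(1, num_steps + 1):
        total = [0.0] * dim
        for key in dyadic_blocks(t):
            if key not in node_noise:
                node_noise[key] = [rng.gauss(0.0, sigma) for _ in range(dim)]
            total = [a + b for a, b in zip(total, node_noise[key])]
        out.append(total)
    return out


if __name__ == "__main__":
    noise = tree_prefix_noise(num_steps=8, dim=2, sigma=1.0)
    print(len(noise), dyadic_blocks(5))
```

In a recursive-gradient method the gradient estimate is a running sum of per-step terms, which is why noise designed for prefix sums is a natural fit; how the paper combines this with SRG and acceleration is not spelled out in the abstract.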
