OC CR DS LG MLJan 1, 2023

ReSQueing Parallel and Private Stochastic Convex Optimization

Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

arXiv:2301.00457v215.920 citationsh-index: 48

Originality Incremental advance

AI Analysis

This work addresses efficiency bottlenecks in machine learning optimization for large-scale and privacy-sensitive applications, representing an incremental advance over prior methods.

The paper tackles stochastic convex optimization in parallel and private settings by introducing the ReSQue estimator, achieving state-of-the-art complexities such as optimization error ε_opt with d^{1/3}ε_opt^{-2/3} gradient oracle query depth and near-linear gradient query complexity for differential privacy when n ≳ d^2 ε_dp^{-3}.

We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $ε_{\text{opt}}$ with $d^{1/3}ε_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}ε_{\text{opt}}^{-2/3} + ε_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $ε_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d ε_{\text{dp}}^{-2}$, $(ε_{\text{dp}}, δ)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 ε_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.

View on arXiv PDF

Similar