Yunyan Bai

OC · Oct 25, 2022

On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization

Luo Luo, Yunyan Bai, Lesi Chen et al.

We study the decentralized optimization problem $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{m}\sum_{i=1}^m f_i({\bf x})$, where the local function on the $i$-th agent has the form $f_i({\bf x})\triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}({\bf x})$ and every individual $f_{i,j}$ is smooth but possibly nonconvex. We propose a stochastic algorithm called the DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST) method, which achieves an $ε$-stationary point at each agent with $\tilde{\mathcal O}(Lε^{-2}/\sqrt{γ}\,)$ communication rounds, $\tilde{\mathcal O}(n+(L+\min\{nL, \sqrt{n/m}\bar L\})ε^{-2})$ computation rounds, and ${\mathcal O}(mn + \min\{mnL, \sqrt{mn}\bar L\}ε^{-2})$ local incremental first-order oracle calls, where $L$ is the smoothness parameter of the objective function, $\bar L$ is the mean-squared smoothness parameter of all individual functions, and $γ$ is the spectral gap of the mixing matrix associated with the network. We then establish lower bounds showing that the proposed method is near-optimal. Note that the smoothness parameters $L$ and $\bar L$ used in our algorithm design and analysis are global, leading to sharper complexity bounds than existing results that depend on local smoothness. We further extend DEAREST to the decentralized finite-sum optimization problem under the Polyak-Łojasiewicz condition, again achieving near-optimal complexity bounds.
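To make the role of the spectral gap $γ$ concrete, here is a minimal sketch of a mixing matrix and one gossip (averaging) round. It is not the DEAREST algorithm. The ring topology, the uniform 1/3 weights, the convention $γ = 1 - λ_2(W)$ (with $λ_2$ the second-largest eigenvalue), and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 agents.

    Assumption: each agent averages itself and its two neighbours
    with equal weight 1/3 (one of many valid choices).
    """
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def ring_spectral_gap(m):
    """Spectral gap 1 - lambda_2(W) for the ring matrix above.

    The matrix is circulant, so its eigenvalues are known in closed form:
    1/3 + (2/3) * cos(2*pi*k/m) for k = 0, ..., m-1.
    """
    eigs = sorted(
        (1.0 / 3.0 + (2.0 / 3.0) * math.cos(2.0 * math.pi * k / m) for k in range(m)),
        reverse=True,
    )
    return 1.0 - eigs[1]


def gossip_step(W, x):
    """One communication round: every agent replaces its value by a
    weighted average of its neighbours' values (x <- W x)."""
    m = len(x)
    return [sum(W[i][j] * x[j] for j in range(m)) for i in range(m)]


def consensus_error(x):
    """Euclidean distance between the agents' values and their average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))


if __name__ == "__main__":
    m = 8
    W = ring_mixing_matrix(m)
    gamma = ring_spectral_gap(m)
    x = [float(i) for i in range(m)]  # one scalar per agent
    y = gossip_step(W, x)
    print("spectral gap:", gamma)
    print("consensus error before/after:", consensus_error(x), consensus_error(y))
```

For this particular ring construction, each gossip round shrinks the consensus error by a factor of at most $1-γ$, and $γ$ decays on the order of $1/m^2$ as the ring grows. This is why a communication bound that scales with $1/\sqrt{γ}$ gets more expensive on poorly connected networks.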

OC · Feb 4, 2024

On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

Yunyan Bai, Yuxing Liu, Luo Luo

This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak-Łojasiewicz (PL) condition with parameter $μ$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $Ω(n+κ\sqrt{n}\log(1/ε))$ incremental first-order oracle (IFO) calls to find an $ε$-suboptimal solution, where $κ\triangleq L/μ$ is the condition number of the problem. This result nearly matches the IFO complexity upper bounds of the best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $Ω(κ/\sqrt{γ}\,\log(1/ε))$, $Ω((κ+τκ/\sqrt{γ}\,)\log(1/ε))$ and $Ω\big(n+κ\sqrt{n}\log(1/ε)\big)$ for communication rounds, time cost and local first-order oracle calls respectively, where $γ\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $τ>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
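For reference, the conditions named in this abstract are usually written as follows. The abstract itself does not spell them out, so the exact constants below follow the common textbook convention and are an assumption; $f^\ast \triangleq \inf_{\bf x} f({\bf x})$ is notation introduced here.

```latex
% PL condition with parameter \mu > 0 (assumed standard form)
\|\nabla f({\bf x})\|^2 \;\ge\; 2\mu \big(f({\bf x}) - f^\ast\big)
\quad \text{for all } {\bf x}\in{\mathbb R}^d .

% L-mean-squared smoothness of \{f_i\}_{i=1}^n (assumed standard form)
\frac{1}{n}\sum_{i=1}^n \|\nabla f_i({\bf x}) - \nabla f_i({\bf y})\|^2
\;\le\; L^2 \|{\bf x}-{\bf y}\|^2
\quad \text{for all } {\bf x},{\bf y}\in{\mathbb R}^d .

% \epsilon-suboptimal solution \hat{\bf x}
f(\hat{\bf x}) - f^\ast \;\le\; \epsilon .
```

Under these conventions the PL condition does not require convexity, which is why the $\log(1/ε)$ dependence in the bounds above is notable for a possibly nonconvex problem.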
