D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems
This work addresses decentralized optimization for large-scale machine learning, offering a state-of-the-art method with incremental improvements in convergence rate for nonconvex problems.
The paper tackles the problem of developing a decentralized optimization algorithm for nonconvex problems, proposing D-SPIDER-SFO, which achieves a gradient computation cost of O(ε^{-3}) for finding an ε-approximate first-order stationary point, matching its centralized counterpart.
Decentralized optimization algorithms have attracted intensive interests recently, as it has a balanced communication pattern, especially when solving large-scale machine learning problems. Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost---that is, $\mathcal{O}(ε^{-3})$ for finding an $ε$-approximate first-order stationary point---to its centralized counterpart. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.