Revisiting EXTRA for Smooth Distributed Optimization
It offers incremental improvements in communication and computation complexities for distributed optimization, relevant for large-scale machine learning and data processing systems.
This paper revisits the EXTRA method for decentralized distributed optimization, providing improved complexity bounds for both strongly convex and non-strongly convex problems, and accelerates it using the Catalyst framework to achieve complexities close to lower bounds, with factors like O((L/μ + 1/(1-σ₂(W))) log(1/(ε(1-σ₂(W)))).
EXTRA is a popular method for dencentralized distributed optimization and has broad applications. This paper revisits EXTRA. First, we give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}μ+\frac{1}{1-σ_2(W)}\right)\log\frac{1}{ε(1-σ_2(W))}\right)$ communication and computation complexities for $μ$-strongly convex and $L$-smooth problems, where $σ_2(W)$ is the second largest singular value of the weight matrix $W$. When the strong convexity is absent, we prove the $O\left(\left(\frac{L}ε+\frac{1}{1-σ_2(W)}\right)\log\frac{1}{1-σ_2(W)}\right)$ complexities. Then, we use the Catalyst framework to accelerate EXTRA and obtain the $O\left(\sqrt{\frac{L}{μ(1-σ_2(W))}}\log\frac{ L}{μ(1-σ_2(W))}\log\frac{1}ε\right)$ communication and computation complexities for strongly convex and smooth problems and the $O\left(\sqrt{\frac{L}{ε(1-σ_2(W))}}\log\frac{1}{ε(1-σ_2(W))}\right)$ complexities for non-strongly convex ones. Our communication complexities of the accelerated EXTRA are only worse by the factors of $\left(\log\frac{L}{μ(1-σ_2(W))}\right)$ and $\left(\log\frac{1}{ε(1-σ_2(W))}\right)$ from the lower complexity bounds for strongly convex and non-strongly convex problems, respectively.