CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence
This work addresses communication efficiency in decentralized machine learning, offering incremental improvements in convergence speed for distributed optimization problems.
The paper tackles distributed optimization in communication-restricted multi-agent networks by proposing CEDAS, a compressed decentralized stochastic gradient method, which achieves convergence rates comparable to centralized SGD for smooth strongly convex and nonconvex functions, with transient times of O(nC^3/(1-λ_2)^2) and O(n^3C^6/(1-λ_2)^4) respectively.
In this paper, we consider solving the distributed optimization problem over a multi-agent network under the communication restricted setting. We study a compressed decentralized stochastic gradient method, termed ``compressed exact diffusion with adaptive stepsizes (CEDAS)", and show the method asymptotically achieves comparable convergence rate as centralized { stochastic gradient descent (SGD)} for both smooth strongly convex objective functions and smooth nonconvex objective functions under unbiased compression operators. In particular, to our knowledge, CEDAS enjoys so far the shortest transient time (with respect to the graph specifics) for achieving the convergence rate of centralized SGD, which behaves as $\mathcal{O}(n{C^3}/(1-λ_2)^{2})$ under smooth strongly convex objective functions, and $\mathcal{O}(n^3{C^6}/(1-λ_2)^4)$ under smooth nonconvex objective functions, where $(1-λ_2)$ denotes the spectral gap of the mixing matrix, and $C>0$ is the compression-related parameter. In particular, CEDAS exhibits the shortest transient times when $C < \mathcal{O}(1/(1 - λ_2)^2)$, which is common in practice. Numerical experiments further demonstrate the effectiveness of the proposed algorithm.