Decentralized Markov Chain Gradient Descent
This work addresses decentralized optimization for machine learning when independent samples are hard to obtain; the contribution appears incremental, since it adapts existing decentralized stochastic gradient methods to Markov chain sampling.
The paper tackles large-scale machine learning when independent samples are costly or impossible to obtain. It proposes Decentralized Markov Chain Gradient Descent (DMGD), which draws its samples along the trajectory of a Markov chain. The authors establish nonergodic convergence and an ergodic convergence rate, both of which depend on the network topology and the mixing time of the chain, and report numerical tests showing sample efficiency.
The decentralized stochastic gradient method has emerged as a promising solution for large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm, a variant of the decentralized stochastic gradient method in which the random samples are taken along the trajectory of a Markov chain. This setting is well motivated when obtaining independent samples is costly or impossible, which rules out traditional stochastic gradient algorithms. Specifically, we consider the first- and zeroth-order versions of decentralized Markov chain gradient descent over a connected network, where each node communicates intermediate results only with its neighbors. The nonergodic convergence and the ergodic convergence rate of the proposed algorithms are rigorously established, and their critical dependence on the network topology and the mixing time of the Markov chain is highlighted. Numerical tests further validate the sample efficiency of our algorithm.
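The scheme described in the abstract can be illustrated with a minimal sketch. It assumes the common decentralized gradient form x_i^{k+1} = sum_j W_ij x_j^k - gamma_k g_i^k, where g_i^k is a gradient (or gradient estimate) evaluated at the sample currently visited by node i's Markov chain; the paper's exact recursion, step sizes, and zeroth-order estimator may differ. The ring network, the lazy random walk over sample indices, the scalar least-squares loss, the 1/(k+2) step size, the two-point finite-difference estimator, and all names below are hypothetical choices made for illustration only.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic weights on a ring: 1/2 on self, 1/4 on each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def markov_step(state, m, rng):
    """Lazy random walk on a cycle of m sample indices (uniform stationary law)."""
    return (state + rng.choice((-1, 0, 1))) % m


def sample_loss(x, a):
    """Illustrative per-sample loss: 0.5 * (x - a)^2."""
    return 0.5 * (x - a) ** 2


def grad_first_order(x, a):
    """Exact gradient of sample_loss with respect to x."""
    return x - a


def grad_zeroth_order(x, a, delta, rng):
    """Two-point finite-difference estimate using only loss evaluations (assumed form)."""
    u = rng.choice((-1.0, 1.0))  # random direction in one dimension
    diff = sample_loss(x + delta * u, a) - sample_loss(x - delta * u, a)
    return diff / (2.0 * delta) * u


def dmgd(data, W, iters, seed=0, zeroth_order=False, delta=1e-3):
    """Run the assumed DMGD recursion; data[i] holds node i's local samples."""
    rng = random.Random(seed)
    n = len(data)
    x = [0.0] * n        # one scalar iterate per node
    states = [0] * n     # current Markov chain state (sample index) per node
    for k in range(iters):
        step = 1.0 / (k + 2)  # diminishing step size (an assumption)
        # Communication: each node averages the iterates of its neighbors.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_x = []
        for i in range(n):
            a = data[i][states[i]]  # sample at the chain's current state
            if zeroth_order:
                g = grad_zeroth_order(x[i], a, delta, rng)
            else:
                g = grad_first_order(x[i], a)
            new_x.append(mixed[i] - step * g)
            # The next sample depends on the current one: no i.i.d. draws.
            states[i] = markov_step(states[i], len(data[i]), rng)
        x = new_x
    return x


# Toy data (hypothetical): 4 nodes with 5 local samples each.
DATA = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [5.0, 5.0, 5.0, 5.0, 5.0],
]

if __name__ == "__main__":
    W = ring_mixing_matrix(len(DATA))
    print("first-order :", dmgd(DATA, W, iters=5000))
    print("zeroth-order:", dmgd(DATA, W, iters=5000, zeroth_order=True))
```

Because every local loss in this toy problem is quadratic and each node holds the same number of samples, the network-wide minimizer is the mean of all samples (3.6 for the data above), which gives a simple reference point for checking that the nodes reach approximate consensus near the right value.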