Decentralized Learning via Random Walk with Jumps
This paper addresses the problem of slow convergence in decentralized learning due to entrapment, offering a practical solution for distributed systems with data heterogeneity.
Decentralized learning over networks suffers from 'entrapment' when using weighted random walks, causing correlated updates and slow convergence. The authors propose Metropolis-Hastings with Lévy jumps (MHLJ), which introduces occasional long-range transitions, and show it eliminates entrapment and significantly speeds up learning.
We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagated across the network and updated at each visited node using local data, thereby incurring low communication and computational overheads. In weighted random-walk learning, the transition matrix is designed to achieve a desired sampling distribution, thereby speeding up convergence under data heterogeneity. We show that implementing weighted sampling via the Metropolis-Hastings algorithm can lead to a previously unexplored phenomenon we term entrapment. The random walk may become trapped in a small region of the network, resulting in highly correlated updates and severely degraded convergence. To address this issue, we propose Metropolis-Hastings with Levy jumps, which introduces occasional long-range transitions to restore exploration while respecting local information constraints. We establish a convergence rate that explicitly characterizes the roles of data heterogeneity, network spectral gap, and jump probability, and demonstrate through experiments that MHLJ effectively eliminates entrapment and significantly speeds up decentralized learning.