Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
This work is significant for researchers and practitioners in decentralized deep learning, as it tackles the critical problem of performance degradation due to data heterogeneity.
This paper addresses the challenge of data heterogeneity in decentralized deep learning by proposing a novel momentum-based method. Compared with existing methods, the approach improves test performance by 1%-20% across various CV/NLP datasets and network topologies.
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks. In realistic learning scenarios, heterogeneity across different clients' local datasets poses an optimization challenge and may severely deteriorate generalization performance. In this paper, we investigate and identify the limitations of several decentralized optimization algorithms under different degrees of data heterogeneity. We propose a novel momentum-based method to mitigate this decentralized training difficulty. We show in extensive empirical experiments on various CV/NLP datasets (CIFAR-10, ImageNet, and AG News) and several network topologies (Ring and Social Network) that our method is much more robust to the heterogeneity of clients' data than other existing methods, with a significant improvement in test performance ($1\% \!-\! 20\%$). Our code is publicly available.
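To make the setting concrete, the following is a minimal sketch of plain decentralized gradient descent with gossip averaging on a ring topology. It is not the momentum-based method proposed in the paper, which the abstract does not specify. The function name `ring_gossip_sgd`, the scalar quadratic objectives, the uniform 1/3 mixing weights, and all parameter values are assumptions chosen for illustration only.

```python
def ring_gossip_sgd(centers, lr=0.1, steps=200):
    """Decentralized gradient descent on a ring with uniform gossip averaging.

    Hypothetical toy setup: client i minimizes 0.5 * (x - centers[i]) ** 2,
    so spread-out centers stand in for heterogeneous local data.
    """
    n = len(centers)
    xs = [0.0] * n  # every client starts from the same model
    for _ in range(steps):
        # Local step: each client follows the gradient of its own objective.
        local = [x - lr * (x - c) for x, c in zip(xs, centers)]
        # Gossip step: average with the two ring neighbours (weights 1/3 each).
        xs = [(local[(i - 1) % n] + local[i] + local[(i + 1) % n]) / 3.0
              for i in range(n)]
    return xs


if __name__ == "__main__":
    centers = [float(i) for i in range(8)]
    xs = ring_gossip_sgd(centers)
    print("network average:", sum(xs) / len(xs))
    print("client spread:", max(xs) - min(xs))
```

In this toy problem the network average approaches the mean of the centers, which is the minimizer of the summed objective, while the individual clients remain apart from one another because each keeps being pulled toward its own local optimum. That persistent disagreement is a simple picture of the difficulty that heterogeneous data creates for decentralized training.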