Communication-Efficient Distributionally Robust Decentralized Learning
This work addresses the challenge of ensuring robust performance across all devices in decentralized learning systems with heterogeneous data. The contribution appears incremental: it builds on existing distributionally robust formulations and adds communication-efficiency improvements.
The paper tackles decentralized learning with heterogeneous data distributions, where collaboration can yield poor performance for some devices. It proposes AD-GDA, a distributionally robust decentralized learning algorithm that directly solves the underlying minimax optimization problem. According to the reported experiments, AD-GDA provides unbiased predictors and substantially improves communication efficiency compared to existing distributionally robust methods.
Decentralized learning algorithms empower interconnected devices to share data and computational resources to collaboratively train a machine learning model without the aid of a central coordinator. In the case of heterogeneous data distributions at the network nodes, collaboration can yield predictors with unsatisfactory performance for a subset of the devices. For this reason, we consider a distributionally robust formulation of the decentralized learning task and propose a decentralized single-loop gradient descent/ascent algorithm (AD-GDA) that directly solves the underlying minimax optimization problem. We make the algorithm communication-efficient by employing a compressed consensus scheme, and we provide convergence guarantees for smooth convex and non-convex loss functions. Finally, we corroborate the theoretical findings with empirical results that highlight AD-GDA's ability to provide unbiased predictors and to greatly improve communication efficiency compared to existing distributionally robust algorithms.
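The abstract describes the method only at a high level. Below is a minimal NumPy sketch of the general pattern it names: decentralized single-loop gradient descent/ascent on a distributionally robust minimax objective, with compressed consensus. It is not the authors' AD-GDA.

Every concrete choice here is an assumption made for illustration:

- The objective is min over θ of max over λ in the simplex of Σ_i λ_i f_i(θ) − α‖λ − 1/m‖².
- This objective is split across nodes as g_i = m λ_i f_i(θ) − α‖λ − 1/m‖².
- Every node keeps local copies of both θ and λ.
- Local losses are least squares.
- The network is a 5-node ring with weights of 1/3.
- Consensus uses a CHOCO-style gossip step with a top-k compressor.

The names `ad_gda_sketch`, `compressed_gossip` and `worst_loss` are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' AD-GDA: each node keeps copies of (theta, lam),
# takes a descent step on theta and a projected ascent step on lam, then runs an
# assumed CHOCO-style compressed gossip step on both.


def project_simplex(v):
    """Euclidean projection of a 1-D array onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    cond = u - (css - 1.0) / k > 0
    rho = k[cond][-1]
    tau = (css[cond][-1] - 1.0) / rho
    return np.maximum(v - tau, 0.0)


def top_k(x, k):
    """Assumed compressor: keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out


def ring_mixing(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 nodes."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, [i, (i - 1) % m, (i + 1) % m]] = 1.0 / 3.0
    return W


def compressed_gossip(x, x_hat, W, gamma, k):
    """Nodes send top-k corrections of their public copies x_hat, then move
    toward their neighbours' public copies (row i: sum_j W_ij (x_hat_j - x_hat_i))."""
    x_hat = x_hat + np.stack([top_k(row, k) for row in x - x_hat])
    x = x + gamma * (W @ x_hat - x_hat)
    return x, x_hat


def local_loss_and_grad(A, b, theta):
    """Assumed local loss f_i(theta) = ||A theta - b||^2 / (2n) and its gradient."""
    r = A @ theta - b
    return 0.5 * (r @ r) / b.size, A.T @ r / b.size


def ad_gda_sketch(data, alpha=1.0, eta_theta=2e-3, eta_lam=1e-3,
                  gamma=0.1, k=3, steps=5000):
    m, d = len(data), data[0][0].shape[1]
    W, uniform = ring_mixing(m), np.full(m, 1.0 / m)
    theta, theta_hat = np.zeros((m, d)), np.zeros((m, d))
    lam, lam_hat = np.tile(uniform, (m, 1)), np.zeros((m, m))
    for _ in range(steps):
        # Gradients of node i's term g_i = m*lam_i*f_i(theta) - alpha*||lam - 1/m||^2.
        g_lam = -2.0 * alpha * (lam - uniform)
        g_theta = np.zeros_like(theta)
        for i, (A, b) in enumerate(data):
            f_i, grad_i = local_loss_and_grad(A, b, theta[i])
            g_theta[i] = m * lam[i, i] * grad_i
            g_lam[i, i] += m * f_i
        # Single loop: one descent step (theta) and one ascent step (lam), then gossip.
        theta, theta_hat = compressed_gossip(theta - eta_theta * g_theta,
                                             theta_hat, W, gamma, k)
        lam, lam_hat = compressed_gossip(lam + eta_lam * g_lam, lam_hat, W, gamma, k)
        lam = np.stack([project_simplex(row) for row in lam])  # keep valid weights
    return theta, lam


# Toy heterogeneous network: node 0's data comes from a different model.
rng = np.random.default_rng(0)
m, d, n = 5, 5, 50
targets = [2.0 * np.ones(d)] + [np.zeros(d)] * (m - 1)
data = []
for t in targets:
    A = rng.normal(size=(n, d))
    data.append((A, A @ t + 0.1 * rng.normal(size=n)))

theta, lam = ad_gda_sketch(data)
theta_bar = theta.mean(axis=0)
A_all = np.vstack([A for A, _ in data])
b_all = np.concatenate([b for _, b in data])
theta_erm = np.linalg.lstsq(A_all, b_all, rcond=None)[0]  # average-loss baseline


def worst_loss(th):
    """Largest local loss across nodes, the quantity a robust model should keep low."""
    return max(local_loss_and_grad(A, b, th)[0] for A, b in data)


print("worst-node loss, robust:", worst_loss(theta_bar), "ERM:", worst_loss(theta_erm))
print("average mixture weights:", lam.mean(axis=0))
```

In this toy setup the network-average model should have a lower worst-node loss than the ERM solution, and node 0 should receive more than 1/m of the mixture weight.

Two design points are worth noting. First, the gossip update `W @ x_hat - x_hat` sums to zero over nodes for a doubly stochastic `W`, so compression does not shift the network average of θ. Second, re-projecting λ after gossip keeps each local copy a valid distribution; this is an illustrative choice, not necessarily the paper's.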