Effective Federated Adaptive Gradient Methods with Non-IID Decentralized Data
This work addresses the deterioration of generalization performance in federated learning across edge devices with heterogeneous (non-IID, unbalanced) data, offering incremental improvements over existing methods such as FedAvg, FedMomentum, and SCAFFOLD.
The authors tackle federated learning with non-IID and unbalanced data by proposing Federated AGMs, which adapt learning rates using first- and second-order momenta. They report improved convergence and test performance compared with state-of-the-art methods such as FedAvg.
Federated learning allows a large number of edge computing devices to collaboratively learn a global model without sharing data. Analysis under partial device participation with non-IID and unbalanced data reflects practical deployments more closely. In this work, we propose federated learning versions of adaptive gradient methods (Federated AGMs), which employ both first-order and second-order momenta to alleviate the deterioration in generalization performance caused by dissimilarity of data populations among devices. To further improve test performance, we compare several calibration schemes for the adaptive learning rate, including standard Adam calibrated by $ε$, $p$-Adam, and a variant calibrated by an activation function. Our analysis provides the first set of theoretical results showing that the proposed (calibrated) Federated AGMs converge to a first-order stationary point under non-IID and unbalanced data settings for nonconvex optimization. We perform extensive experiments to compare these federated learning methods with the state-of-the-art FedAvg, FedMomentum, and SCAFFOLD, and to assess the different calibration schemes and the advantages of AGMs over current federated learning methods.
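To make the three calibration schemes concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that the server treats the sample-weighted average of model differences from the participating devices as a pseudo-gradient and applies an Adam-style update to it; the abstract does not say where the momenta are maintained. The names `calibrate` and `server_step`, the softplus choice for the activation function, the sharpness parameter `beta`, and all default hyperparameters are illustrative assumptions. Bias correction is omitted for brevity.

```python
import math


def calibrate(v, scheme="eps", eps=1e-8, p=0.25, beta=50.0):
    """Calibrated denominator for one second-moment entry v >= 0."""
    if scheme == "eps":
        # Standard Adam: sqrt(v) + eps.
        return math.sqrt(v) + eps
    if scheme == "p":
        # p-Adam: v**p with 0 < p <= 1/2; adding eps here is an assumption
        # made only to avoid division by zero.
        return v ** p + eps
    if scheme == "softplus":
        # Assumed activation: softplus(sqrt(v)) = log(1 + exp(beta*x)) / beta,
        # written in a numerically stable form.
        x = beta * math.sqrt(v)
        return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / beta
    raise ValueError(f"unknown calibration scheme: {scheme}")


def server_step(w, local_models, weights, m, v,
                lr=0.01, beta1=0.9, beta2=0.99, scheme="eps"):
    """One hypothetical server round over the participating devices.

    w            -- current global model (list of floats)
    local_models -- models returned by the participating devices
    weights      -- per-device sample counts (handles unbalanced data)
    m, v         -- first- and second-order momenta kept on the server
    """
    total = sum(weights)
    # Pseudo-gradient: weighted average of (global - local) differences.
    g = [
        sum(wt * (w[j] - lm[j]) for lm, wt in zip(local_models, weights)) / total
        for j in range(len(w))
    ]
    m = [beta1 * mj + (1 - beta1) * gj for mj, gj in zip(m, g)]
    v = [beta2 * vj + (1 - beta2) * gj * gj for vj, gj in zip(v, g)]
    # Adaptive step: each coordinate is scaled by its calibrated denominator.
    w = [wj - lr * mj / calibrate(vj, scheme)
         for wj, mj, vj in zip(w, m, v)]
    return w, m, v
```

Calling `server_step([1.0], [[0.0], [2.0]], [1, 3], [0.0], [0.0])` runs one round with two devices holding 1 and 3 samples. Passing `scheme="p"` or `scheme="softplus"` changes only the denominator, which is the quantity the calibration comparison in the abstract is concerned with.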