LG · DC · OC · Feb 13, 2023

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging

arXiv:2302.06103v1 · 3 citations · h-index: 41
AI Analysis

This work addresses the need for efficient adaptive gradient methods in federated learning. The proposed framework is an incremental advance, but it achieves state-of-the-art complexity bounds.

The authors tackle the problem of incorporating adaptive gradient methods into federated learning and propose FedDA, a framework built on restarted dual averaging. One instantiation, FedDA-MVR, achieves gradient complexity Õ(ε^{-1.5}) and communication complexity Õ(ε^{-1}) for finding an ε-stationary point, matching the best known rates for first-order FL algorithms.

Federated learning (FL) is an emerging learning paradigm for tackling massively distributed data. In federated learning, a set of clients jointly perform a machine learning task under the coordination of a server. The FedAvg algorithm is one of the most widely used methods for solving federated learning problems. In FedAvg, the learning rate is a constant rather than changing adaptively. Adaptive gradient methods show superior performance over the constant learning rate schedule; however, there is still no general framework for incorporating adaptive gradient methods into the federated setting. In this paper, we propose FedDA, a novel framework for local adaptive gradient methods. The framework adopts a restarted dual averaging technique and is flexible with respect to gradient estimation methods and adaptive learning rate formulations. In particular, we analyze FedDA-MVR, an instantiation of our framework, and show that it achieves gradient complexity $\tilde{O}(ε^{-1.5})$ and communication complexity $\tilde{O}(ε^{-1})$ for finding an $ε$-stationary point. This matches the best known rate for first-order FL algorithms, and FedDA-MVR is the first adaptive FL algorithm that achieves this rate. We also perform extensive numerical experiments to verify the efficacy of our method.
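
For context, the constant-learning-rate FedAvg baseline that the abstract contrasts with adaptive methods can be sketched roughly as follows. This is a minimal illustration and not the paper's FedDA algorithm. The quadratic client losses, the function names `local_gd` and `fedavg`, and all hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
from typing import List

Vector = List[float]


def local_gd(x: Vector, target: Vector, lr: float, steps: int) -> Vector:
    """Run `steps` gradient steps on the toy client loss 0.5 * ||x - target||^2.

    The learning rate `lr` is constant, as in plain FedAvg.
    """
    x = list(x)  # copy so the server model is not modified in place
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]  # gradient of the quadratic
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


def fedavg(targets: List[Vector], rounds: int = 50,
           local_steps: int = 5, lr: float = 0.1) -> Vector:
    """Toy FedAvg: every client trains locally, then the server averages the models."""
    dim = len(targets[0])
    server = [0.0] * dim
    for _ in range(rounds):
        # Each client starts from the current server model (one communication round).
        client_models = [local_gd(server, t, lr, local_steps) for t in targets]
        # The server averages the client models coordinate by coordinate.
        server = [sum(m[j] for m in client_models) / len(client_models)
                  for j in range(dim)]
    return server


# Hypothetical data: one target vector per client.
targets = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
model = fedavg(targets)
print(model)
```

In this toy setting the average of the client losses is minimized at the mean of the client targets, so the server model approaches that mean. An adaptive method in the sense of the abstract would replace the fixed `lr` with a step size that changes during training.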

