Faster Adaptive Momentum-Based Federated Methods for Distributed Composition Optimization
This work addresses sample and communication efficiency in distributed machine learning for applications such as meta learning and robust learning, though it is incremental in that it builds on existing momentum-based and local-SGD techniques.
The paper tackles the problem of high sample and communication complexities in federated composition optimization by proposing faster algorithms (MFCGD and AdaMFCGD), achieving lower sample complexity of $\tilde{O}(ε^{-3})$ and communication complexity of $\tilde{O}(ε^{-2})$ for finding an $ε$-stationary solution.
Federated learning is a popular distributed learning paradigm in machine learning. Meanwhile, composition optimization is an effective hierarchical learning model that appears in many machine learning applications, such as meta learning and robust learning. Although a few federated composition optimization algorithms have recently been proposed, they still suffer from high sample and communication complexities. In this paper, we therefore propose a class of faster federated compositional optimization algorithms (i.e., MFCGD and AdaMFCGD) to solve nonconvex distributed composition problems, which build on momentum-based variance-reduced and local-SGD techniques. In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates. Moreover, we provide a solid theoretical analysis of our algorithms under the non-i.i.d. setting, and prove that our algorithms simultaneously obtain lower sample and communication complexities than the existing federated compositional algorithms. Specifically, our algorithms obtain a lower sample complexity of $\tilde{O}(ε^{-3})$ with a lower communication complexity of $\tilde{O}(ε^{-2})$ in finding an $ε$-stationary solution. We conduct numerical experiments on robust federated learning and distributed meta learning tasks to demonstrate the efficiency of our algorithms.
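To make the two ingredients named above more concrete, the following is a minimal sketch of how momentum-based variance reduction and local-SGD-style periodic averaging might be combined on a compositional objective. It is not the paper's pseudocode. The function `run_sketch`, the scalar toy clients (`a`, `b`), the constants `lr` and `beta`, and the choice to average both the iterate and the gradient estimator at each communication round are all assumptions made for illustration. The adaptive matrix of AdaMFCGD is omitted.

```python
import random


def run_sketch(steps=300, local_steps=5, lr=0.05, beta=0.1, noise=0.01, seed=0):
    """Toy federated compositional descent; returns (initial, final) objective."""
    rng = random.Random(seed)
    # Hypothetical non-i.i.d. clients (illustrative values, not from the paper):
    # inner map g_m(x) = a_m * x, outer loss f_m(y) = 0.5 * (y - b_m)^2.
    a = [0.5, 1.0, 1.5, 2.0]
    b = [1.0, -0.5, 2.0, 0.3]
    M = len(a)

    def objective(x):
        # F(x) = (1/M) * sum_m f_m(g_m(x))
        return sum(0.5 * (a[m] * x - b[m]) ** 2 for m in range(M)) / M

    def inner(m, x, xi):
        # Stochastic inner value g_m(x; xi); xi perturbs the slope a_m.
        return (a[m] + xi) * x

    def comp_grad(m, u, xi):
        # Stochastic compositional gradient: g_m'(x; xi) * f_m'(u).
        return (a[m] + xi) * (u - b[m])

    def draw():
        # One fresh noise sample per client.
        return [noise * rng.gauss(0.0, 1.0) for _ in range(M)]

    x = [0.0] * M
    xi = draw()
    u = [inner(m, x[m], xi[m]) for m in range(M)]      # inner-value estimators
    w = [comp_grad(m, u[m], xi[m]) for m in range(M)]  # gradient estimators
    initial = objective(sum(x) / M)

    for t in range(1, steps + 1):
        sync = t % local_steps == 0  # communicate every `local_steps` steps
        x_new = [x[m] - lr * w[m] for m in range(M)]   # local descent step
        if sync:
            x_new = [sum(x_new) / M] * M               # server averages iterates
        xi = draw()
        # Momentum-based variance-reduced updates: the same sample xi is
        # evaluated at the new and the old point, and the old estimator is
        # corrected by the difference.
        u_new = [inner(m, x_new[m], xi[m])
                 + (1 - beta) * (u[m] - inner(m, x[m], xi[m]))
                 for m in range(M)]
        w_new = [comp_grad(m, u_new[m], xi[m])
                 + (1 - beta) * (w[m] - comp_grad(m, u[m], xi[m]))
                 for m in range(M)]
        if sync:
            w_new = [sum(w_new) / M] * M               # assumed: average estimators too
        x, u, w = x_new, u_new, w_new

    return initial, objective(sum(x) / M)
```

Calling `initial, final = run_sketch()` returns the objective at the starting point and at the averaged final iterate. On this toy problem the value should decrease, which only illustrates the mechanics and says nothing about the complexity bounds stated above.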