LGOCMLSep 1, 2022

Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning

arXiv:2209.00361v21 citationsh-index: 40
Originality Incremental advance
AI Analysis

This work addresses efficiency and versatility in optimization for machine learning practitioners, though it appears incremental as it builds on existing variance reduction methods with specific enhancements.

The paper tackles the problem of variance reduction in large-scale optimization by introducing SLEDGE, a single-loop algorithm that avoids periodic full gradient computations and achieves nearly optimal gradient complexity for finite-sum nonconvex problems, with applications in federated learning showing fewer communication rounds than existing methods under certain conditions.

While variance reduction methods have shown great success in solving large scale optimization problems, many of them suffer from accumulated errors and, therefore, should periodically require the full gradient computation. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refresh of the gradient estimator but achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility; (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first and second-order optimality of the output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)~heterogeneous, the algorithm requires fewer communication rounds then existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes