LGOCFeb 17, 2022

Delay-adaptive step-sizes for asynchronous learning

arXiv:2202.08550v318 citations
Originality Incremental advance
AI Analysis

This addresses inefficiencies in scalable ML training for systems with asynchronous nodes, though it is incremental as it builds on existing methods.

The paper tackles the problem of slow convergence in asynchronous parallel machine learning systems by proposing learning rates that adapt to actual time-varying delays instead of using worst-case bounds. It demonstrates theoretical and practical advantages for proximal incremental gradient descent and block-coordinate descent algorithms.

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes