LGDCMLMay 24, 2018

Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning

arXiv:1805.09470v217 citations
Originality Highly original
AI Analysis

This work addresses a foundational problem in machine learning for distributed and asynchronous optimization settings, providing theoretical guarantees for non-convex problems with unbounded delays, which is an incremental advance over prior bounded or convex-focused analyses.

The paper tackles the convergence of asynchronous stochastic gradient descent (Async-SGD) with unbounded delays in non-convex optimization, proving convergence rates of o(1/√k) for Async-SGD and o(1/k) for a variant with increasing batch size, and establishes a unifying sufficient condition that includes existing delay models and introduces a new one.

Understanding the convergence performance of asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to their foundational role in machine learning. To date, however, most of the existing works are restricted to either bounded gradient delays or convex settings. In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses increasing batch size) for non-convex optimization problems with unbounded gradient delays. We prove $o(1/\sqrt{k})$ convergence rate for Async-SGD and $o(1/k)$ for Async-SGDI. Also, a unifying sufficient condition for Async-SGD's convergence is established, which includes two major gradient delay models in the literature as special cases and yields a new delay model not considered thus far.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes