AI OC Jun 1

Stochastic convergence of parallel asynchronous adaptive first-order methods

arXiv:2606.01787 12.8

AI Analysis

Provides theoretical convergence guarantees for asynchronous adaptive optimizers, addressing a key bottleneck in distributed training for large-scale machine learning.

The paper introduces a class of asynchronous adaptive first-order optimization methods, including variants with momentum and inexact normalization, and proves an O(1/√t) convergence rate (up to logarithmic factors) on non-convex functions in a fully stochastic setting. Numerical experiments suggest that these methods are relevant for heterogeneous large-scale ML systems.

A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic factors) of order O(1/√t) under reasonable assumptions. Numerical experiments suggest that such asynchronous adaptive algorithms are very relevant in heterogeneous large-scale machine learning systems.
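The abstract does not spell out any update rule, so the following is only an illustrative sketch of what an asynchronous adaptive first-order step might look like, not the paper's algorithm. It assumes AdaGrad-style per-coordinate normalization as a stand-in for the "popular algorithms" mentioned, a toy non-convex objective, and a simulated pool of workers whose gradients are computed at stale parameter copies. All names (`grad`, `async_adagrad`, `snapshots`) and all parameter values are hypothetical.

```python
import math
import random


def grad(x):
    """Gradient of the toy non-convex function f(x) = sum_i x_i^2 / (1 + x_i^2)."""
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def async_adagrad(x0, steps=2000, workers=4, lr=0.5, noise=0.1, eps=1e-8, seed=0):
    """Simulated asynchronous AdaGrad (illustrative assumption, not the paper's method).

    Each update uses a noisy gradient evaluated at the parameter copy that one
    worker read earlier, so the gradient may be stale by several updates.
    """
    rng = random.Random(seed)
    x = list(x0)
    accum = [0.0] * len(x)                          # running sum of squared gradients
    snapshots = [list(x) for _ in range(workers)]   # each worker's last read of x
    for _ in range(steps):
        w = rng.randrange(workers)                  # worker that finishes next
        # Stochastic gradient at the worker's (possibly stale) snapshot.
        g = [gi + rng.gauss(0.0, noise) for gi in grad(snapshots[w])]
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)  # adaptive normalization
        snapshots[w] = list(x)                      # worker re-reads current parameters
    return x


x_final = async_adagrad([1.5, -2.0, 0.7])
grad_norm = math.sqrt(sum(g * g for g in grad(x_final)))
print(f"final gradient norm: {grad_norm:.4f}")
```

The gradient norm is used as the progress measure because, on non-convex functions, convergence rates such as the one quoted above are typically stated in terms of stationarity rather than distance to a minimizer. Picking the finishing worker uniformly at random is a simplification; in a heterogeneous system, slower workers would deliver staler gradients.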
