LG OC MLMay 30, 2023

On Convergence of Incremental Gradient for Non-Convex Smooth Functions

Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi

arXiv:2305.19259v412.38 citations

Originality Incremental advance

AI Analysis

This work addresses a theoretical gap in optimization for machine learning practitioners, providing enhanced convergence guarantees for widely used algorithms, though it appears incremental in nature.

The paper tackles the theoretical convergence properties of incremental gradient and shuffle SGD algorithms for non-convex smooth functions, showing improved convergence guarantees that reduce the optimization term from O(n/ε) to O(1/ε) to reach accuracy ε, where n is the training set size.

In machine learning and neural network optimization, algorithms like incremental gradient, and shuffle SGD are popular due to minimizing the number of cache misses and good practical convergence behavior. However, their optimization properties in theory, especially for non-convex smooth functions, remain incompletely explored. This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering, within a broad framework for non-convex smooth functions. Our findings show enhanced convergence guarantees for incremental gradient and single shuffle SGD. Particularly if $n$ is the training set size, we improve $n$ times the optimization term of convergence guarantee to reach accuracy $\varepsilon$ from $O(n / \varepsilon)$ to $O(1 / \varepsilon)$.

View on arXiv PDF

Similar