MLLGOCJun 25, 2021

Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems

arXiv:2106.13781v141 citations
Originality Highly original
AI Analysis

This work addresses a foundational bottleneck in machine learning for stochastic nested problems like compositional and min-max optimization, offering a unified theoretical explanation for practical performance without incremental modifications.

The paper tackles the slower convergence of SGD-type methods in stochastic nested optimization problems by unifying them into a single ALSET method and providing a tighter analysis, showing it requires O(ε^{-2}) samples to achieve an ε-stationary point, which matches or improves best-known sample complexities in specific cases.

Stochastic nested optimization, including stochastic compositional, min-max and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately, and thus develop problem-specific algorithms and their analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to have slower convergence rate compared to that of the non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, to achieve an $ε$-stationary point of the nested problem, it requires ${\cal O}(ε^{-2})$ samples. Under certain regularity conditions, applying our results to stochastic compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms in stochastic nested problems all work very well in practice without the need for further modifications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes