LGOCMLFeb 21, 2024

Dealing with unbounded gradients in stochastic saddle-point optimization

arXiv:2402.13903v212 citationsh-index: 28ICML
Originality Incremental advance
AI Analysis

This addresses a key challenge in optimization for convex-concave functions, with applications in reinforcement learning, though it appears incremental as it builds on existing regularization methods.

The paper tackles the problem of unbounded gradients causing instability in stochastic saddle-point optimization by proposing a regularization technique that stabilizes iterates and provides performance guarantees, even with linearly scaling gradient noise, and applies it to reinforcement learning for near-optimal policy guarantees in average-reward MDPs without bias span knowledge.

We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may result in instability and divergence. In this paper, we propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful performance guarantees even if the domain and the gradient noise scales linearly with the size of the iterates (and is thus potentially unbounded). Besides providing a set of general results, we also apply our algorithm to a specific problem in reinforcement learning, where it leads to performance guarantees for finding near-optimal policies in an average-reward MDP without prior knowledge of the bias span.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes