LG OC MLFeb 21, 2024

Dealing with unbounded gradients in stochastic saddle-point optimization

arXiv:2402.13903v215.012 citationsh-index: 28ICML

Originality Incremental advance

AI Analysis

This addresses a key challenge in optimization for convex-concave functions, with applications in reinforcement learning, though it appears incremental as it builds on existing regularization methods.

The paper tackles the problem of unbounded gradients causing instability in stochastic saddle-point optimization by proposing a regularization technique that stabilizes iterates and provides performance guarantees, even with linearly scaling gradient noise, and applies it to reinforcement learning for near-optimal policy guarantees in average-reward MDPs without bias span knowledge.

We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may result in instability and divergence. In this paper, we propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful performance guarantees even if the domain and the gradient noise scales linearly with the size of the iterates (and is thus potentially unbounded). Besides providing a set of general results, we also apply our algorithm to a specific problem in reinforcement learning, where it leads to performance guarantees for finding near-optimal policies in an average-reward MDP without prior knowledge of the bias span.

View on arXiv PDF

Similar