LGFeb 14, 2022

On the Convergence of SARSA with Linear Function Approximation

arXiv:2202.06828v319 citations
Originality Incremental advance
AI Analysis

This addresses a theoretical gap in reinforcement learning for researchers, providing new insights into SARSA's convergence under broader conditions.

The paper tackles the problem of SARSA's oscillatory behavior with linear function approximation by analyzing its convergence rate to a bounded region, showing the region is smaller than the projection region when rewards are not too large.

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes