LGAIApr 16, 2025

A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs

arXiv:2504.11997v12 citationsh-index: 54ICML
Originality Incremental advance
AI Analysis

This addresses a computational bottleneck for researchers and practitioners in reinforcement learning, though it is incremental as it builds on prior work.

The paper tackles the computational inefficiency of clipping in value iteration for infinite-horizon average-reward linear MDPs by introducing a method that computes minima only over visited states, achieving the same regret bound with state-space-independent complexity.

We study reinforcement learning in infinite-horizon average-reward settings with linear MDPs. Previous work addresses this problem by approximating the average-reward setting by discounted setting and employing a value iteration-based algorithm that uses clipping to constrain the span of the value function for improved statistical efficiency. However, the clipping procedure requires computing the minimum of the value function over the entire state space, which is prohibitive since the state space in linear MDP setting can be large or even infinite. In this paper, we introduce a value iteration method with efficient clipping operation that only requires computing the minimum of value functions over the set of states visited by the algorithm. Our algorithm enjoys the same regret bound as the previous work while being computationally efficient, with computational complexity that is independent of the size of the state space.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes