LG · AI · Sep 7, 2024

Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions

arXiv:2409.04840v2 · 3 citations · h-index: 13
AI Analysis

The paper addresses the computational inefficiency of RL in large or infinite state-action spaces. It clearly improves on prior methods, whose computational cost can be exponential in the horizon, though the advance is incremental in that it applies to one specific setting.

The paper tackles the challenge of designing sample-efficient and computationally feasible reinforcement learning algorithms for Markov Decision Processes with linearly-realizable value functions. It presents an algorithm that finds a near-optimal policy using polynomially many episodes and oracle calls, with an oracle that can be implemented efficiently when the feature dimension is constant.

Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting an efficient algorithm for Markov Decision Processes (MDPs) where the state-action value function of any policy is linear in a given feature map. This challenging setting can model environments with infinite states and actions, strictly generalizes classic linear MDPs, and currently lacks a computationally efficient algorithm under online access to the MDP. Specifically, we introduce a new RL algorithm that efficiently finds a near-optimal policy in this setting, using a number of episodes and calls to a cost-sensitive classification (CSC) oracle that are both polynomial in the problem parameters. Notably, our CSC oracle can be efficiently implemented when the feature dimension is constant, representing a clear improvement over state-of-the-art methods, which require solving non-convex problems with horizon-many variables and can incur computational costs that are exponential in the horizon.
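To make the oracle in the abstract concrete, here is a minimal sketch of what a cost-sensitive classification (CSC) problem over linear policies could look like. It is not the paper's implementation. It assumes a policy class of the form "pick the action maximizing the inner product of the feature vector with a parameter theta", and it solves the CSC problem by brute force over a small, user-supplied candidate set. The function names (`greedy_actions`, `csc_cost`, `brute_force_csc_oracle`) and the toy data are hypothetical.

```python
import numpy as np


def greedy_actions(theta, features):
    """Action chosen by the linear policy in each context.

    features has shape (n_contexts, n_actions, d); theta has shape (d,).
    The policy picks argmax_a <features[i, a], theta> (an assumed policy class).
    """
    scores = features @ theta  # shape (n_contexts, n_actions)
    return np.argmax(scores, axis=1)


def csc_cost(theta, features, costs):
    """Total cost incurred by the policy induced by theta.

    costs has shape (n_contexts, n_actions); costs[i, a] is the cost of
    playing action a in context i.
    """
    actions = greedy_actions(theta, features)
    return costs[np.arange(len(costs)), actions].sum()


def brute_force_csc_oracle(features, costs, candidates):
    """Return the candidate theta with the smallest total cost.

    Exhaustive search over a finite candidate set is an illustrative
    assumption, not the method described in the paper.
    """
    totals = [csc_cost(theta, features, costs) for theta in candidates]
    best = int(np.argmin(totals))
    return candidates[best], float(totals[best])


if __name__ == "__main__":
    # Two contexts, two actions, feature dimension d = 2 (toy data).
    features = np.array([
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ])
    costs = np.array([
        [0.0, 1.0],
        [1.0, 0.0],
    ])
    candidates = np.array([[0.0, 1.0], [1.0, 0.0]])
    theta, total = brute_force_csc_oracle(features, costs, candidates)
    print(theta, total)
```

The point of the sketch is only the shape of the problem: the oracle receives per-context cost vectors and returns the policy in the class with the lowest total cost. How the paper implements this efficiently for constant feature dimension is not shown here.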
