LGCCFeb 12, 2025

Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning

arXiv:2502.08632v14 citationsh-index: 11COLT
Originality Highly original
AI Analysis

This work addresses the foundational problem of understanding which supervised learning subroutines are essential for efficient RL, providing theoretical insights for algorithm design in large state spaces.

The paper clarifies the computational complexity of reinforcement learning by identifying minimal supervised learning oracles for different settings, showing that two-context regression is necessary and sufficient for reward-free exploration in Block MDPs, while one-context regression suffices with resets, and cryptographic evidence suggests insufficiency in Low-Rank MDPs.

Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning "oracles" it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a minimal oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model -- a ubiquitous setting for RL with function approximation -- we identify two-context regression as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify one-context regression as a near-minimal oracle in the stronger reset access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to Low-Rank MDPs, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes