LGCCCRDSApr 4, 2024

Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning

arXiv:2404.03774v18 citationsh-index: 41FOCS
Originality Highly original
AI Analysis

This work addresses a foundational problem in machine learning theory by demonstrating a computational separation that challenges assumptions about the ease of extending supervised learning methods to RL, with implications for algorithm design and complexity analysis.

The paper tackles the computational difficulty of reinforcement learning compared to supervised learning by providing the first cryptographic separation, showing that reward-free exploration in block MDPs is provably harder than regression, and that regression is necessary but not sufficient for efficient RL.

Supervised learning is often computationally easy in practice. But to what extent does this mean that other modes of learning, such as reinforcement learning (RL), ought to be computationally easy by extension? In this work we show the first cryptographic separation between RL and supervised learning, by exhibiting a class of block MDPs and associated decoding functions where reward-free exploration is provably computationally harder than the associated regression problem. We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem. It is known that being able to perform regression in block MDPs is necessary for finding a good policy; our results suggest that it is not sufficient. Our separation lower bound uses a new robustness property of the Learning Parities with Noise (LPN) hardness assumption, which is crucial in handling the dependent nature of RL data. We argue that separations and oracle lower bounds, such as ours, are a more meaningful way to prove hardness of learning because the constructions better reflect the practical reality that supervised learning by itself is often not the computational bottleneck.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes