LGAIDec 14, 2020

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

arXiv:2012.08005v475 citations
AI Analysis

This work identifies a fundamental limitation of batch reinforcement learning, showing that it can be exponentially harder than online learning, which is a significant problem for practitioners relying on offline data.

This paper demonstrates that batch reinforcement learning can be exponentially harder than online RL. It establishes exponential information-theoretic lower bounds for identifying a near-optimal policy and estimating a target policy's value in discounted infinite horizon MDPs with linear function representation, even under ideal conditions such as realizability and access to exact reward and transition functions.

Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the value of a target policy. For both tasks we derive \emph{exponential} information-theoretic lower bounds in discounted infinite horizon MDPs with a linear function representation for the action value function even if 1) \emph{realizability} holds, 2) the batch algorithm observes the exact reward and transition \emph{functions}, and 3) the batch algorithm is given the \emph{best} a priori data distribution for the problem class. Our work introduces a new `oracle + batch algorithm' framework to prove lower bounds that hold for every distribution. The work shows an exponential separation between batch and online reinforcement learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes