LGOCMLMay 21, 2020

Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

arXiv:2005.10804v389 citations
Originality Incremental advance
AI Analysis

This work addresses the theoretical gap in reinforcement learning for researchers and practitioners by providing a model-free framework that generalizes beyond linear approximations, though it is incremental in extending existing linear theory to more general settings.

The paper tackles the problem of reinforcement learning with general value function approximation by establishing a provably efficient algorithm that achieves a regret bound of Õ(poly(dH)√T), where d is a complexity measure based on eluder dimension and log-covering numbers, H is the planning horizon, and T is the number of interactions.

Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on developing theory for RL with linear function approximation, the understanding of general function approximation schemes largely remains missing. In this paper, we establish a provably efficient RL algorithm with general value function approximation. We show that if the value functions admit an approximation with a function class $\mathcal{F}$, our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure of $\mathcal{F}$ that depends on the eluder dimension [Russo and Van Roy, 2013] and log-covering numbers, $H$ is the planning horizon, and $T$ is the number interactions with the environment. Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment. Moreover, our algorithm is model-free and provides a framework to justify the effectiveness of algorithms used in practice.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes