LG AI ML Sep 30, 2022

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

Zixiang Chen, Chris Junchi Li, Angela Yuan, Quanquan Gu, Michael I. Jordan

arXiv:2209.15634v120.231 citations h-index: 64

Originality: Highly original

AI Analysis

This work addresses the challenge of sample efficiency in reinforcement learning for applications with large-scale environments, representing an incremental advancement through a unifying framework.

The authors address sample-efficient reinforcement learning with large state and action spaces by proposing a general framework that unifies model-based and model-free RL. Within this framework they introduce the OPERA algorithm, whose regret bounds match or improve on the best-known results for a variety of MDP models. For MDPs with low Witness rank, under a slightly stronger assumption, OPERA improves the state-of-the-art sample complexity by a factor of $dH$.

With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL). In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov Decision Process (MDP) models in the literature for tractable RL. We propose a novel estimation function with decomposable structural properties for optimization-based exploration and the functional eluder dimension as a complexity measure of the ABC class. Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed, achieving regret bounds that match or improve over the best-known results for a variety of MDP models. In particular, for MDPs with low Witness rank, under a slightly stronger assumption, OPERA improves the state-of-the-art sample complexity results by a factor of $dH$. Our framework provides a generic interface to design and analyze new RL models and algorithms.
