ML LG ST ME

Dec 13, 2022

A Review of Off-Policy Evaluation in Reinforcement Learning

Masatoshi Uehara, Chengchun Shi, Nathan Kallus

Harvard

arXiv:2212.06355v133.6122 citations

h-index: 37

Originality: Synthesis-oriented

AI Analysis

The paper provides a comprehensive overview for researchers working on fundamental RL topics, but its contribution is incremental because it synthesizes existing literature rather than presenting new results.

This paper reviews off-policy evaluation (OPE) in reinforcement learning, discussing efficiency bounds, state-of-the-art methods, and their statistical properties.

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, some of the existing state-of-the-art OPE methods, their statistical properties, and some other related research directions that are currently being actively explored.
