ML, LG, ST, ME | Dec 13, 2022

A Review of Off-Policy Evaluation in Reinforcement Learning

Harvard
arXiv:2212.06355v1 | 122 citations | h-index: 37
Originality: Synthesis-oriented
AI Analysis

The paper gives researchers working on fundamental RL topics a comprehensive overview, but its contribution is incremental because it synthesizes existing literature.

This paper reviews off-policy evaluation (OPE) in reinforcement learning, discussing efficiency bounds, state-of-the-art methods, and their statistical properties.

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, some of the existing state-of-the-art OPE methods, their statistical properties, and other related research directions that are currently being actively explored.
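To make the OPE problem concrete, here is a minimal sketch of one standard estimator, trajectory-wise importance sampling. The abstract does not name specific methods, so this choice is an illustrative assumption and not necessarily the approach the paper emphasizes. The names `is_estimate`, `target`, `behavior`, and the toy data are all hypothetical.

```python
from typing import Callable, List, Tuple

# One step of logged data: (state, action, reward).
Step = Tuple[int, int, float]
# A policy is modeled as a function returning P(action | state).
Policy = Callable[[int, int], float]


def is_estimate(
    trajectories: List[List[Step]],
    target: Policy,
    behavior: Policy,
    gamma: float = 1.0,
) -> float:
    """Estimate the target policy's value from data logged under `behavior`.

    Each trajectory's discounted return is reweighted by the product of
    per-step probability ratios target(a|s) / behavior(a|s).
    Assumes behavior(s, a) > 0 for every logged (s, a) pair.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # cumulative importance ratio for this trajectory
        ret = 0.0     # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= target(s, a) / behavior(s, a)
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)


# Toy data (hypothetical): one state, two actions, action 1 pays reward 1.
def behavior(s: int, a: int) -> float:
    return 0.5  # the logging policy picks either action uniformly


def target(s: int, a: int) -> float:
    return 1.0 if a == 1 else 0.0  # the policy to evaluate always picks action 1


logged = [[(0, 1, 1.0)], [(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = is_estimate(logged, target, behavior)
```

In this toy setup the target policy always earns reward 1, and the reweighted average of the logged returns recovers that value even though half the logged actions came from the other arm. Importance-sampling weights can have high variance on longer trajectories, which is one reason statistical properties such as efficiency are studied for OPE.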

