LG · AI · RO · ML · Nov 15, 2019

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

arXiv:1911.06854v3 · 175 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized empirical analyses of OPE methods in safety-critical applications, though it is incremental, since it benchmarks existing methods rather than proposing new ones.

The paper tackles off-policy policy evaluation (OPE) in reinforcement learning through an experimental benchmark and empirical study, producing a comprehensive benchmarking suite and a summarized set of guidelines for practical use.

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to contribute further to the benchmark.
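The abstract does not spell out any particular estimator. As a rough illustration of what an OPE method does, here is a minimal sketch of two classical importance-sampling estimators (ordinary IS and weighted IS). These estimate the value of an evaluation policy `pi_e` using only trajectories logged under a different behavior policy `pi_b`. This is not code from COBS. The function names, the `(state, action, reward)` trajectory format, and the `policy(action, state) -> probability` interface are hypothetical assumptions chosen for illustration.

```python
from typing import Callable, Hashable, List, Tuple

# Hypothetical format: one logged step is (state, action, reward).
Step = Tuple[Hashable, Hashable, float]
# Hypothetical interface: a policy maps (action, state) to the probability of that action.
Policy = Callable[[Hashable, Hashable], float]


def importance_weight_and_return(
    traj: List[Step], pi_e: Policy, pi_b: Policy, gamma: float
) -> Tuple[float, float]:
    """Return the trajectory's cumulative likelihood ratio and its discounted return."""
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(traj):
        # Assumes pi_b(a, s) > 0 for every logged action (the behavior policy has support).
        weight *= pi_e(a, s) / pi_b(a, s)
        ret += (gamma ** t) * r
    return weight, ret


def is_estimate(trajs: List[List[Step]], pi_e: Policy, pi_b: Policy, gamma: float = 1.0) -> float:
    """Ordinary (trajectory-wise) importance sampling: unbiased, but can have high variance."""
    pairs = [importance_weight_and_return(tr, pi_e, pi_b, gamma) for tr in trajs]
    return sum(w * g for w, g in pairs) / len(pairs)


def wis_estimate(trajs: List[List[Step]], pi_e: Policy, pi_b: Policy, gamma: float = 1.0) -> float:
    """Weighted importance sampling: normalizes by total weight (biased, usually lower variance)."""
    pairs = [importance_weight_and_return(tr, pi_e, pi_b, gamma) for tr in trajs]
    total = sum(w for w, _ in pairs)
    return sum(w * g for w, g in pairs) / total if total > 0 else 0.0


# Toy one-step example: the behavior policy picks action 0 or 1 uniformly,
# the evaluation policy always picks action 1, and action 1 yields reward 1.0.
def pi_b(a: Hashable, s: Hashable) -> float:
    return 0.5


def pi_e(a: Hashable, s: Hashable) -> float:
    return 1.0 if a == 1 else 0.0


logged = [[("s", 0, 0.0)], [("s", 1, 1.0)], [("s", 1, 1.0)]]

# True value of pi_e is 1.0; IS gives 4/3 on this small sample, WIS gives 1.0.
print(is_estimate(logged, pi_e, pi_b), wis_estimate(logged, pi_e, pi_b))
```

On this tiny log, ordinary IS overshoots because the weights do not average to one, while WIS renormalizes them. This kind of bias/variance trade-off is what a benchmark with diverse experimental designs is meant to stress-test.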

Code Implementations: 3 repos
