LG · May 24, 2024

Cross-Validated Off-Policy Evaluation

arXiv:2405.15332v4 · 2 citations · h-index: 37 · AAAI
Originality: Synthesis-oriented
AI Analysis

The paper offers a practical solution for practitioners in reinforcement learning and policy evaluation, though it appears incremental, since it adapts an existing method (cross-validation) to a new setting.

The paper tackles estimator selection and hyperparameter tuning in off-policy evaluation by showing how to apply cross-validation, challenging the belief that this is infeasible, and presents an empirical evaluation across a variety of use cases.

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
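The abstract does not spell out the procedure, so what follows is only a minimal sketch of the general idea, not necessarily the authors' exact method. It assumes a context-free bandit with known logging propensities, and it assumes that an unbiased inverse propensity scoring (IPS) estimate on a held-out validation fold can serve as the target against which candidate estimators, computed on the training fold, are scored. The candidates here are plain IPS and clipped IPS with several clipping thresholds, so selection doubles as hyperparameter tuning. All names (`cross_validate`, `clipped_ips`, the toy policies and reward means) are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical logged record: (action, reward, logging propensity of that action).
Record = Tuple[int, float, float]
# Hypothetical estimator signature: logged records + target policy -> value estimate.
Estimator = Callable[[Sequence[Record], Sequence[float]], float]


def ips(data: Sequence[Record], target: Sequence[float]) -> float:
    """Inverse propensity scoring: unbiased when the propensities are correct."""
    return statistics.fmean(target[a] / p * r for a, r, p in data)


def clipped_ips(clip: float) -> Estimator:
    """IPS with importance weights capped at `clip` (biased, lower variance)."""
    def estimate(data: Sequence[Record], target: Sequence[float]) -> float:
        return statistics.fmean(min(target[a] / p, clip) * r for a, r, p in data)
    return estimate


def cross_validate(
    data: List[Record],
    target: Sequence[float],
    candidates: Dict[str, Estimator],
    n_splits: int = 20,
    seed: int = 0,
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by its mean squared gap to a held-out IPS estimate."""
    rng = random.Random(seed)
    errors = {name: 0.0 for name in candidates}
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, valid = shuffled[:half], shuffled[half:]
        # Noisy but unbiased proxy for the true policy value.
        reference = ips(valid, target)
        for name, estimator in candidates.items():
            errors[name] += (estimator(train, target) - reference) ** 2 / n_splits
    best = min(errors, key=errors.get)  # lowest average squared error wins
    return best, errors


# Toy usage: three actions with known logging propensities (all values hypothetical).
rng = random.Random(1)
logging = [0.6, 0.3, 0.1]
target = [0.1, 0.2, 0.7]
means = [0.2, 0.5, 0.8]  # true policy value: 0.1*0.2 + 0.2*0.5 + 0.7*0.8 = 0.68
data: List[Record] = []
for _ in range(2000):
    a = rng.choices(range(3), weights=logging)[0]
    r = 1.0 if rng.random() < means[a] else 0.0
    data.append((a, r, logging[a]))

candidates: Dict[str, Estimator] = {"ips": ips}
candidates.update({f"clip={c}": clipped_ips(c) for c in (1.0, 3.0, 10.0)})
best, errors = cross_validate(data, target, candidates)
print(best, {name: round(err, 4) for name, err in errors.items()})
```

Because the training and validation folds are disjoint, the variance of the held-out reference adds the same constant to every candidate's expected score, so it does not change which candidate is favored in expectation. The sketch makes no correction for candidates being evaluated on only half of the data.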

Code Implementations: 1 repo