LG · May 24, 2024

Cross-Validated Off-Policy Evaluation

Matej Cief, Branislav Kveton, Michal Kompan

arXiv:2405.15332v46.42 citations · h-index: 37 · Has Code · AAAI

Originality: Synthesis-oriented

AI Analysis

The paper offers a practical solution for practitioners of reinforcement learning and policy evaluation, though it appears incremental because it adapts an existing method, cross-validation, to a new context.

The paper addresses estimator selection and hyper-parameter tuning in off-policy evaluation by showing how to apply cross-validation, challenging the belief that cross-validation is infeasible in this setting. An empirical evaluation shows that the method covers a variety of use cases.

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
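To make the idea in the abstract concrete, here is a minimal sketch of cross-validated estimator selection on logged bandit data. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the validation target, so this sketch assumes plain inverse propensity scoring (IPS) on the held-out split as an unbiased reference. All function names, the candidate set, the split fraction, and the synthetic data are hypothetical.

```python
import numpy as np

def ips(w, r):
    """Inverse propensity scoring: mean of importance-weighted rewards."""
    return float(np.mean(w * r))

def clipped_ips(clip):
    """IPS with importance weights clipped at `clip` (a hyper-parameter)."""
    def estimate(w, r):
        return float(np.mean(np.minimum(w, clip) * r))
    return estimate

def snips(w, r):
    """Self-normalized IPS."""
    return float(np.sum(w * r) / np.sum(w))

def cv_select(w, r, candidates, n_splits=20, val_frac=0.5, seed=0):
    """Pick the candidate whose training-split estimate is closest, in mean
    squared error over random splits, to IPS on the validation split.
    ASSUMPTION: IPS as the validation target is this sketch's choice."""
    rng = np.random.default_rng(seed)
    n = len(r)
    n_val = int(n * val_frac)
    errors = {name: 0.0 for name in candidates}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, train = perm[:n_val], perm[n_val:]
        target = ips(w[val], r[val])  # unbiased reference on held-out data
        for name, est in candidates.items():
            errors[name] += (est(w[train], r[train]) - target) ** 2 / n_splits
    best = min(errors, key=errors.get)
    return best, errors

def make_synthetic_logs(n=2000, seed=1):
    """Hypothetical logged data: uniform logging policy over 5 actions."""
    rng = np.random.default_rng(seed)
    logging = np.full(5, 0.2)
    target = np.array([0.6, 0.1, 0.1, 0.1, 0.1])
    mean_reward = np.array([0.7, 0.2, 0.3, 0.4, 0.5])
    a = rng.choice(5, size=n, p=logging)
    r = rng.binomial(1, mean_reward[a]).astype(float)
    w = target[a] / logging[a]  # importance weights pi_target / pi_logging
    return w, r

candidates = {
    "ips": ips,
    "snips": snips,
    "clipped_ips_1": clipped_ips(1.0),
    "clipped_ips_2": clipped_ips(2.0),
}
w, r = make_synthetic_logs()
best, errors = cv_select(w, r, candidates)
print(best, errors)
```

The same loop covers hyper-parameter tuning: each clipping threshold is simply another entry in `candidates`. Because the validation target is itself a noisy estimate, the selected candidate can vary with the split fraction and the number of splits.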
