Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee
This work addresses a practical issue in reinforcement learning for real-life applications, where the need for hyperparameter tuning undermines utility, though the contribution is incremental because it builds on existing FQE methods.
The paper tackles the problem of hyperparameter selection in fitted Q-evaluation (FQE) for offline policy evaluation. It proposes an approximate hyperparameter selection (AHS) framework with interpretable, hyperparameter-free selection criteria and derives four AHS methods; experiments confirm that the theoretical error bound matches empirical observations.
We are concerned with the problem of hyperparameter selection for fitted Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline policy evaluation (OPE), which is essential to reinforcement learning without environment simulators. However, like other OPE methods, FQE is not itself hyperparameter-free, which undermines its utility in real-life applications. We address this issue by proposing a framework of approximate hyperparameter selection (AHS) for FQE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each of which has different characteristics such as distribution-mismatch tolerance and time complexity. We also confirm in experiments that the error bound given by the theory matches empirical observations.
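To make the setting concrete, the following is a minimal sketch of generic tabular FQE, not of the AHS methods proposed here. It assumes a finite state-action space, a deterministic target policy, and logged transitions of the form (state, action, reward, next state, done). The function name `tabular_fqe` and its parameters are hypothetical and chosen only for illustration. In this simplified setting the number of iterations `n_iters` plays the role of a hyperparameter; with function approximation, the model class and optimizer settings would be further ones.

```python
def tabular_fqe(transitions, policy, gamma=0.9, n_iters=50):
    """Estimate Q^pi from logged transitions with tabular fitted Q-evaluation.

    transitions: list of (state, action, reward, next_state, done) tuples.
    policy: dict mapping each non-terminal state to the target policy's action.
    gamma: discount factor.
    n_iters: number of fitting iterations (a hyperparameter of this sketch).
    """
    q = {}  # Q_0 is zero everywhere; missing keys are read as 0.0
    for _ in range(n_iters):
        sums, counts = {}, {}
        for s, a, r, s_next, done in transitions:
            # Bootstrap from the previous iterate at the action the target
            # policy would take in the next state; terminal states add nothing.
            bootstrap = 0.0 if done else q.get((s_next, policy[s_next]), 0.0)
            target = r + gamma * bootstrap
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # Least-squares regression over a tabular function class reduces to
        # the per-(state, action) mean of the regression targets.
        q = {key: sums[key] / counts[key] for key in sums}
    return q


# Illustrative two-step chain: state 0 -> state 1 -> terminal.
data = [(0, "go", 1.0, 1, False), (1, "go", 2.0, None, True)]
target_policy = {0: "go", 1: "go"}

q_short = tabular_fqe(data, target_policy, gamma=0.5, n_iters=1)
q_long = tabular_fqe(data, target_policy, gamma=0.5, n_iters=5)
print(q_short[(0, "go")], q_long[(0, "go")])
```

In this toy chain a single iteration has not yet propagated the second reward back to state 0, so the estimate depends on `n_iters`. This is a small instance of the sensitivity to hyperparameters that motivates a principled selection procedure.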