LG, AI | May 22, 2022

Offline Policy Comparison with Confidence: Benchmarks and Baselines

Anurag Koul, Mariano Phielipp, Alan Fern

arXiv:2205.10739v11.8 | h-index: 46 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable confidence estimates in offline policy comparison, which matters to decision-makers who rely on historical data. The contribution is incremental in that it builds on existing offline RL datasets and methods.

The paper addresses the evaluation of confidence values for offline policy comparison by building benchmarks (OPCC) from offline reinforcement learning datasets and by empirically assessing model-based baselines that use ensembles of dynamics models. The results show advantages for some baseline variations but leave significant room for improvement.

Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, computational tools should produce confidence values for such offline policy comparison (OPC) to account for statistical variance and limited data coverage. Nevertheless, there is little work that directly evaluates the quality of confidence values for OPC. In this work, we address this issue by creating benchmarks for OPC with Confidence (OPCC), derived by adding sets of policy comparison queries to datasets from offline reinforcement learning. In addition, we present an empirical evaluation of the risk versus coverage trade-off for a class of model-based baselines. In particular, the baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
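
To make the risk versus coverage trade-off concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a query asks whether policy A's return is lower than policy B's, that each ensemble member supplies one simulated return estimate per policy, and that confidence is the fraction of members agreeing with the majority answer. The names `ensemble_vote` and `risk_coverage` are hypothetical and the numbers are made up for illustration.

```python
from typing import List, Sequence, Tuple


def ensemble_vote(returns_a: Sequence[float],
                  returns_b: Sequence[float]) -> Tuple[bool, float]:
    """Answer "is A's return < B's return?" from per-member estimates.

    Assumed scheme (not from the paper): every ensemble member votes,
    the majority vote is the prediction, and the share of members in
    the majority is the confidence.
    """
    votes = [a < b for a, b in zip(returns_a, returns_b)]
    frac_true = sum(votes) / len(votes)
    prediction = frac_true >= 0.5
    confidence = max(frac_true, 1.0 - frac_true)
    return prediction, confidence


def risk_coverage(confidences: Sequence[float],
                  correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (coverage, risk) points for the k most confident queries.

    Coverage is the fraction of queries answered; risk is the error
    rate among the answered queries.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    curve = []
    errors = 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((k / len(order), errors / k))
    return curve


if __name__ == "__main__":
    # Each entry: (member estimates for A, member estimates for B, true answer).
    queries = [
        ([1.0, 1.2, 0.9, 1.1], [2.0, 2.1, 1.9, 2.2], True),
        ([1.0, 2.0, 1.0, 1.0], [1.5, 1.5, 1.5, 1.5], False),
        ([3.0, 3.0, 3.0, 3.0], [1.0, 1.0, 1.0, 1.0], False),
    ]
    confs, hits = [], []
    for est_a, est_b, truth in queries:
        pred, conf = ensemble_vote(est_a, est_b)
        confs.append(conf)
        hits.append(pred == truth)
    for coverage, risk in risk_coverage(confs, hits):
        print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

A method with well-ordered confidence values keeps risk low at small coverage and lets it rise only as less certain queries are included, which is the behavior a risk versus coverage evaluation is meant to expose.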
