LG · AI · May 22, 2022

Offline Policy Comparison with Confidence: Benchmarks and Baselines

arXiv:2205.10739v1 · h-index: 46
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable confidence estimates in offline policy comparison, which matters to decision-makers who rely on historical data. The contribution is incremental, since it builds on existing offline RL datasets and methods.

The paper tackles the problem of evaluating confidence values for offline policy comparison by creating benchmarks (OPCC) from offline reinforcement learning datasets and empirically assessing model-based baselines that use dynamics model ensembles. The results indicate advantages for some baseline variations but leave significant room for improvement.

Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, computational tools should produce confidence values for such offline policy comparison (OPC) to account for statistical variance and limited data coverage. Nevertheless, there is little work that directly evaluates the quality of confidence values for OPC. In this work, we address this issue by creating benchmarks for OPC with Confidence (OPCC), derived by adding sets of policy comparison queries to datasets from offline reinforcement learning. In addition, we present an empirical evaluation of the risk versus coverage trade-off for a class of model-based baselines. In particular, the baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
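To make the risk versus coverage idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes each query asks whether policy A's return from one state is lower than policy B's return from another, that each ensemble member supplies one simulated return estimate per side, and that confidence is the fraction of members agreeing with the majority answer. The function names `answer_query` and `risk_coverage_curve` are hypothetical.

```python
import numpy as np


def answer_query(returns_a, returns_b):
    """Answer "is return A < return B?" from per-member return estimates.

    Assumption: returns_a[k] and returns_b[k] come from the k-th dynamics
    model in the ensemble. Confidence is the majority vote fraction.
    """
    votes = np.asarray(returns_a, dtype=float) < np.asarray(returns_b, dtype=float)
    frac_true = float(votes.mean())
    prediction = frac_true >= 0.5
    confidence = max(frac_true, 1.0 - frac_true)
    return prediction, confidence


def risk_coverage_curve(confidences, correct):
    """Selective risk at each coverage level.

    Queries are sorted by decreasing confidence. At coverage i/n only the
    i most confident queries are answered, and risk is the error rate
    among those answered queries.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidences, kind="stable")
    errors = (~correct[order]).astype(float)
    answered = np.arange(1, len(errors) + 1)
    coverage = answered / len(errors)
    risk = np.cumsum(errors) / answered
    return coverage, risk


if __name__ == "__main__":
    # Toy example with a 4-member ensemble (made-up numbers).
    pred, conf = answer_query([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 0.0])
    print(pred, conf)

    cov, risk = risk_coverage_curve([0.9, 0.6, 0.8], [True, False, True])
    print(cov, risk)
```

A well-calibrated confidence measure should give low risk at low coverage, with risk rising only as less confident queries are included. Other confidence definitions, such as a statistical test on the per-member return differences, would plug into the same curve.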

Code Implementations: 2 repos