Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting
This work removes the Bellman completeness requirement from fitted Q-evaluation for off-policy evaluation in reinforcement learning, while keeping the method as practical as standard regression-based evaluation.
The paper addresses off-policy evaluation in reinforcement learning, where Fitted Q-evaluation (FQE) normally requires Bellman completeness. It reweights each regression step with an estimate of the stationary density ratio so that the regression norm matches the norm in which the Bellman operator contracts. This avoids the geometric error blow-up of standard FQE without giving up regression-based evaluation.
Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under the evaluation Bellman operator. This requirement is challenging because enlarging the hypothesis class can worsen completeness. We show that the need for this assumption stems from a fundamental norm mismatch: the Bellman operator is gamma-contractive under the stationary distribution of the target policy, whereas FQE minimizes Bellman error under the behavior distribution. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness, avoiding the geometric error blow-up of standard FQE in this setting while maintaining the practicality of regression-based evaluation.
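The reweighting idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear hypothesis class and that density-ratio estimates are already available. The function name weighted_fqe, its parameters, and the small ridge term are all hypothetical choices made for this example, and how the stationary density ratio is estimated is not covered. Each iteration forms Bellman targets from the current estimate and solves a weighted least-squares problem, which amounts to regressing under the target policy's stationary distribution instead of the behavior distribution.

```python
import numpy as np


def weighted_fqe(phi, phi_next, rewards, weights, gamma=0.9, n_iters=200, ridge=1e-8):
    """Illustrative FQE with per-sample regression weights (linear Q-function).

    phi      : (n, d) features of the observed (state, action) pairs
    phi_next : (n, d) features of (next_state, next_action), where next_action
               is drawn from the target policy
    rewards  : (n,) observed rewards
    weights  : (n,) estimated stationary density ratios (assumed given here)
    """
    n, d = phi.shape
    theta = np.zeros(d)
    # The weighted Gram matrix does not change across iterations.
    # The ridge term is only for numerical stability in this sketch.
    gram = phi.T @ (weights[:, None] * phi) + ridge * np.eye(d)
    for _ in range(n_iters):
        # Bellman targets built from the current Q estimate.
        targets = rewards + gamma * (phi_next @ theta)
        # Weighted least-squares regression onto the targets.
        theta = np.linalg.solve(gram, phi.T @ (weights * targets))
    return theta
```

Setting all weights to one recovers ordinary FQE in this sketch. With one-hot (tabular) features the two variants agree, since the class is then Bellman complete. The weighting is meant to matter when the hypothesis class is misspecified.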