LG NA ST ML · Oct 25, 2021

Operator Shifting for Model-based Policy Evaluation

arXiv:2110.12658v3
Originality: Incremental advance
AI Analysis

This paper addresses bias reduction in model-based policy evaluation, a problem relevant to reinforcement learning practitioners. It appears incremental, however, because it builds on existing estimation methods.

The paper tackles the bias in value-function estimates that are computed from noisy, sample-estimated models in model-based reinforcement learning. It introduces an operator shifting method, proves that the shifting factor is positive and upper bounded by $1+O(1/n)$ when the error is measured in the residual norm, and proposes a practical algorithm for implementing the method.
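The bias argument can be written out briefly. The notation here is assumed rather than taken from the paper: discount factor $\gamma$, true transition matrix $P$ of the evaluated policy, reward vector $r$ (treated as known for simplicity), and an estimate $\hat P$ whose rows are each learned from $n$ samples. The value function solves a linear system:

$$
V = (I - \gamma P)^{-1} r, \qquad \hat V = (I - \gamma \hat P)^{-1} r .
$$

Even when $\mathbb{E}[\hat P] = P$, matrix inversion is nonlinear, so in general

$$
\mathbb{E}[\hat V] = \mathbb{E}\big[(I - \gamma \hat P)^{-1}\big]\, r \neq (I - \gamma P)^{-1} r = V .
$$

The residual norm of an estimate $\tilde V$ measures how far it is from satisfying the true linear system:

$$
\|(I - \gamma P)(\tilde V - V)\| = \|(I - \gamma P)\tilde V - r\| .
$$

The paper's exact choice of norm and weighting may differ from this generic form.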

In model-based reinforcement learning, the transition matrix and reward vector are often estimated from random samples subject to noise. Even if the estimated model is an unbiased estimate of the true underlying model, the value function computed from the estimated model is biased. We introduce an operator shifting method for reducing the error introduced by the estimated model. When the error is measured in the residual norm, we prove that the shifting factor is always positive and upper bounded by $1+O\left(1/n\right)$, where $n$ is the number of samples used in learning each row of the transition matrix. We also propose a practical numerical algorithm for implementing the operator shifting.
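The bias claim can be checked numerically. The sketch below illustrates the setting under stated assumptions and is not the paper's algorithm. It assumes a small random MDP under a fixed policy with known rewards and estimates each row of $P$ from `n` sampled next states. It then compares the Monte Carlo mean of the plug-in value $\hat V$ with the true $V$. Finally, it applies a generic scalar shift $(1-\beta)\hat V$, with $\beta$ chosen by a bootstrap that minimizes the residual-norm error while $\hat P$ stands in for $P$. The helper names (`value`, `sample_transition_matrix`, `shifted_value`) and the bootstrap rule are hypothetical choices made for this example.

```python
import numpy as np


def value(P, r, gamma):
    """Solve (I - gamma * P) V = r for the value function of a fixed policy."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)


def sample_transition_matrix(P, n, rng):
    """Hypothetical helper: estimate each row of P from n sampled next states.

    Each estimated row is an empirical frequency vector, so it is an unbiased
    estimate of the corresponding row of P (and still sums to one).
    """
    counts = np.array([rng.multinomial(n, row) for row in P])
    return counts / n


def shifted_value(P_hat, r, gamma, n, rng, num_bootstrap=200):
    """Generic scalar shift (1 - beta) * V_hat with beta chosen by bootstrap.

    Illustrative assumption, not the paper's algorithm: P_hat stands in for the
    unknown P, and beta minimizes the mean over bootstrap draws V_star of
    ||(I - gamma * P_hat) ((1 - beta) * V_star - V_hat)||^2.
    """
    A_hat = np.eye(len(r)) - gamma * P_hat
    V_hat = np.linalg.solve(A_hat, r)  # plug-in estimate; A_hat @ V_hat == r
    num, den = 0.0, 0.0
    for _ in range(num_bootstrap):
        V_star = value(sample_transition_matrix(P_hat, n, rng), r, gamma)
        a = A_hat @ V_star
        num += a @ r  # accumulate the least-squares fit of (1 - beta) * a to r
        den += a @ a
    beta = 1.0 - num / den
    return (1.0 - beta) * V_hat, beta


rng = np.random.default_rng(0)
S, gamma, n = 5, 0.9, 10
P = rng.dirichlet(np.ones(S), size=S)  # true row-stochastic transition matrix
r = rng.uniform(0.0, 1.0, size=S)      # rewards assumed known
V = value(P, r, gamma)

# Monte Carlo check: each P_hat is unbiased, yet the mean of V_hat need not equal V.
trials = 2000
V_hats = np.array([value(sample_transition_matrix(P, n, rng), r, gamma)
                   for _ in range(trials)])
print("true V       :", np.round(V, 3))
print("mean of V_hat:", np.round(V_hats.mean(axis=0), 3))

V_shift, beta = shifted_value(sample_transition_matrix(P, n, rng), r, gamma, n, rng)
print("bootstrap beta:", round(float(beta), 4))
```

Every estimated row of $\hat P$ still sums to one, so $\hat V$ recovers the constant part of $r$ exactly: a constant reward $c$ always yields $c/(1-\gamma)$. The bias therefore sits in the response to the non-constant part of $r$. Uniformly scaling $\hat V$ also rescales that already-exact component, which is one limitation of the simple scalar rule used here. The bootstrap substitutes $\hat P$ for the unknown $P$, which is the usual plug-in approach when the true model is unavailable.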
