LG · Sep 3, 2023

Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation

arXiv:2309.01120v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses bias issues in off-policy evaluation for reinforcement learning, offering an incremental improvement over existing clipping techniques.

The paper proposes double clipping, a method that compensates for the downward bias introduced by importance-weight clipping while retaining its variance reduction, with the aim of a better bias-variance trade-off.

"Clipping" (a.k.a. importance weight truncation) is a widely used variance-reduction technique for counterfactual off-policy estimators. Like other variance-reduction techniques, clipping reduces variance at the cost of increased bias. However, unlike other techniques, the bias introduced by clipping is always a downward bias (assuming non-negative rewards), yielding a lower bound on the true expected reward. In this work, we propose a simple extension, called double clipping, which aims to compensate for this downward bias and thus reduce the overall bias, while maintaining the variance reduction properties of the original estimator.

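The downward bias described in the abstract can be checked on a toy problem. The sketch below computes exact expectations of importance-weighted estimators over a three-action bandit, so no sampling noise is involved. The policies, rewards, and clip thresholds are made-up illustrative values. The two-sided `clip_both` function is only an assumption about what a compensating clip could look like (raising very small weights to a lower bound); the text above does not define the paper's estimator, so it should not be read as the authors' method.

```python
def importance_weights(target, logging):
    """Per-action importance weights pi(a) / mu(a)."""
    return [p / q for p, q in zip(target, logging)]


def expected_estimate(logging, weights, rewards):
    """Exact expectation, under the logging policy, of weight * reward."""
    return sum(q * w * r for q, w, r in zip(logging, weights, rewards))


def clip_upper(weights, upper):
    """Standard clipping: truncate large weights at `upper`."""
    return [min(w, upper) for w in weights]


def clip_both(weights, lower, upper):
    """Hypothetical two-sided clip (an assumption, not taken from the text):
    truncate large weights at `upper` and raise small weights to `lower`."""
    return [max(min(w, upper), lower) for w in weights]


# Illustrative values only: logging policy mu, target policy pi,
# and non-negative deterministic rewards per action.
logging = [0.7, 0.2, 0.1]
target = [0.1, 0.3, 0.6]
rewards = [1.0, 0.5, 1.0]

weights = importance_weights(target, logging)
true_value = sum(p * r for p, r in zip(target, rewards))

ips = expected_estimate(logging, weights, rewards)
clipped = expected_estimate(logging, clip_upper(weights, 2.0), rewards)
double = expected_estimate(logging, clip_both(weights, 0.5, 2.0), rewards)

print("true value      :", true_value)
print("unclipped IPS   :", ips)
print("upper clip only :", clipped)
print("two-sided clip  :", double)
```

With non-negative rewards, the upper-clipped expectation can never exceed the true value, because every weight is only ever reduced. Raising small weights pushes the estimate back up. In this toy setup that lands closer to the true value, but a lower bound that is too large could overshoot, so the choice of thresholds matters.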