LG · Sep 3, 2023

Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation

Jan Malte Lichtenberg, Alexander Buchholz, Giuseppe Di Benedetto, Matteo Ruffini, Ben London

arXiv:2309.01120v1 · 7.74 citations

Originality: Incremental advance

AI Analysis

This work addresses the bias of clipped importance-weighting estimators in off-policy evaluation, offering an incremental improvement over existing clipping techniques.

The paper tackles bias in off-policy evaluation by proposing double clipping, a method that aims to compensate for the downward bias introduced by clipping while retaining its variance reduction, with the goal of a better bias-variance trade-off.

"Clipping" (a.k.a. importance weight truncation) is a widely used variance-reduction technique for counterfactual off-policy estimators. Like other variance-reduction techniques, clipping reduces variance at the cost of increased bias. However, unlike other techniques, the bias introduced by clipping is always a downward bias (assuming non-negative rewards), yielding a lower bound on the true expected reward. In this work we propose a simple extension, called $\textit{double clipping}$, which aims to compensate for this downward bias and thus reduce the overall bias, while maintaining the variance reduction properties of the original estimator.
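The abstract's key observation, that clipping importance weights from above can only bias the estimate downward when rewards are non-negative, can be checked on a toy problem. The sketch below computes the exact expected value (no sampling noise) of plain inverse propensity scoring (IPS), upper-clipped IPS, and a two-sided clipped variant on a made-up single-context, three-action example. The two-sided variant, which also raises small weights to a lower threshold, is only an assumption about what an upward-compensating extension could look like; it is not taken from the paper's definition of double clipping. All policies, rewards, thresholds (`U = 5`, `L = 1`), and names are hypothetical.

```python
import numpy as np

# Hypothetical toy problem: one context, three actions (all numbers made up).
pi_0 = np.array([0.70, 0.25, 0.05])  # logging policy that collected the data
pi = np.array([0.10, 0.30, 0.60])    # target policy we want to evaluate
r_bar = np.array([0.2, 0.5, 1.0])    # mean reward per action, non-negative
w = pi / pi_0                        # importance weights pi(a) / pi_0(a)

U = 5.0  # hypothetical upper clipping threshold
L = 1.0  # hypothetical lower threshold for the assumed two-sided variant


def expected_estimate(transform):
    """Exact expectation over a ~ pi_0 of transform(w(a)) * r_bar(a)."""
    return float(np.sum(pi_0 * transform(w) * r_bar))


true_value = float(np.sum(pi * r_bar))

# Plain IPS: unbiased, since pi_0(a) * w(a) = pi(a).
ips = expected_estimate(lambda x: x)

# Upper clipping: min(w, U) <= w and r >= 0, so the bias is never positive.
clipped = expected_estimate(lambda x: np.minimum(x, U))

# ASSUMED two-sided variant: raising small weights to L adds an upward bias
# that can partially offset the downward bias from clipping at U.
two_sided = expected_estimate(lambda x: np.clip(x, L, U))

print(f"true={true_value:.2f} ips={ips:.2f} "
      f"clipped={clipped:.2f} two_sided={two_sided:.2f}")
# prints: true=0.77 ips=0.77 clipped=0.42 two_sided=0.54
```

Because each term of the clipped estimator is at most the corresponding IPS term, its expectation is a lower bound on the true value, matching the abstract. In this particular example the assumed lower threshold shrinks the bias from -0.35 to -0.23; whether such a variant helps in general, and how its variance compares, depends on the thresholds and the data.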

