LG | AI | ML | Feb 28, 2025

Clustering Context in Off-Policy Evaluation

arXiv:2502.21304v1 | 2 citations | h-index: 7 | AISTATS
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in off-policy evaluation for applications like e-commerce and healthcare, offering an incremental improvement over existing methods.

The paper tackles the deterioration of off-policy evaluation when the logging and evaluation policies differ significantly. It proposes an estimator that shares information across similar contexts by clustering them, and its experiments show improved estimation accuracy, particularly in deficient information settings.

Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.
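To make the setting in the abstract concrete, the sketch below contrasts the standard IPS estimator with one simple way of pooling information across clustered contexts. The function names, the toy data, and the pooling rule (replacing per-context policy probabilities with their within-cluster averages before forming the importance weight) are illustrative assumptions; the abstract does not specify the proposed estimator, so this should not be read as the paper's method.

```python
from collections import defaultdict


def ips(actions, rewards, pi_e, pi_0):
    """Standard IPS: average of r_i * pi_e(a_i | x_i) / pi_0(a_i | x_i)."""
    total = 0.0
    for i, (a, r) in enumerate(zip(actions, rewards)):
        total += r * pi_e[i][a] / pi_0[i][a]
    return total / len(rewards)


def cluster_pooled_ips(actions, rewards, clusters, pi_e, pi_0):
    """Hypothetical cluster-pooled variant (an assumption, not the paper's estimator).

    The importance weight for sample i uses policy probabilities averaged
    over all logged contexts that share sample i's cluster.
    """
    n_actions = len(pi_e[0])
    sum_e = defaultdict(lambda: [0.0] * n_actions)
    sum_0 = defaultdict(lambda: [0.0] * n_actions)
    for i, c in enumerate(clusters):
        for a in range(n_actions):
            sum_e[c][a] += pi_e[i][a]
            sum_0[c][a] += pi_0[i][a]
    total = 0.0
    for a, r, c in zip(actions, rewards, clusters):
        # Ratio of within-cluster sums equals ratio of within-cluster means.
        total += r * sum_e[c][a] / sum_0[c][a]
    return total / len(rewards)


# Toy logged data: 4 samples, 2 actions, 2 context clusters (all made up).
actions = [0, 1, 1, 0]
rewards = [1.0, 0.0, 1.0, 1.0]
clusters = [0, 0, 1, 1]
pi_0 = [[0.5, 0.5] for _ in range(4)]                      # logging policy
pi_e = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]    # evaluation policy

v_ips = ips(actions, rewards, pi_e, pi_0)
v_pooled = cluster_pooled_ips(actions, rewards, clusters, pi_e, pi_0)
print(v_ips, v_pooled)
```

In this sketch, pooling smooths the importance weights within a cluster, which would typically trade some bias for lower variance. If every context is placed in its own cluster, the pooled variant reduces to standard IPS.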

