IR | LG | ME | Apr 3, 2025

Counterfactual Inference under Thompson Sampling

arXiv:2504.08773v2 | 1 citation | h-index: 14 | RecSys
Originality: Incremental advance
AI Analysis

This work addresses a practical challenge for researchers and practitioners in online decision-making fields such as recommender systems and advertising: it makes action propensities available under Thompson sampling, which enables more accurate causal inference. The advance is incremental, since it builds on existing Thompson sampling and off-policy estimation methods.

The paper tackles counterfactual inference under Thompson sampling in recommender systems, where existing estimators rely on action propensities that are not readily available. It derives exact and efficiently computable expressions for these propensities, enabling unbiased offline evaluation and other applications.

Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives. To balance the explore-exploit trade-off successfully, Thompson sampling provides a natural and widespread paradigm to probabilistically select which action to take. Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts. Specifically, whilst most existing estimators rely on action propensities, these are not readily available under Thompson sampling procedures. We derive exact and efficiently computable expressions for action propensities under a variety of parameter and outcome distributions, enabling the use of off-policy estimators in Thompson sampling scenarios. This opens up a range of practical use-cases where counterfactual inference is crucial, including unbiased offline evaluation of recommender systems, as well as general applications of causal inference in online advertising, personalisation, and beyond.
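To make the notion of an action propensity under Thompson sampling concrete, here is a minimal sketch. It is not the paper's derivation. It assumes independent Gaussian posteriors over each arm's mean reward, an assumption made purely for illustration. Under that assumption, the probability that arm i is selected equals the probability that its posterior sample is the largest, which is the integral of f_i(x) · ∏_{j≠i} F_j(x) over x, where f and F are the posterior density and CDF. With two arms this reduces to a closed form. The function names and the grid parameters are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def two_arm_propensity(mu1, sigma1, mu2, sigma2):
    """P(theta_1 > theta_2) for two independent Gaussian posteriors.

    theta_1 - theta_2 is Gaussian with mean mu1 - mu2 and variance
    sigma1^2 + sigma2^2, so the propensity has a closed form.
    """
    return normal_cdf(mu1 - mu2, 0.0, math.sqrt(sigma1 ** 2 + sigma2 ** 2))


def ts_propensities(mus, sigmas, grid_size=4001, width=8.0):
    """Propensity of each arm under Thompson sampling (illustrative only).

    Assumes independent Gaussian posteriors and approximates
    integral of f_i(x) * prod_{j != i} F_j(x) dx with the trapezoid rule.
    """
    lo = min(m - width * s for m, s in zip(mus, sigmas))
    hi = max(m + width * s for m, s in zip(mus, sigmas))
    step = (hi - lo) / (grid_size - 1)
    props = []
    for i in range(len(mus)):
        total = 0.0
        for k in range(grid_size):
            x = lo + k * step
            val = normal_pdf(x, mus[i], sigmas[i])
            for j in range(len(mus)):
                if j != i:
                    val *= normal_cdf(x, mus[j], sigmas[j])
            # Trapezoid rule: endpoints get half weight.
            weight = 0.5 if k in (0, grid_size - 1) else 1.0
            total += weight * val * step
        props.append(total)
    return props


if __name__ == "__main__":
    # Hypothetical posterior means and standard deviations for three arms.
    props = ts_propensities([0.10, 0.30, 0.20], [0.10, 0.20, 0.15])
    print(props, sum(props))
```

Propensities computed this way could then serve as the denominators in an inverse-propensity-weighted estimator. Quadrature over a one-dimensional grid is chosen here only because it is simple to verify; other parameter and outcome distributions would need their own expressions.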
