LG · AI · ML · Feb 22, 2024

Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

arXiv:2402.14664v2 · 10 citations · h-index: 15 · AISTATS
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in interactive systems such as recommendation engines, offering a more sample-efficient approach to large-scale decision-making, though it appears incremental because it builds on existing Bayesian methods.

The paper tackles sample-efficient off-policy evaluation and learning in large action spaces by introducing a unified Bayesian framework that captures action correlations, reporting strong empirical performance without sacrificing computational efficiency.

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
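The abstract does not spell out sDM's model, so the following is only a minimal sketch of the general idea under assumptions chosen for illustration, not the authors' implementation. It assumes a linear-Gaussian hierarchical prior: a shared latent vector ψ ~ N(0, σ0² I), known action features W (hypothetical), action mean rewards θ_a | ψ ~ N(W_a ψ, σ1²), and Gaussian reward noise σ², with all variances known. Correlations between actions come from the shared ψ. The function names (`posterior_action_means`, `full_posterior_means`, `empirical_means`, `bayesian_mse`) are hypothetical. The two-stage posterior first infers ψ, then each θ_a, so it never inverts a K×K matrix; the dense version is included only as a cross-check. `bayesian_mse` illustrates a Bayesian metric in the abstract's sense: error averaged over problem instances drawn from the prior rather than taken in the worst case.

```python
import numpy as np


def posterior_action_means(W, actions, rewards, sigma0=1.0, sigma1=0.1, sigma=1.0):
    """Posterior mean of theta_a under the assumed hierarchical Gaussian model.

    Cost is O(n + K d^2 + d^3): no K x K matrix is ever formed or inverted."""
    K, d = W.shape
    counts = np.bincount(actions, minlength=K).astype(float)
    sums = np.bincount(actions, weights=rewards, minlength=K)
    seen = counts > 0
    # Stage 1: posterior of the shared latent psi. Given psi, the sample mean of
    # action a is N(W[a] @ psi, sigma1^2 + sigma^2 / n_a); unseen actions add nothing.
    noise = sigma1**2 + sigma**2 / counts[seen]
    Ws, ybar = W[seen], sums[seen] / counts[seen]
    prec_psi = np.eye(d) / sigma0**2 + (Ws.T / noise) @ Ws
    mean_psi = np.linalg.solve(prec_psi, Ws.T @ (ybar / noise))
    # Stage 2: each theta_a blends its prior mean W[a] @ psi with its own data;
    # unseen actions fall back to W[a] @ mean_psi (information shared via psi).
    prec_theta = 1.0 / sigma1**2 + counts / sigma**2
    return (W @ mean_psi / sigma1**2 + sums / sigma**2) / prec_theta


def full_posterior_means(W, actions, rewards, sigma0=1.0, sigma1=0.1, sigma=1.0):
    """Same posterior via dense O(K^3) Gaussian conditioning (cross-check only)."""
    K = W.shape[0]
    counts = np.bincount(actions, minlength=K).astype(float)
    sums = np.bincount(actions, weights=rewards, minlength=K)
    prior_cov = sigma0**2 * W @ W.T + sigma1**2 * np.eye(K)  # correlated actions
    post_prec = np.linalg.inv(prior_cov) + np.diag(counts) / sigma**2
    return np.linalg.solve(post_prec, sums / sigma**2)


def empirical_means(W, actions, rewards):
    """Unstructured baseline: per-action sample mean, 0 for unseen actions."""
    K = W.shape[0]
    counts = np.bincount(actions, minlength=K)
    sums = np.bincount(actions, weights=rewards, minlength=K)
    return np.divide(sums, counts, out=np.zeros(K), where=counts > 0)


def bayesian_mse(estimator, K=50, d=3, n=100, n_instances=300, seed=0,
                 sigma0=1.0, sigma1=0.1, sigma=1.0):
    """Average squared OPE error of a direct-method estimate over problem
    instances sampled from the prior (a Bayesian metric, not worst case)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(K, d))            # hypothetical known action features
    pi = rng.dirichlet(np.ones(K))         # hypothetical target policy
    errors = []
    for _ in range(n_instances):
        psi = rng.normal(0.0, sigma0, size=d)
        theta = W @ psi + rng.normal(0.0, sigma1, size=K)
        actions = rng.integers(0, K, size=n)   # uniform logging policy
        rewards = theta[actions] + rng.normal(0.0, sigma, size=n)
        value_estimate = pi @ estimator(W, actions, rewards)
        errors.append((value_estimate - pi @ theta) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    print("structured:  ", bayesian_mse(posterior_action_means))
    print("unstructured:", bayesian_mse(empirical_means))
```

Both calls use the same seed, so they are evaluated on identical instances and logged data. Under these assumptions the structured estimate can value actions that were never logged (through ψ), which is where the unstructured per-action mean is weakest when K is large relative to the sample size.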

