LG | ML | May 30, 2022

Mixed-Effect Thompson Sampling

arXiv:2205.15124v3 | 15 citations | h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses online learning over large sets of correlated actions and offers a framework that could make exploration more efficient in applications such as recommendation systems. The advance appears incremental, and the proposed extensions come without guarantees.

The paper tackles efficient exploration in contextual bandits with many correlated actions. It introduces a mixed-effect model, proposes Mixed-Effect Thompson Sampling (meTS), bounds its Bayes regret with terms that reflect the model structure and the quality of the priors, and validates the results empirically.

A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we propose Mixed-Effect Thompson Sampling (meTS) and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform well empirically and show the generality of the proposed framework.
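
As a rough illustration of the idea in the abstract, the sketch below runs Thompson sampling in a simplified, non-contextual linear-Gaussian bandit where action means share effect parameters. It is not the paper's meTS algorithm. The model (action means theta = B psi + eps with Gaussian priors and Gaussian reward noise), the function names `make_prior`, `posterior`, and `run_ts`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def make_prior(B, mu_psi, Sigma_psi, sigma0):
    """Joint Gaussian prior over action means theta = B @ psi + eps.

    Assumed model (illustrative): psi ~ N(mu_psi, Sigma_psi) are shared
    effect parameters, B holds per-action mixing weights, and
    eps ~ N(0, sigma0^2 I) is action-specific noise.
    """
    mean = B @ mu_psi
    cov = B @ Sigma_psi @ B.T + sigma0 ** 2 * np.eye(B.shape[0])
    return mean, cov

def posterior(mean, cov, counts, sums, sigma):
    """Gaussian posterior over theta given per-action pull counts and reward sums."""
    prec = np.linalg.inv(cov)
    post_cov = np.linalg.inv(prec + np.diag(counts) / sigma ** 2)
    post_cov = (post_cov + post_cov.T) / 2  # symmetrize against round-off
    post_mean = post_cov @ (prec @ mean + sums / sigma ** 2)
    return post_mean, post_cov

def run_ts(B, mu_psi, Sigma_psi, sigma0, sigma, theta_true, n_rounds, rng):
    """Thompson sampling: sample theta from the posterior, pull the argmax."""
    K = B.shape[0]
    mean, cov = make_prior(B, mu_psi, Sigma_psi, sigma0)
    counts, sums, regret = np.zeros(K), np.zeros(K), 0.0
    for _ in range(n_rounds):
        m, C = posterior(mean, cov, counts, sums, sigma)
        a = int(np.argmax(rng.multivariate_normal(m, C)))
        reward = theta_true[a] + sigma * rng.standard_normal()
        counts[a] += 1
        sums[a] += reward
        regret += theta_true.max() - theta_true[a]
    return regret, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, L = 20, 2                          # 20 actions, 2 shared effects
    B = rng.dirichlet(np.ones(L), size=K)  # hypothetical mixing weights
    psi = rng.standard_normal(L)
    theta_true = B @ psi + 0.1 * rng.standard_normal(K)
    regret, counts = run_ts(B, np.zeros(L), np.eye(L), 0.1, 1.0,
                            theta_true, 500, rng)
    print("cumulative regret:", regret)
```

Because the prior covariance couples actions through `B`, a reward observed for one action also updates the posterior of every action that shares its effects, which is the source of the more efficient exploration the abstract describes.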

Code Implementations: 1 repo