LG · AI · Dec 1, 2020

Non-Stationary Latent Bandits

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

arXiv:2012.00386v1 · 9.017 citations

Originality: Incremental advance

AI Analysis

This work aims to improve personalization for recommender system users whose preferences change over time, offering a practical approach for faster adaptation.

This paper addresses non-stationary user behavior in recommender systems by framing it as a non-stationary latent bandit problem. The authors propose Thompson sampling algorithms for regret minimization: prototypical models of user behavior are learned offline, and the user's latent state is inferred online, combining the strengths of offline and online learning.

Users of recommender systems often behave in a non-stationary fashion, due to their evolving preferences and tastes over time. In this work, we propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models. We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset. The main strength of our approach is that it can be combined with rich offline-learned models, which can be misspecified and are subsequently fine-tuned online using posterior sampling. In this way, we naturally combine the strengths of offline and online learning.
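To make the core loop concrete, here is a minimal sketch. It is not the authors' algorithm and rests on several assumptions. There are K latent states with Bernoulli reward models `mu[k, a]` that are treated as known and stand in for the offline-learned models. The latent state follows a hypothetical switching model: with probability `switch_prob` per round it jumps uniformly to another state. The sketch tracks a posterior over the latent state with an HMM forward filter and chooses arms by Thompson sampling, that is, it samples a state from the posterior and acts greedily for it. It omits the paper's handling of misspecified models and their online fine-tuning, and every name here is illustrative.

```python
import numpy as np


def predict_belief(belief, switch_prob):
    """HMM prediction step under an ASSUMED switching model (K >= 2): the
    latent state stays put with probability 1 - switch_prob, otherwise it
    jumps uniformly to one of the other K - 1 states."""
    K = len(belief)
    return (1.0 - switch_prob) * belief + switch_prob * (1.0 - belief) / (K - 1)


def update_belief(belief, mu, arm, reward):
    """Bayes update of the latent-state posterior after a Bernoulli reward.
    mu[k, a] is the mean reward of arm a in latent state k; here it is
    assumed known (a stand-in for offline-learned models), with entries
    strictly inside (0, 1) so the posterior never collapses to zero."""
    likelihood = mu[:, arm] if reward == 1 else 1.0 - mu[:, arm]
    posterior = belief * likelihood
    return posterior / posterior.sum()


def thompson_step(belief, mu, rng):
    """Thompson sampling: draw a latent state from the posterior, then pick
    the arm that is best under that state's reward model."""
    k = rng.choice(len(belief), p=belief)
    return int(np.argmax(mu[k]))


def simulate(mu, true_states, switch_prob=0.01, seed=0):
    """Run the sketch against a given sequence of true latent states and
    return (cumulative expected regret, final belief)."""
    rng = np.random.default_rng(seed)
    K, _ = mu.shape
    belief = np.full(K, 1.0 / K)  # uniform prior over latent states
    regret = 0.0
    for s in true_states:
        belief = predict_belief(belief, switch_prob)  # the state may have changed
        arm = thompson_step(belief, mu, rng)
        reward = int(rng.random() < mu[s, arm])  # Bernoulli reward from true state
        belief = update_belief(belief, mu, arm, reward)
        regret += mu[s].max() - mu[s, arm]  # expected per-round regret
    return regret, belief


# Hypothetical example: two latent states whose preferred arms are opposite,
# and a user whose preferences flip halfway through the interaction.
mu = np.array([[0.9, 0.1],
               [0.1, 0.9]])
true_states = [0] * 500 + [1] * 500
regret, belief = simulate(mu, true_states)
```

The prediction step is what lets the posterior "forget": without it, after the user's preferences flip, the belief would stay concentrated on the old state far longer, because evidence accumulated under the old state would have to be overturned observation by observation.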

