LG IR ML

Sep 28, 2020

Position-Based Multiple-Play Bandits with Thompson Sampling

Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont

arXiv:2009.13181v32.31 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving recommendation accuracy in web-based systems, which matters to both users and platforms, but the contribution is incremental because it builds on existing bandit and Thompson sampling methods.

The paper tackles the problem of displaying relevant items at relevant positions in online recommender systems by introducing a new bandit-based algorithm, PB-MHB, which uses Thompson sampling and handles a position-based model without requiring the probabilities that a user looks at each position. Experiments on simulated and real datasets show that this method delivers better recommendations than state-of-the-art algorithms while using less prior information.

Multiple-play bandits aim to display relevant items at relevant positions on a web page. We introduce a new bandit-based algorithm, PB-MHB, for online recommender systems, which uses the Thompson sampling framework. This algorithm handles a display setting governed by the position-based model. Our sampling method does not require as input the probability that a user looks at a given position on the web page, which is, in practice, very difficult to obtain. Experiments on simulated and real datasets show that our method, with less prior information, delivers better recommendations than state-of-the-art algorithms.
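To make the setting concrete, here is a minimal Python sketch of Thompson sampling under a position-based model. It is an illustration under stated assumptions, not the authors' implementation. It assumes that the click probability of item i at position l is the product theta[i] * kappa[l], that priors are uniform on [0, 1], that kappa[0] is fixed to 1 for identifiability, and that approximate posterior samples come from a random-walk Metropolis-Hastings sweep (the abstract does not detail the sampler). All function and variable names are hypothetical.

```python
import math
import random


def log_likelihood(theta, kappa, clicks, displays):
    """Log-likelihood of click counts, assuming P(click) = theta[i] * kappa[l]."""
    total = 0.0
    for i, th in enumerate(theta):
        for l, ka in enumerate(kappa):
            p = min(max(th * ka, 1e-12), 1.0 - 1e-12)  # clamp to keep log() finite
            c, n = clicks[i][l], displays[i][l]
            total += c * math.log(p) + (n - c) * math.log(1.0 - p)
    return total


def mh_step(theta, kappa, clicks, displays, rng, step=0.1):
    """One random-walk Metropolis-Hastings sweep (assumed uniform prior on [0, 1])."""
    params = [list(theta), list(kappa)]
    current = log_likelihood(params[0], params[1], clicks, displays)
    for vec_idx, vec in enumerate(params):
        for j in range(len(vec)):
            if vec_idx == 1 and j == 0:
                continue  # assumption: kappa[0] stays fixed at 1
            old = vec[j]
            proposal = old + rng.gauss(0.0, step)
            if not 0.0 < proposal < 1.0:
                continue  # zero prior density outside (0, 1): reject
            vec[j] = proposal
            new = log_likelihood(params[0], params[1], clicks, displays)
            if rng.random() < math.exp(min(0.0, new - current)):
                current = new  # accept the proposal
            else:
                vec[j] = old  # reject and restore the previous value
    return params[0], params[1]


def recommend(theta, kappa):
    """Put the items with the largest sampled theta at the positions with the largest sampled kappa."""
    n_pos = len(kappa)
    items = sorted(range(len(theta)), key=lambda i: -theta[i])[:n_pos]
    positions = sorted(range(n_pos), key=lambda l: -kappa[l])
    slate = [None] * n_pos
    for item, pos in zip(items, positions):
        slate[pos] = item
    return slate


def run(true_theta, true_kappa, rounds=2000, seed=0):
    """Simulate the bandit loop against a synthetic position-based environment."""
    rng = random.Random(seed)
    n_items, n_pos = len(true_theta), len(true_kappa)
    clicks = [[0] * n_pos for _ in range(n_items)]
    displays = [[0] * n_pos for _ in range(n_items)]
    theta = [0.5] * n_items
    kappa = [1.0] + [0.5] * (n_pos - 1)
    for _ in range(rounds):
        # Thompson sampling: act greedily on one (approximate) posterior sample.
        theta, kappa = mh_step(theta, kappa, clicks, displays, rng)
        slate = recommend(theta, kappa)
        for pos, item in enumerate(slate):
            displays[item][pos] += 1
            if rng.random() < true_theta[item] * true_kappa[pos]:
                clicks[item][pos] += 1
    return clicks, displays, theta, kappa
```

For example, `recommend([0.1, 0.9, 0.5], [1.0, 0.3])` returns `[1, 2]`: the item with the highest sampled theta goes to the most-viewed position. Note that `kappa` is never supplied as an input to the learner; it is sampled jointly with `theta`, which mirrors the abstract's point that position-look probabilities need not be known in advance.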
