ML · LG · Apr 30, 2021

Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling

Simen Eide, David S. Leslie, Arnoldo Frigessi

arXiv:2104.15046v16.314 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses dynamic slate recommendation for users of internet platforms. It moves beyond the common assumption that users consider every possible item at each interaction and releases a public dataset, but the method itself is an incremental advance that combines existing techniques such as RNNs and bandit algorithms.

The paper tackles the problem of recommending lists of items (slates) to users of internet platforms. It introduces a variational Bayesian Recurrent Neural Net recommender system that scales to industrial settings and evaluates it both online and on a public dataset from FINN.no. Explorative strategies such as in-slate Thompson Sampling perform on par with or better than their greedy counterparts, and click rates increase because of improved diversity in the recommended slates.
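The summary names in-slate Thompson Sampling without describing how it works. A minimal sketch follows, assuming the reading that the method draws a fresh posterior sample for each position in the slate, whereas standard Thompson Sampling fills the whole slate from a single draw. The Gaussian per-item posterior and the names `gaussian_sampler`, `thompson_slate` and `in_slate_thompson_slate` are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical posterior sampler: returns one draw of every item's score.
Sampler = Callable[[random.Random], Sequence[float]]


def _top_unchosen(scores: Sequence[float], chosen: set) -> int:
    """Index of the highest-scoring item not already in the slate."""
    return max((i for i in range(len(scores)) if i not in chosen),
               key=lambda i: scores[i])


def thompson_slate(sample_scores: Sampler, k: int, rng: random.Random) -> List[int]:
    """Standard Thompson Sampling: one posterior draw, take its top-k items."""
    scores = sample_scores(rng)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def in_slate_thompson_slate(sample_scores: Sampler, k: int, rng: random.Random) -> List[int]:
    """Assumed in-slate variant: a fresh posterior draw for every slot."""
    slate: List[int] = []
    chosen: set = set()
    for _ in range(k):
        scores = sample_scores(rng)  # new posterior draw for this position
        if len(chosen) >= len(scores):
            raise ValueError("slate size exceeds number of items")
        best = _top_unchosen(scores, chosen)  # best item under this draw
        slate.append(best)
        chosen.add(best)
    return slate


def gaussian_sampler(means: Sequence[float], stds: Sequence[float]) -> Sampler:
    """Toy posterior: an independent Gaussian score per item (an assumption)."""
    return lambda rng: [rng.gauss(m, s) for m, s in zip(means, stds)]


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = gaussian_sampler([0.1, 0.9, 0.5, 0.3, 0.7], [0.3] * 5)
    print("one draw :", thompson_slate(sampler, 3, rng))
    print("per slot :", in_slate_thompson_slate(sampler, 3, rng))
```

With zero posterior variance both functions reduce to greedy top-k selection. As uncertainty grows, the per-slot draws give each position its own chance to pick an uncertain item rather than tying the whole slate to one sample.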

We consider the problem of recommending relevant content to users of an internet platform in the form of lists of items, called slates. We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user, and which scales to real world industrial situations. The recommender system is tested both online on real users, and on an offline dataset collected from a Norwegian web-based marketplace, FINN.no, that is made public for research. This is one of the first publicly available datasets which includes all the slates that are presented to users as well as which items (if any) in the slates were clicked on. Such a dataset allows us to move beyond the common implicit assumption that users consider all possible items at each interaction. Instead we build our likelihood using the items that are actually in the slate, and evaluate the strengths and weaknesses of both approaches theoretically and in experiments. We also introduce a hierarchical prior for the item parameters based on group memberships. Both item parameters and user preferences are learned probabilistically. Furthermore, we combine our model with bandit strategies to ensure learning, and introduce 'in-slate Thompson Sampling', which makes use of the slates to maximise explorative opportunities. We show experimentally that explorative recommender strategies perform on par with or above their greedy counterparts. Even without making use of exploration to learn more effectively, click rates increase simply because of improved diversity in the recommended slates.
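The abstract contrasts a likelihood built from the items actually shown in the slate with the common one that normalises over the whole catalogue. The sketch below illustrates that contrast for a single interaction, assuming a softmax click model with an extra "no click" option whose score `no_click_score` is a hypothetical parameter. The item scores stand in for whatever the recurrent model outputs and are made up for illustration; this is not the authors' exact likelihood.

```python
import math
from typing import Optional, Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def slate_log_likelihood(scores: Sequence[float], slate: Sequence[int],
                         clicked: Optional[int], no_click_score: float = 0.0) -> float:
    """log P(outcome | slate): softmax over the shown items plus 'no click'.

    clicked=None means the user clicked nothing in this slate.
    """
    logits = [no_click_score] + [scores[i] for i in slate]
    log_norm = _logsumexp(logits)
    if clicked is None:
        return no_click_score - log_norm
    if clicked not in slate:
        raise ValueError("clicked item was not in the slate")
    return scores[clicked] - log_norm


def all_items_log_likelihood(scores: Sequence[float], clicked: int) -> float:
    """log P(clicked): softmax over the entire catalogue, ignoring the slate."""
    return scores[clicked] - _logsumexp(scores)


if __name__ == "__main__":
    scores = [0.0, 1.0, 2.0, 3.0]  # illustrative scores for a 4-item catalogue
    shown = [1, 2]                 # item 3 was never shown to the user
    print(slate_log_likelihood(scores, shown, clicked=2))
    print(all_items_log_likelihood(scores, clicked=2))
```

In this toy example the all-items version lets the unshown item 3 absorb most of the probability mass, lowering the likelihood of the observed click even though the user could not have chosen item 3. The slate version only compares the click with what was on screen and with not clicking.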

