IR | LG | May 13, 2021

A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels

arXiv:2105.06275v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This paper addresses a practical issue for streaming services: it provides an incremental evaluation method that better assesses recommendation quality in real-world multi-carousel interfaces.

The paper tackles the problem of evaluating recommender systems in multi-carousel user interfaces, where traditional offline protocols fail to account for complementarity between carousels, and finds that algorithm rankings change in this setting, with matrix factorization models preferred when a SLIM carousel is available.

Many video-on-demand and music streaming services provide the user with a page consisting of several recommendation lists, i.e. widgets or swipeable carousels, each built with a specific criterion (e.g. most recent, TV series, etc.). Finding efficient strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring its individual recommendation quality alone. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. Traditional offline evaluation protocols do not consider this. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets in the movie domain and observe that under a carousel setting the ranking of the algorithms changes. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias, i.e. users do not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.
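
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`unique_hits`, `incremental_hits`, `dcg_2d`), the toy item IDs, and in particular the discount `1 / log2(row + col + 2)` are assumptions chosen only to show the shape of the protocol, namely scoring a candidate carousel by what it adds to the carousels already on the page, and discounting positions that are far from the top-left corner.

```python
from math import log2


def unique_hits(carousels, relevant):
    """Count relevant items shown anywhere on the page, each item once."""
    shown = set()
    for carousel in carousels:
        shown.update(carousel)
    return len(shown & set(relevant))


def incremental_hits(base_carousels, candidate, relevant):
    """Hits gained by adding `candidate` below the already available carousels."""
    with_candidate = unique_hits(base_carousels + [candidate], relevant)
    return with_candidate - unique_hits(base_carousels, relevant)


def dcg_2d(carousels, relevant):
    """Toy two-dimensional discounted gain (assumed discount, not the paper's).

    A relevant item at 0-indexed (row, col) contributes 1 / log2(row + col + 2),
    so the top-left cell weighs most. An item that appears in several cells is
    counted once, at its best position.
    """
    relevant = set(relevant)
    best = {}
    for row, carousel in enumerate(carousels):
        for col, item in enumerate(carousel):
            if item in relevant:
                discount = 1.0 / log2(row + col + 2)
                best[item] = max(best.get(item, 0.0), discount)
    return sum(best.values())


# Hypothetical data: one carousel is already on the page.
relevant = {"a", "c", "d"}
base = [["a", "b", "c"]]
candidate_1 = ["a", "c", "e"]  # strong alone, but repeats what base already shows
candidate_2 = ["d", "e", "f"]  # weaker alone, but complements base

standalone_1 = unique_hits([candidate_1], relevant)
standalone_2 = unique_hits([candidate_2], relevant)
gain_1 = incremental_hits(base, candidate_1, relevant)
gain_2 = incremental_hits(base, candidate_2, relevant)
page_score = dcg_2d(base + [candidate_2], relevant)
```

In this toy case `candidate_1` wins when each list is evaluated in isolation, while `candidate_2` wins once the existing carousel is taken into account, which is the kind of ranking change the abstract reports.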

Code Implementations: 1 repo