Study of a bias in the offline evaluation of a recommendation algorithm
This work addresses a methodological problem for researchers and practitioners in recommendation systems, but its contribution is incremental, since it builds on existing critiques of offline evaluation.
The paper identifies a bias in offline evaluation of recommendation algorithms caused by user interactions influenced by the systems themselves, and proposes a weighted offline evaluation method to reduce this bias across different algorithm classes.
Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. They thus influence the way users interact with these systems and, as a consequence, bias the evaluation of a recommendation algorithm's performance when it is computed on historical data (offline evaluation). This paper describes this bias and discusses the relevance of a weighted offline evaluation for reducing it across different classes of recommendation algorithms.
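
To make the idea of a weighted offline evaluation concrete, the following is a minimal sketch in Python. It is not the weighting scheme studied in the paper: the inverse-popularity weights, the hit-rate metric, the toy data, and the function names `weighted_hit_rate` and `inverse_popularity_weights` are all assumptions chosen for illustration. The sketch only shows how reweighting held-out items can change the score of a recommender that benefits from interactions the deployed system may itself have encouraged.

```python
from collections import Counter


def weighted_hit_rate(recommend, held_out, weights):
    """Weighted fraction of held-out (user, item) pairs the recommender recovers.

    recommend: callable mapping a user id to a list of recommended item ids.
    held_out:  list of (user, item) pairs hidden from the recommender.
    weights:   dict mapping item id to a non-negative weight (hypothetical scheme).
    """
    total = sum(weights[item] for _, item in held_out)
    if total == 0:
        raise ValueError("weights sum to zero over the held-out pairs")
    hits = sum(weights[item] for user, item in held_out if item in recommend(user))
    return hits / total


def inverse_popularity_weights(held_out):
    """Assumed weighting: each item's weight is 1 / (its count in the held-out set)."""
    counts = Counter(item for _, item in held_out)
    return {item: 1.0 / n for item, n in counts.items()}


# Toy historical data: item "a" is over-represented, for instance because a
# deployed recommender kept promoting it.
held_out = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "b")]


def most_popular(user):
    """Constant recommender that always suggests item "a"."""
    return ["a"]


uniform = {item: 1.0 for _, item in held_out}
unweighted_score = weighted_hit_rate(most_popular, held_out, uniform)
weighted_score = weighted_hit_rate(
    most_popular, held_out, inverse_popularity_weights(held_out)
)

print(f"unweighted: {unweighted_score:.2f}, weighted: {weighted_score:.2f}")
```

On this toy data the constant recommender scores lower once each item contributes equally to the total, which illustrates why the choice of weights matters when comparing different classes of algorithms. Whether inverse popularity is an appropriate choice depends on how the historical data were collected.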