LG CY IR
Jan 12, 2021

Measuring Recommender System Effects with Simulated Users

Sirui Yao, Yoni Halpern, Nithum Thain, Xuezhi Wang, Kang Lee, Flavien Prost, Ed H. Chi, Jilin Chen, Alex Beutel

arXiv:2101.04526v1
16.460 citations

Originality: Incremental advance

AI Analysis

This paper addresses the need for better evaluation methods in recommender systems, so that long-term biases and effects on users can be understood. It is an incremental advance: it builds on prior bias research and contributes a new simulation approach.

The paper tackles the problem of measuring the causal effects of a recommender system on user behavior over time, such as fostering unhealthy habits. It develops a simulation framework that isolates the system's impact from user preferences and examines extreme as well as average user experiences. Two empirical case studies, one on MovieLens and one on a production system, show how popularity bias manifests over time.

Imagine a food recommender system -- how would we check if it is causing and fostering unhealthy eating habits or merely reflecting users' interests? How much of a user's experience over time with a recommender is caused by the recommender system's choices and biases, and how much is based on the user's preferences and biases? Popularity bias and filter bubbles are two of the most well-studied recommender system biases, but most of the prior research has focused on understanding the system behavior in a single recommendation step. How do these biases interplay with user behavior, and what types of user experiences are created from repeated interactions? In this work, we offer a simulation framework for measuring the impact of a recommender system under different types of user behavior. Using this simulation framework, we can (a) isolate the effect of the recommender system from the user preferences, and (b) examine how the system performs not just on average for an "average user" but also the extreme experiences under atypical user behavior. As part of the simulation framework, we propose a set of evaluation metrics over the simulations to understand the recommender system's behavior. Finally, we present two empirical case studies -- one on traditional collaborative filtering in MovieLens and one on a large-scale production recommender system -- to understand how popularity bias manifests over time.
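To make the idea of isolating the recommender's effect concrete, here is a toy sketch of a repeated-interaction simulation. It is not the paper's framework: the function names, the popularity-based recommender, the uniform-choice user model, and the top-item concentration metric are all assumptions chosen for illustration. The same simulated user is run against two recommenders, so any difference in the outcome comes from the recommender rather than from user preferences.

```python
import random


def popularity_recommender(counts, k, rng):
    # Hypothetical recommender: always show the k most-interacted items.
    return sorted(range(len(counts)), key=lambda i: -counts[i])[:k]


def random_recommender(counts, k, rng):
    # Baseline with no popularity bias: show k items chosen uniformly.
    return rng.sample(range(len(counts)), k)


def uniform_user(slate, rng):
    # Hypothetical user model with no preferences: picks any shown item.
    return rng.choice(slate)


def simulate(recommend, choose, n_items=50, n_steps=200, slate_size=5, seed=0):
    """Run one simulated user trajectory and return the consumed item ids."""
    rng = random.Random(seed)
    counts = [1] * n_items  # interaction counts visible to the recommender
    history = []
    for _ in range(n_steps):
        slate = recommend(counts, slate_size, rng)
        item = choose(slate, rng)
        counts[item] += 1  # feedback loop: the choice updates popularity
        history.append(item)
    return history


def concentration(history, n_items, top=5):
    """Share of all interactions that fall on the `top` most-consumed items."""
    counts = [0] * n_items
    for item in history:
        counts[item] += 1
    return sum(sorted(counts, reverse=True)[:top]) / len(history)


# Same user model, two recommenders: the gap is attributable to the system.
pop_share = concentration(simulate(popularity_recommender, uniform_user), 50)
rand_share = concentration(simulate(random_recommender, uniform_user), 50)
print(f"popularity recommender: {pop_share:.2f}, random recommender: {rand_share:.2f}")
```

Swapping `uniform_user` for a user model with its own preferences (for example, one that favors a fixed subset of items) would be one way to probe atypical user behavior in the same loop.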

