Deviation-Based Learning: Training Recommender Systems Using Informed User Choice
Deviation-based learning addresses a key bottleneck for both users and platforms: it prevents the recommender's learning from stalling. The contribution appears incremental, however, since it builds on existing rational-user models.
The paper tackles the problem that recommender systems stop learning user preferences once users follow recommendations blindly. It proposes deviation-based learning, in which the system abstains from recommending when the alternatives have similar predicted payoffs; this improves both the learning rate and social welfare.
This paper proposes a new approach to training recommender systems called deviation-based learning. The recommender and rational users have different knowledge. The recommender learns user knowledge by observing what action users take upon receiving recommendations. Learning eventually stalls if the recommender always suggests a choice: Before the recommender completes learning, users start following the recommendations blindly, and their choices do not reflect their knowledge. The learning rate and social welfare improve substantially if the recommender abstains from recommending a particular choice when she predicts that multiple alternatives will produce a similar payoff.
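The abstention rule described above can be illustrated with a minimal sketch. It is not the paper's model or algorithm: the function name `recommend`, the two-option setup, and the fixed `margin` threshold are all assumptions made here for illustration. It only shows the basic idea that the recommender stays silent when her predicted payoffs are close, so that the user's own choice can reveal what the user knows.

```python
from typing import Optional


def recommend(pred_a: float, pred_b: float, margin: float = 0.1) -> Optional[str]:
    """Return "A" or "B", or None to abstain.

    pred_a, pred_b: the recommender's predicted payoffs for two options
    (hypothetical inputs). margin: an assumed closeness threshold; the
    source does not specify how "similar" payoffs are determined.
    """
    if abs(pred_a - pred_b) <= margin:
        # Predicted payoffs are similar: abstain so the user's choice
        # reflects the user's own knowledge rather than the recommendation.
        return None
    return "A" if pred_a > pred_b else "B"


if __name__ == "__main__":
    print(recommend(0.9, 0.2))    # clear gap: recommends an option
    print(recommend(0.52, 0.50))  # near tie: abstains
```

In this sketch a larger `margin` means more abstention and therefore more user choices made without a recommendation; how to pick that trade-off is exactly what the paper analyzes and is not captured here.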