Exploration in Interactive Personalized Music Recommendation: A Reinforcement Learning Approach
The work addresses the need for more interactive and personalized music recommendation systems, though its contribution is incremental: it applies known methods to this domain.
The paper tackles the shortcomings of greedy music recommendation by formulating recommendation as a multi-armed bandit, a reinforcement learning task that balances exploration and exploitation; simulations and a user study indicate strong potential for the approach.
Current music recommender systems typically act in a greedy fashion by recommending songs with the highest user ratings. Greedy recommendation, however, is suboptimal over the long term: it does not actively gather information on user preferences and fails to recommend novel songs that are potentially interesting. A successful recommender system must balance the needs to explore user preferences and to exploit this information for recommendation. This paper presents a new approach to music recommendation by formulating this exploration-exploitation trade-off as a reinforcement learning task called the multi-armed bandit. To learn user preferences, it uses a Bayesian model, which accounts for both audio content and the novelty of recommendations. A piecewise-linear approximation to the model and a variational inference algorithm are employed to speed up Bayesian inference. One additional benefit of our approach is a single unified model for both music recommendation and playlist generation. Both simulation results and a user study indicate strong potential for the new approach.
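To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of a multi-armed bandit in Python. It is not the paper's model: the abstract describes a Bayesian model over audio content and novelty with an approximate inference scheme, whereas this toy example assumes each song is an independent arm with a binary like/dislike reward and a Beta(1, 1) prior. Thompson sampling is used here only as a compact stand-in for a Bayesian exploration policy. The function name `run_bandit`, its parameters, and the like probabilities are all hypothetical.

```python
import random


def run_bandit(true_means, n_rounds, policy, seed=0):
    """Simulate a Beta-Bernoulli bandit (toy stand-in for song recommendation).

    true_means: assumed probability that the user likes each song (arm).
    policy: "greedy" (always exploit) or "thompson" (explore via sampling).
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed likes per arm
    failures = [0] * n_arms   # observed dislikes per arm
    total_reward = 0

    for _ in range(n_rounds):
        if policy == "greedy":
            # Exploit only: score each arm by its posterior mean under Beta(1, 1).
            scores = [
                (successes[a] + 1) / (successes[a] + failures[a] + 2)
                for a in range(n_arms)
            ]
        elif policy == "thompson":
            # Explore and exploit: score each arm by one draw from its posterior.
            scores = [
                rng.betavariate(successes[a] + 1, failures[a] + 1)
                for a in range(n_arms)
            ]
        else:
            raise ValueError(f"unknown policy: {policy}")

        # Recommend the highest-scoring arm (ties go to the lowest index).
        arm = max(range(n_arms), key=scores.__getitem__)

        # Simulated user feedback: 1 = liked, 0 = disliked.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return total_reward, pulls
```

For example, `run_bandit([0.3, 0.5, 0.7], 1000, "thompson")` returns the cumulative number of likes and how often each song was recommended. Running the same call with `"greedy"` shows how a purely exploitative policy can lock onto an early favorite, because it never samples arms whose estimates look worse after a few unlucky trials.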