Alternating Linear Bandits for Online Matrix-Factorization Recommendation
This work addresses the challenge of real-time item recommendation for users in online platforms, representing an incremental improvement over existing methods.
The paper tackles online collaborative filtering by proposing an algorithm that combines linear bandits with alternating least squares for matrix-factorization recommendation. On synthetic and real-world datasets, it reports lower cumulative regret and higher average cumulative NDCG than state-of-the-art online methods.
We consider the problem of online collaborative filtering, where items are recommended to users over time. At each time step, a user (selected by the environment) consumes an item (selected by the agent) and provides a rating of that item. We propose a novel algorithm for online matrix-factorization recommendation that combines linear bandits and alternating least squares. In this formulation, the bandit feedback is equal to the difference between the ratings of the best and selected items. We evaluate the performance of the proposed algorithm over time using both cumulative regret and average cumulative NDCG. Simulation results on three synthetic datasets and three real-world datasets for online collaborative filtering indicate that the proposed algorithm outperforms two state-of-the-art online algorithms.
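The feedback and evaluation quantities described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy rating matrix, and the NDCG convention (linear gain, log2 discount) are all assumptions made for illustration, and the paper's exact definition of "average cumulative NDCG" may differ. Computing regret this way also assumes the full rating matrix is known, which holds only in simulation.

```python
import math

def regret(ratings_for_user, selected_item):
    """Bandit feedback as described in the abstract:
    rating of the best item minus rating of the selected item."""
    return max(ratings_for_user) - ratings_for_user[selected_item]

def cumulative_regret(true_ratings, interactions):
    """Running sum of per-step regret over (user, item) interactions."""
    total, curve = 0.0, []
    for user, item in interactions:
        total += regret(true_ratings[user], item)
        curve.append(total)
    return curve

def ndcg_at_k(ranked_items, ratings_for_user, k):
    """NDCG@k with linear gain and log2 discount (an assumed convention)."""
    def dcg(relevances):
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(relevances))
    gains = [ratings_for_user[i] for i in ranked_items[:k]]
    ideal = sorted(ratings_for_user, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

# Hypothetical toy data: 2 users x 3 items of true ratings.
true_ratings = [[5.0, 3.0, 1.0],
                [2.0, 4.0, 4.0]]
# (user chosen by the environment, item chosen by the agent) per time step.
interactions = [(0, 1), (1, 2), (0, 0)]

curve = cumulative_regret(true_ratings, interactions)
perfect = ndcg_at_k([0, 1, 2], true_ratings[0], k=3)
reversed_rank = ndcg_at_k([2, 1, 0], true_ratings[0], k=3)
```

In this toy run the agent incurs regret only at the first step, where it picks an item rated 3 while the best available rating is 5; the cumulative regret curve is flat afterwards. A ranking that matches the true rating order scores an NDCG of 1, and any other ordering of distinct ratings scores less.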