Scalable Neural Contextual Bandit for Recommender Systems
This work addresses the high computational cost of neural contextual bandit algorithms in real-world recommender systems, making efficient exploration of user preferences practical at scale.
The paper tackles the computational inefficiency of neural contextual bandit algorithms in recommender systems by proposing a scalable, sample-efficient algorithm. Compared to state-of-the-art neural contextual bandit baselines, it boosts click-through rates by at least 9% and user ratings by at least 6%, reaches equivalent performance with at least 29% fewer user interactions than the best-performing baseline, and requires orders of magnitude fewer computational resources.
High-quality recommender systems ought to deliver both innovative and relevant content through effective and exploratory interactions with users. Yet, supervised learning-based neural networks, which form the backbone of many existing recommender systems, only leverage recognized user interests, falling short when it comes to efficiently uncovering unknown user preferences. While there has been some progress with neural contextual bandit algorithms towards enabling online exploration through neural networks, their onerous computational demands hinder widespread adoption in real-world recommender systems. In this work, we propose a scalable sample-efficient neural contextual bandit algorithm for recommender systems. To do this, we design an epistemic neural network architecture, Epistemic Neural Recommendation (ENR), that enables Thompson sampling at a large scale. In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6% respectively compared to state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves equivalent performance with at least 29% fewer user interactions compared to the best-performing baseline algorithm. Remarkably, while accomplishing these improvements, ENR demands orders of magnitude fewer computational resources than neural contextual bandit baseline algorithms.
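To make the Thompson-sampling step concrete, here is a minimal Python sketch under stated assumptions. It is not the ENR architecture, whose details are not given here; the names ToyEpistemicModel and thompson_select, the linear "epinet-style" uncertainty term, and the use of plain feature vectors for candidates are all hypothetical. What it illustrates: an epistemic neural network takes a random epistemic index z alongside its input, so drawing one z per decision and scoring every candidate item with that same z corresponds to sampling one plausible model, and recommending the argmax under that sample is a Thompson-sampling step. In this sketch, each decision costs one forward pass per candidate and no ensemble of separate networks is kept.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sigmoid(t: float) -> float:
    # Numerically stable logistic function (maps a logit to a click probability).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


@dataclass
class ToyEpistemicModel:
    """Hypothetical toy model, NOT the ENR architecture.

    logit(x, z) = base_w . x + sum_k z_k * (epi_w[k] . x)
    The first term is a point estimate; the second term changes with the
    epistemic index z, so different z values give different plausible models.
    """
    base_w: List[float]        # shape: feature_dim
    epi_w: List[List[float]]   # shape: index_dim x feature_dim

    @property
    def index_dim(self) -> int:
        return len(self.epi_w)

    def logit(self, x: Sequence[float], z: Sequence[float]) -> float:
        base = dot(self.base_w, x)
        epistemic = sum(z_k * dot(row, x) for z_k, row in zip(z, self.epi_w))
        return base + epistemic


def thompson_select(
    model: ToyEpistemicModel,
    candidates: Sequence[Sequence[float]],
    rng: random.Random,
) -> int:
    """Return the index of the candidate to recommend for one decision.

    Each candidate is a hypothetical feature vector for one (user, item) pair.
    """
    # Draw ONE epistemic index per decision and share it across all candidates,
    # so every item is scored by the same sampled model.
    z = [rng.gauss(0.0, 1.0) for _ in range(model.index_dim)]
    # Click probability of each candidate under the sampled model.
    scores = [sigmoid(model.logit(x, z)) for x in candidates]
    # Recommend the item that looks best under this sample.
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    model = ToyEpistemicModel(base_w=[0.5, -0.2], epi_w=[[0.3, 0.0], [0.0, 0.3]])
    items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    rng = random.Random(42)
    print([thompson_select(model, items, rng) for _ in range(10)])
```

Sharing a single z across all candidates within a decision is the design choice that makes this Thompson sampling: resampling z independently for each item would mix different sampled models and no longer pick the optimum of any one of them. When the epistemic weights are zero, the sketch reduces to greedy recommendation by the base scorer.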