Convolutional Neural Bandit for Visual-aware Recommendation
This work addresses the exploration-exploitation dilemma in image-based online recommendation and advertising. Its contribution is incremental: it integrates CNNs into contextual bandit algorithms.
The paper tackles visual-aware recommendation by proposing a contextual bandit algorithm that uses a convolutional neural network to learn the reward function and an upper confidence bound for exploration. It achieves a near-optimal regret bound of $\tilde{\mathcal{O}}(\sqrt{T})$ and outperforms state-of-the-art UCB-based algorithms on real-world image datasets.
Online recommendation and advertising are ubiquitous in web business, and image display is one of the most commonly used formats for interacting with customers. Contextual multi-armed bandits have been applied successfully in advertising to resolve the exploration-exploitation dilemma inherent in the recommendation procedure. Inspired by visual-aware recommendation, we propose a contextual bandit algorithm in which a convolutional neural network (CNN) learns the reward function and an upper confidence bound (UCB) drives exploration. We prove a near-optimal regret bound of $\tilde{\mathcal{O}}(\sqrt{T})$ when the network is over-parameterized, and establish strong connections with the convolutional neural tangent kernel (CNTK). Finally, we evaluate the proposed algorithm empirically and show that it outperforms other state-of-the-art UCB-based bandit algorithms on real-world image data sets.
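The abstract describes the selection rule only at a high level. As a rough illustration, here is a minimal NeuralUCB-style sketch. It assumes the exploration bonus is built from the network gradient g as gamma * sqrt(g^T A^{-1} g), with A = lam * I plus the sum of g g^T over past choices. The paper's exact confidence bound, network depth, gradient normalization, and training schedule are not given in the abstract and may differ. The "CNN" here is deliberately tiny: a single 2x2 filter with ReLU and average pooling. Every function name and hyperparameter (`forward_and_grad`, `ucb_select`, `run`, `gamma`, `lam`, `lr`) is hypothetical.

```python
import math
import random

# Hypothetical toy model standing in for the paper's over-parameterized CNN:
# one 2x2 convolution filter -> ReLU -> average pooling -> scalar reward estimate.


def patches(img, k=2):
    """Return every k x k patch of a 2-D image (list of rows), flattened."""
    rows, cols = len(img), len(img[0])
    return [
        [img[i + a][j + b] for a in range(k) for b in range(k)]
        for i in range(rows - k + 1)
        for j in range(cols - k + 1)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_and_grad(img, w):
    """Return the predicted reward f(img; w) and its gradient with respect to w."""
    ps = patches(img)
    f, g = 0.0, [0.0] * len(w)
    for p in ps:
        z = dot(w, p)
        if z > 0:  # ReLU active: this patch contributes to f and to the gradient
            f += z
            g = [gi + pi for gi, pi in zip(g, p)]
    n = len(ps)
    return f / n, [gi / n for gi in g]


def ucb_select(arms, w, A_inv, gamma):
    """Pick the arm maximising f + gamma * sqrt(g^T A^{-1} g) (assumed UCB form)."""
    best_idx, best_score, best_g = None, -math.inf, None
    for idx, img in enumerate(arms):
        f, g = forward_and_grad(img, w)
        width = math.sqrt(max(dot(g, [dot(row, g) for row in A_inv]), 0.0))
        score = f + gamma * width
        if score > best_score:
            best_idx, best_score, best_g = idx, score, g
    return best_idx, best_g


def sherman_morrison(A_inv, g):
    """Return (A + g g^T)^{-1} given the inverse of a symmetric matrix A."""
    Ag = [dot(row, g) for row in A_inv]
    denom = 1.0 + dot(g, Ag)
    d = len(g)
    return [[A_inv[i][j] - Ag[i] * Ag[j] / denom for j in range(d)] for i in range(d)]


def run(arms, reward_fn, rounds=30, gamma=0.5, lam=1.0, lr=0.05, seed=0):
    """Hypothetical bandit loop: select by UCB, observe reward, update A and w."""
    rng = random.Random(seed)
    d = 4  # number of parameters in the single 2x2 filter
    w = [rng.uniform(0.1, 1.0) for _ in range(d)]
    A_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    history, picks = [], []
    for _ in range(rounds):
        idx, g = ucb_select(arms, w, A_inv, gamma)
        r = reward_fn(idx)
        history.append((arms[idx], r))
        picks.append(idx)
        A_inv = sherman_morrison(A_inv, g)  # A <- A + g g^T, kept in inverse form
        for img, past_r in history:  # one SGD pass on the squared loss
            f, grad = forward_and_grad(img, w)
            w = [wi - lr * (f - past_r) * gi for wi, gi in zip(w, grad)]
    return picks, w


if __name__ == "__main__":
    # Three hypothetical 3x3 "images"; only arm 2 pays a reward.
    arms = [
        [[0.1, 0.0, 0.1], [0.0, 0.1, 0.0], [0.1, 0.0, 0.1]],
        [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
        [[0.9, 0.8, 0.9], [0.8, 1.0, 0.8], [0.9, 0.8, 0.9]],
    ]
    picks, w = run(arms, lambda i: 1.0 if i == 2 else 0.0)
    print("arm choices:", picks)
```

The Sherman-Morrison update keeps the per-round cost of maintaining A^{-1} at O(d^2) instead of re-inverting A. With a genuinely over-parameterized CNN, d is the full parameter count, so even that can be expensive; keeping only the diagonal of A is one common practical shortcut.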