LGITSTOct 8, 2021

Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

arXiv:2110.04127v12 citations
Originality Highly original
AI Analysis

This work addresses the challenge of filtering and prioritizing information for users in applications like recommendation systems, representing an incremental improvement by extending contextual bandits to non-linear settings with deep learning.

The paper tackles the problem of top-K ranking in contextual multi-armed bandits by dropping linearity assumptions and using deep neural networks to learn non-linear reward functions, introducing the Deep UCB algorithm that often outperforms other methods on real-world datasets but is sensitive to problem setup, with empirical results showing performance gains and theoretical regret bounds proving convergence to optimality for a weak class of problems.

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen iteratively to maximize a reward. The context, which represents a set of observable factors related to the user, is used to increase prediction accuracy compared to a standard multi-armed bandit. Contextual bandit methods have mostly been studied under strict linearity assumptions, but we drop that assumption and learn non-linear stochastic reward functions with deep neural networks. We introduce a novel algorithm called the Deep Upper Confidence Bound (UCB) algorithm. Deep UCB balances exploration and exploitation with a separate neural network to model the learning convergence. We compare the performance of many bandit algorithms varying K over real-world data sets with high-dimensional data and non-linear reward functions. Empirical results show that the performance of Deep UCB often outperforms though it is sensitive to the problem and reward setup. Additionally, we prove theoretical regret bounds on Deep UCB giving convergence to optimality for the weak class of CMAB problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes