LG MLJan 5, 2018

Nonparametric Stochastic Contextual Bandits

arXiv:1801.01750v111.439 citations

Originality Incremental advance

AI Analysis

This work addresses contextual bandit problems for applications like recommendation systems, offering incremental advances with new regret bounds and algorithm extensions.

The paper tackles the K-armed bandit problem with contextual rewards under nonparametric assumptions, achieving a sublinear regret of O~(T^{(1+D)/(2+D)}) for a modified UCB algorithm and demonstrating experimental improvements over existing methods on simulated tasks and MNIST.

We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\widetilde{O}\Big(T^{\frac{1+D}{2+D}}\Big)$, where $D$ is the context dimension, for a modified UCB algorithm that is simple to implement ($k$NN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification.

View on arXiv PDF

Similar