LGMLJan 5, 2018

Nonparametric Stochastic Contextual Bandits

arXiv:1801.01750v139 citations
Originality Incremental advance
AI Analysis

This work addresses contextual bandit problems for applications like recommendation systems, offering incremental advances with new regret bounds and algorithm extensions.

The paper tackles the K-armed bandit problem with contextual rewards under nonparametric assumptions, achieving a sublinear regret of O~(T^{(1+D)/(2+D)}) for a modified UCB algorithm and demonstrating experimental improvements over existing methods on simulated tasks and MNIST.

We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\widetilde{O}\Big(T^{\frac{1+D}{2+D}}\Big)$, where $D$ is the context dimension, for a modified UCB algorithm that is simple to implement ($k$NN-UCB). We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing multi-armed bandit approaches for both simulated tasks and MNIST image classification.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes