LG MLAug 3, 2019

Nonparametric Contextual Bandits in an Unknown Metric Space

Nirandika Wanigasekara, Christina Lee Yu

arXiv:1908.01228v18.110 citations

Originality Incremental advance

AI Analysis

This work addresses efficient learning in multi-arm bandit problems for applications like recommendation systems, though it appears incremental by building on existing nonparametric and metric-based methods.

The paper tackles the problem of nonparametric contextual bandits with many arms and unknown structure among reward functions, presenting a novel algorithm that learns arm similarities and adaptively partitions the context-arm space, achieving regret bounds and demonstrating performance dependent on local geometry in simulations.

Consider a nonparametric contextual multi-arm bandit problem where each arm $a \in [K]$ is associated to a nonparametric reward function $f_a: [0,1] \to \mathbb{R}$ mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. finite types or smooth with respect to an unknown metric space. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.

View on arXiv PDF

Similar