LGMLAug 3, 2019

Nonparametric Contextual Bandits in an Unknown Metric Space

arXiv:1908.01228v110 citations
AI Analysis

This work addresses efficient learning in multi-arm bandit problems for applications like recommendation systems, though it appears incremental by building on existing nonparametric and metric-based methods.

The paper tackles the problem of nonparametric contextual bandits with many arms and unknown structure among reward functions, presenting a novel algorithm that learns arm similarities and adaptively partitions the context-arm space, achieving regret bounds and demonstrating performance dependent on local geometry in simulations.

Consider a nonparametric contextual multi-arm bandit problem where each arm $a \in [K]$ is associated to a nonparametric reward function $f_a: [0,1] \to \mathbb{R}$ mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. finite types or smooth with respect to an unknown metric space. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes