ML LG ST MEMay 15, 2025

Batched Nonparametric Bandits via k-Nearest Neighbor UCB

arXiv:2505.10498v27.81 citationsh-index: 1Trans. Mach. Learn. Res.

Originality Highly original

AI Analysis

This addresses the challenge of limited feedback in domains like medicine and marketing, offering a novel nonparametric approach for batched bandits, though it is incremental in improving upon existing methods.

The paper tackles the problem of sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon with limited online feedback, by proposing BaNk-UCB, a nonparametric algorithm that combines k-nearest neighbor regression with UCB, and it achieves near-optimal regret guarantees and outperforms binning-based baselines in empirical evaluations.

We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.

View on arXiv PDF

Similar