Representative Action Selection for Large Action Space Meta-Bandits
This addresses the computational challenge of large action spaces in bandit problems, which is incremental as it builds on existing Gaussian process models and bandit algorithms.
The paper tackles the problem of efficiently selecting a subset from a large action space in meta-bandits to achieve performance close to using the full space, by proposing an epsilon-net algorithm that exploits similarity in payoffs modeled by a Gaussian process, with empirical comparisons to Thompson Sampling and Upper Confidence Bound.
We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple epsilon-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.