LGJun 11, 2025

Efficient kernelized bandit algorithms via exploration distributions

arXiv:2506.10091v1h-index: 3
Originality Highly original
AI Analysis

This work provides an incremental improvement for researchers and practitioners working on kernelized bandit problems, particularly those interested in efficient exploration strategies.

The authors tackled the kernelized bandit problem and achieved a regret bound of $ ilde{O}(γ_Tsqrt{T})$, matching known results for UCB- and Thompson Sampling-based algorithms. Their proposed algorithm also showed better practical results with randomization.

We consider a kernelized bandit problem with a compact arm set ${X} \subset \mathbb{R}^d $ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(γ_T\sqrt{T})$ regret bounds, where $γ_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that in practice, randomization can yield better practical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes