MLAILGFeb 1, 2023

Delayed Feedback in Kernel Bandits

arXiv:2302.00392v19 citationsh-index: 17
AI Analysis

This addresses a bottleneck in Bayesian optimization for scenarios where feedback is delayed, such as in clinical trials or hyperparameter tuning, representing a strong specific gain rather than a broad paradigm shift.

The paper tackles the problem of kernel bandits with stochastically delayed feedback, common in real-world applications like recommendation systems, and proposes an algorithm that achieves a regret bound of $ ilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[ au])$, significantly improving over the previous state-of-the-art bound of $ ilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[ au]\Gamma_k(T))$.

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{Γ_k(T)T}+\mathbb{E}[τ])$ regret, where $T$ is the number of time steps, $Γ_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $τ$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}(Γ_k(T)\sqrt{T}+\mathbb{E}[τ]Γ_k(T))$ reported in Verma et al. (2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes