ML AI LGFeb 1, 2023

Delayed Feedback in Kernel Bandits

Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke

arXiv:2302.00392v111.89 citationsh-index: 17

Originality Incremental advance

AI Analysis

This addresses a bottleneck in Bayesian optimization for scenarios where feedback is delayed, such as in clinical trials or hyperparameter tuning, representing a strong specific gain rather than a broad paradigm shift.

The paper tackles the problem of kernel bandits with stochastically delayed feedback, common in real-world applications like recommendation systems, and proposes an algorithm that achieves a regret bound of $ ilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[ au])$, significantly improving over the previous state-of-the-art bound of $ ilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[ au]\Gamma_k(T))$.

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{Γ_k(T)T}+\mathbb{E}[τ])$ regret, where $T$ is the number of time steps, $Γ_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $τ$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}(Γ_k(T)\sqrt{T}+\mathbb{E}[τ]Γ_k(T))$ reported in Verma et al. (2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.

View on arXiv PDF

Similar