LG MLOct 21, 2024

Near-Optimal Algorithm for Non-Stationary Kernelized Bandits

arXiv:2410.16052v112.57 citationsh-index: 9

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficient optimization in time-varying environments for applications like online learning and Bayesian optimization, though it is incremental as it builds on prior algorithms with modifications.

The paper tackles the non-stationary kernelized bandit problem by proposing a near-optimal algorithm called R-PERP that minimizes regret under time-varying reward functions, achieving a regret upper bound that matches the lower bound while bypassing the computational cost of existing methods.

This paper studies a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization, where one seeks to minimize the regret under an unknown reward function that varies over time. In particular, we focus on a near-optimal algorithm whose regret upper bound matches the regret lower bound. For this goal, we show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels, which reveals that an existing optimization-based KB algorithm with slight modification is near-optimal. However, this existing algorithm suffers from feasibility issues due to its huge computational cost. Therefore, we propose a novel near-optimal algorithm called restarting phased elimination with random permutation (R-PERP), which bypasses the huge computational cost. A technical key point is the simple permutation procedures of query candidates, which enable us to derive a novel tighter confidence bound tailored to the non-stationary problems.

View on arXiv PDF

Similar