ML LGJun 20, 2024

Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

arXiv:2406.14071v33.1

Originality Incremental advance

AI Analysis

This work provides theoretical guarantees for practical Bayesian bandit algorithms, which is significant for researchers and practitioners in reinforcement learning and decision-making, though it is incremental as it extends existing analysis to approximate inference settings.

The authors tackled the lack of theoretical justification for Bayesian bandit algorithms with approximate inference in stochastic linear bandits, showing that Linear Thompson Sampling and Linear Bayesian Upper Confidence Bound preserve their original regret upper bound rates but with larger constants, and that LinBUCB achieves a minimax optimal rate of O(d√T).

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite the superior practical performance, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct frequentist regret analysis on two Bayesian bandit algorithms, Linear Thompson Sampling (LinTS) and the extension of Bayesian Upper Confidence Bound, namely Linear Bayesian Upper Confidence Bound (LinBUCB). We demonstrate that when applied in approximate inference settings, LinTS and LinBUCB can universally preserve their original rates of regret upper bound but with a sacrifice of larger constant terms. These results hold for general Bayesian inference approaches, assuming the inference error measured by two different $α$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB expedites the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds in the setting of stochastic linear bandits with bounded approximate inference errors.

View on arXiv PDF

Similar