LGMLJun 15, 2023

Finite-Time Logarithmic Bayes Regret Upper Bounds

arXiv:2306.09136v31 citationsh-index: 48
AI Analysis

This addresses a fundamental gap in Bayesian bandit theory by providing tight regret bounds, which is important for researchers and practitioners in sequential decision-making and reinforcement learning.

The paper tackles the problem of deriving finite-time logarithmic regret bounds for Bayesian bandits, achieving O(c log n) and O(c log^2 n) upper bounds that match a known lower bound, significantly improving upon existing O(sqrt(n)) bounds.

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain $O(c_Δ\log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_Δ$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of prior in the Bayesian setting, both in the objective and as a side information given to the learner. They significantly improve upon existing $\tilde{O}(\sqrt{n})$ bounds, which have become standard in the literature despite the logarithmic lower bound of Lai (1987).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes