LGJan 29

Efficient Simple Regret Algorithms for Stochastic Contextual Bandits

Shuai Liu, Alireza Bakhtiari, Alex Ayoub, Botao Hao, Csaba Szepesvári

arXiv:2601.21167v11 citationsh-index: 4

Originality Highly original

AI Analysis

This work provides the first simple regret guarantees for stochastic contextual logistic bandits, which is a significant theoretical advancement for researchers in bandit algorithms.

This paper addresses the problem of simple regret in stochastic contextual logistic bandits, a setting where no prior guarantees existed. The authors propose a deterministic algorithm that achieves a simple regret of \tilde{\mathcal{O}}(d/\sqrt{T}) and a novel Thompson Sampling variant that achieves \tilde{\mathcal{O}}(d^{3/2}/\sqrt{T}) for both linear and logistic cases, notably without dependence on the parameter magnitude constant \kappa in the leading term.

We study stochastic contextual logistic bandits under the simple regret objective. While simple regret guarantees have been established for the linear case, no such results were previously known for the logistic setting. Building on ideas from contextual linear bandits and self-concordant analysis, we propose the first algorithm that achieves simple regret $\tilde{\mathcal{O}}(d/\sqrt{T})$. Notably, the leading term of our regret bound is free of the constant $κ= \mathcal O(\exp(S))$, where $S$ is a bound on the magnitude of the unknown parameter vector. The algorithm is shown to be fully tractable when the action set is finite. We also introduce a new variant of Thompson Sampling tailored to the simple-regret setting. This yields the first simple regret guarantee for randomized algorithms in stochastic contextual linear bandits, with regret $\tilde{\mathcal{O}}(d^{3/2}/\sqrt{T})$. Extending this method to the logistic case, we obtain a similarly structured Thompson Sampling algorithm that achieves the same regret bound -- $\tilde{\mathcal{O}}(d^{3/2}/\sqrt{T})$ -- again with no dependence on $κ$ in the leading term. The randomized algorithms, as expected, are cheaper to run than their deterministic counterparts. Finally, we conducted a series of experiments to empirically validate these theoretical guarantees.

View on arXiv PDF

Similar